Sorting a large file in Unix

I have a file like this:

chr1    101466461   101466462   1
chr6    160143888   160143889   1
chr19   19231081    19231082    1
chr18   47008735    47008736    1
chr1    161407811   161407812   1
chr4    2957295 2957296 2
chr12   49449119    49449120    1
chr7    99698936    99698937    45
chr1    17555949    17555950    1
chr17   47995738    47995739    1
chr12   77244463    77244464    1
chr20   45846426    45846427    1
chr8    103667756   103667757   1
chrX    3733206 3733207 3
chr19   17711889    17711890    3
chr1    202930379   202930380   1
chr17   62654249    62654250    2
chr11   118560806   118560807   1
chr19   12808650    12808651    2
chr2    61738736    61738737    1
chr12   8121782 8121783 1
chr22   30769375    30769376    1
chr15   50866284    50866285    1
chr8    128412986   128412987   2
chr17   63118510    63118511    2
chr17   27169311    27169312    1
chr2    25125404    25125405    1
chr13   26752444    26752445    2
chr17   79828654    79828655    1
chr16   89713556    89713557    3
chr6    43478242    43478243    1
chr17   79507720    79507721    37
chr2    10251911    10251912    1
chr2    99942853    99942854    1
chr6    30766751    30766752    1
chr2    241401259   241401260   1
chr7    150020595   150020596   2

I want to sort it by the first column, then the second, then the third.

How can I do this with the sort command in Unix?

The output should look like this:

chr1    17555949    17555950    1
chr1    19585241    19585242    1
chr1    45140053    45140054    5
chr1    68656642    68656643    1
chr1    78408443    78408444    1
chr1    101466461   101466462   1
chr1    118882121   118882122   1
chr1    161407811   161407812   1
chr1    173830786   173830787   1
chr1    202930379   202930380   1
chr10   104223811   104223812   2
chr11   3377667 3377668 9
chr11   6337670 6337671 2
chr11   27524981    27524982    3
chr11   59373319    59373320    1
chr11   62322454    62322455    4
chr11   71149264    71149265    1
chr11   118560806   118560807   1
chr12   6776535 6776536 1
chr12   7083123 7083124 2
chr12   8121782 8121783 1
chr12   49449119    49449120    1
chr12   77244463    77244464    1
chr12   120729137   120729138   2
chr13   26752444    26752445    2
chr14   20826276    20826277    1
chr14   69838752    69838753    1
chr15   50866284    50866285    1
chr15   51841232    51841233    1
chr16   4431159 4431160 1
chr16   4871325 4871326 1
chr16   89713556    89713557    3
chr17   27169311    27169312    1
chr17   36924117    36924118    1
chr17   47995738    47995739    1
chr17   62654249    62654250    2
chr17   63118510    63118511    2
chr17   73873221    73873222    2
chr17   74083467    74083468    1
chr17   78105045    78105046    3
chr17   79507720    79507721    37
chr17   79828654    79828655    1
chr18   43534443    43534444    1
chr18   47008735    47008736    1
chr18   47794477    47794478    1
chr19   1606323 1606324 1
chr19   5691538 5691539 1
chr19   9929712 9929713 7
chr19   10421898    10421899    2
chr19   12808650    12808651    2
chr19   17711889    17711890    3
chr19   19231081    19231082    1
chr19   40853760    40853761    5
chr19   43994590    43994591    1
chr19   45924611    45924612    1
chr19   47977071    47977072    1
chr2    10251911    10251912    1
chr2    25125404    25125405    1
chr2    37362203    37362204    1
chr2    55277977    55277978    1
chr2    61738736    61738737    1
chr2    99942853    99942854    1
chr2    241401259   241401260   1
chr20   34230317    34230318    1
chr20   37053658    37053659    1
chr20   45846426    45846427    1
chr21   23526366    23526367    1
chr22   16145568    16145569    1
chr22   29119862    29119863    1
chr22   30712166    30712167    10
chr22   30769375    30769376    1
chr3    43711238    43711239    2
chr3    48633350    48633351    3
chr3    119389625   119389626   2
chr3    169796731   169796732   1
chr4    2957295 2957296 2
chr4    8440974 8440975 1
chr4    71554018    71554019    2
chr5    43477728    43477729    2
chr6    17988241    17988242    1
chr6    30653492    30653493    1
chr6    30766751    30766752    1
chr6    43478242    43478243    1
chr6    160143888   160143889   1
chr7    83789825    83789826    1
chr7    99058375    99058376    2
chr7    99698936    99698937    45
chr7    129222997   129222998   2
chr7    150020595   150020596   2
chr8    103667756   103667757   1
chr8    128412986   128412987   2
chr8    144601536   144601537   1
chr9    123631184   123631185   1
chr9    131411417   131411418   1
chr9    135216510   135216511   1
chr9    136278984   136278985   2
chrX    3733206 3733207 3
chrX    115589585   115589586   1
chrX    122866375   122866376   2
chrX    153279158   153279159   2

Thanks.

Answer 1

This can be done with GNU sort:

sort -V foo

From man sort on a GNU system:

   -V, --version-sort
          natural sort of (version) numbers within text

cat foo

chr1    101466461   101466462   1
chr6    160143888   160143889   1
chr19   19231081    19231082    1
chr18   47008735    47008736    1
chr1    161407811   161407812   1
chr4    2957295 2957296 2
chr12   49449119    49449120    1
chr7    99698936    99698937    45
chr1    17555949    17555950    1
chr17   47995738    47995739    1
chr12   77244463    77244464    1
chr20   45846426    45846427    1
chr8    103667756   103667757   1
chrX    3733206 3733207 3
chr19   17711889    17711890    3
chr1    202930379   202930380   1
chr17   62654249    62654250    2
chr11   118560806   118560807   1
chr19   12808650    12808651    2
chr2    61738736    61738737    1
chr12   8121782 8121783 1
chr22   30769375    30769376    1
chr15   50866284    50866285    1
chr8    128412986   128412987   2
chr17   63118510    63118511    2
chr17   27169311    27169312    1
chr2    25125404    25125405    1
chr13   26752444    26752445    2
chr17   79828654    79828655    1
chr16   89713556    89713557    3
chr6    43478242    43478243    1
chr17   79507720    79507721    37
chr2    10251911    10251912    1
chr2    99942853    99942854    1
chr6    30766751    30766752    1
chr2    241401259   241401260   1
chr7    150020595   150020596   2

sort -V foo

chr1    17555949    17555950    1
chr1    101466461   101466462   1
chr1    161407811   161407812   1
chr1    202930379   202930380   1
chr2    10251911    10251912    1
chr2    25125404    25125405    1
chr2    61738736    61738737    1
chr2    99942853    99942854    1
chr2    241401259   241401260   1
chr4    2957295 2957296 2
chr6    30766751    30766752    1
chr6    43478242    43478243    1
chr6    160143888   160143889   1
chr7    99698936    99698937    45
chr7    150020595   150020596   2
chr8    103667756   103667757   1
chr8    128412986   128412987   2
chr11   118560806   118560807   1
chr12   8121782 8121783 1
chr12   49449119    49449120    1
chr12   77244463    77244464    1
chr13   26752444    26752445    2
chr15   50866284    50866285    1
chr16   89713556    89713557    3
chr17   27169311    27169312    1
chr17   47995738    47995739    1
chr17   62654249    62654250    2
chr17   63118510    63118511    2
chr17   79507720    79507721    37
chr17   79828654    79828655    1
chr18   47008735    47008736    1
chr19   12808650    12808651    2
chr19   17711889    17711890    3
chr19   19231081    19231082    1
chr20   45846426    45846427    1
chr22   30769375    30769376    1
chrX    3733206 3733207 3
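
Note that sort -V orders the chromosome names naturally (chr2 before chr10), whereas the expected output in the question lists chr10 through chr19 before chr2. If natural order is what you want, the keys can also be spelled out so that version ordering applies to the first column only and the positions are compared as plain numbers. This is a minimal sketch, assuming GNU sort (the V key modifier is a GNU extension) and tab-separated input; the file name sample.txt and its contents are made up for illustration.

```shell
# Build a tiny tab-separated sample (hypothetical data in the same layout as foo).
tmpdir=$(mktemp -d)
printf 'chr10\t500\t501\t1\nchr2\t900\t901\t1\nchr2\t80\t81\t2\nchr1\t700\t701\t1\nchrX\t30\t31\t3\n' > "$tmpdir/sample.txt"

# -t sets a literal tab as the field separator.
# -k1,1V : version (natural) order on column 1 only, so chr2 sorts before chr10.
# -k2,2n and -k3,3n : numeric order on the start and end positions.
sorted=$(sort -t "$(printf '\t')" -k1,1V -k2,2n -k3,3n "$tmpdir/sample.txt")
printf '%s\n' "$sorted"

# Remove the temporary directory.
rm -r "$tmpdir"
```

Spelling out the keys makes the intent explicit and does not rely on how the whole line happens to compare under version ordering.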

Answer 2

Running the following command:

LC_COLLATE=C sort -k1,1 -k2,2n -k3,3n -k4,4n file

gives exactly the desired result:

chr1    17555949    17555950    1
chr1    101466461   101466462   1
chr1    161407811   161407812   1
chr1    202930379   202930380   1
chr11   118560806   118560807   1
chr12   8121782 8121783 1
chr12   49449119    49449120    1
chr12   77244463    77244464    1
chr13   26752444    26752445    2
chr15   50866284    50866285    1
chr16   89713556    89713557    3
chr17   27169311    27169312    1
chr17   47995738    47995739    1
chr17   62654249    62654250    2
chr17   63118510    63118511    2
chr17   79507720    79507721    37
chr17   79828654    79828655    1
chr18   47008735    47008736    1
chr19   12808650    12808651    2
chr19   17711889    17711890    3
chr19   19231081    19231082    1
chr2    10251911    10251912    1
chr2    25125404    25125405    1
chr2    61738736    61738737    1
chr2    99942853    99942854    1
chr2    241401259   241401260   1
chr20   45846426    45846427    1
chr22   30769375    30769376    1
chr4    2957295 2957296 2
chr6    30766751    30766752    1
chr6    43478242    43478243    1
chr6    160143888   160143889   1
chr7    99698936    99698937    45
chr7    150020595   150020596   2
chr8    103667756   103667757   1
chr8    128412986   128412987   2
chrX    3733206 3733207 3
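
For reference, here is the same key layout annotated option by option, together with a few GNU sort options that are often useful when the file is large. Treat it as a sketch under assumptions: the -S, -T and --parallel options are GNU extensions, the values 64M and 2 are arbitrary, and the file names and sample data are hypothetical. It uses LC_ALL=C rather than LC_COLLATE=C because LC_ALL, when set in the environment, takes precedence over LC_COLLATE.

```shell
# Hypothetical sample input in the same four-column layout.
workdir=$(mktemp -d)
printf 'chr2\t300\t301\t1\nchr10\t200\t201\t2\nchr1\t900\t901\t1\nchr1\t50\t51\t4\n' > "$workdir/input.txt"

# LC_ALL=C     : compare column 1 byte by byte, so chr10 sorts before chr2.
# -k1,1        : first key is column 1 only (not column 1 to end of line).
# -k2,2n ...   : the remaining keys are compared as numbers.
# -S 64M       : size of the in-memory buffer (arbitrary value for this sketch).
# -T "$workdir": directory for temporary files when the input exceeds the buffer.
# --parallel=2 : upper limit on the number of sorts run concurrently.
# -o           : write the result to a file instead of standard output.
LC_ALL=C sort -S 64M -T "$workdir" --parallel=2 \
    -k1,1 -k2,2n -k3,3n -k4,4n \
    -o "$workdir/sorted.txt" "$workdir/input.txt"

result=$(cat "$workdir/sorted.txt")
printf '%s\n' "$result"

# -c only checks the order; it exits with status 0 if the file is already sorted.
check=$(LC_ALL=C sort -c -k1,1 -k2,2n -k3,3n -k4,4n "$workdir/sorted.txt" && echo ok)

# Remove the temporary directory.
rm -r "$workdir"
```

Pointing -T at a disk with plenty of free space matters more than the other options for very large inputs, since sort spills intermediate runs there.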
