How can I split a file into separate files based on the column headers of the original file?

Question 1

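$ awk '
    NR == 1 {
        # header row: column i goes to the file output<header value>.txt
        for (i=1; i<=NF; i++) {
            output[i] = "output" $i ".txt"
            files[output[i]] = 1
        }
        next
    }
    {
        # append each field to its column file, then end the line in every file
        for (i=1; i<=NF; i++)  printf "%s", $i > output[i]
        for (file in files)    print ""        > file
    }
' input.filename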

$ for f in output*.txt; do echo $f; cat $f; done
output1.txt
02202020
02101011
02101011
output2.txt
2022002
1022002
1022002
output3.txt
220111
220000
220000
output30.txt
00202
00202
00202
output4.txt
2020002
2020012
2020012

The header row has 32 fields, but the other rows have 33. You need to fix that first.
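
Before splitting, it helps to find exactly which rows have a different field count from the header. Below is a minimal sketch in POSIX shell and awk. It assumes whitespace-separated columns with the header on line 1. The file names sample.txt and mismatch.txt and the sample data are hypothetical.

```shell
#!/bin/sh
# Sketch: list every data row whose field count differs from the header.
# Hypothetical sample: 4-field header, one good row, one row with an extra 0.
dir=$(mktemp -d) && cd "$dir" || exit 1
printf '1 1 2 30\n0 2 1 0\n0 2 1 0 0\n' > sample.txt

status=0
awk '
    NR == 1 { n = NF; next }    # remember how many fields the header has
    NF != n {                   # report any row with a different count
        printf "line %d: %d fields, header has %d\n", NR, NF, n
        bad = 1
    }
    END { exit bad }            # exit status 1 if any row was reported
' sample.txt > mismatch.txt || status=$?

cat mismatch.txt
echo "exit status: $status"
```

The non-zero exit status means the check can guard the split in a script, for example `awk '...' input.filename && awk '...split...' input.filename`.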

Question 2

A Perl script.

Set the file name in $in instead of genome.txt, or pass the name as an argument.

Name the script counter.pl, make it executable, and run it with ./counter.pl:

chmod 755 counter.pl
./counter.pl

or

chmod 755 counter.pl
./counter.pl genome.txt

counter.pl:

#!/usr/bin/perl

use strict;
use warnings;

my $in = $ARGV[0] || 'genome.txt'; # input file name

open (my $F, '<', $in) or die "Cannot open input file $in: $!";
my $n = 0;       # non-empty lines read so far (0 => next line is the header)
my %fd = ();     # header value => output filehandle
my @fd = ();     # column index => header value

while (<$F>) {
        # trim
        s/^\s+//;
        s/\s+$//;
        next if ($_ eq ''); # Skip empty lines (a line holding just "0" is kept)
        my @x = split(/\s+/, $_);
        # 1st line, open files
        if ( ! $n++)  {
           my $col = 0;
           for (@x) {
              # open each distinct output file only once
              if (!exists($fd{$_})) {
                 open ($fd{$_}, '>', "output$_.txt")
                   or die ("Cannot open file output$_.txt: $!");
              }
              $fd[$col++] = $_;
           }
        }
        else { # Write data
           die ("Should have " . ($#fd+1) . " entries on line " . $.)
             if ($#x != $#fd);
           for (0 .. $#x) {
              print {$fd{$fd[$_]}} ($x[$_]);
           }
           print {$fd{$_}} ("\n") for (keys %fd);
        }
}

close $fd{$_} for (keys %fd);
close $F;
# the end

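Any fixed number of words per line works (for example 32, or 33).

This version accepts any column layout, but every row must have the same number of words as the header. The script dies with an error naming the line if a row has a different word count, or if a file cannot be opened.

Only the file name ($in) needs adjusting.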
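
The same row-length guard can be added to the awk splitter from the first answer. Below is a minimal sketch that assumes whitespace-separated columns and a header on line 1. The input is a small hypothetical sample, not the asker's data. Like the Perl script, it stops at the first row whose field count differs from the header, so any output files written before that row are left in place.

```shell
#!/bin/sh
# Sketch: awk splitter that aborts on a row with the wrong number of fields.
# Sample data and file names are hypothetical.
dir=$(mktemp -d) && cd "$dir" || exit 1
printf '1 1 2 30\n0 2 1 0\n1 0 2 2\n' > input.txt

awk '
    NR == 1 {
        n = NF
        for (i = 1; i <= NF; i++) {
            output[i] = "output" $i ".txt"   # column i goes to output<header>.txt
            files[output[i]] = 1
        }
        next
    }
    NF != n {                                # same check as the Perl die
        printf "Should have %d entries on line %d\n", n, NR > "/dev/stderr"
        exit 1
    }
    {
        for (i = 1; i <= NF; i++) printf "%s", $i > output[i]
        for (f in files)          print ""        > f
    }
' input.txt

for f in output*.txt; do echo "$f"; cat "$f"; done
```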

Input file (with the extra trailing 0 removed):

1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 3 3 3 3 3 3 4 4 4 4 4 4 4 30 30 30 30
0 2 2 0 2 0 2 0 2 0 2 2 0 0 2 2 2 0 1 1 1 2 0 2 0 0 0 2 0 2 0 2
0 2 1 0 1 0 1 1 1 0 2 2 0 0 2 2 2 0 0 0 0 2 0 2 0 0 1 2 0 2 0 2
0 2 1 0 1 0 1 1 1 0 2 2 0 0 2 2 2 0 0 0 0 2 0 2 0 0 1 2 0 2 0 2

output1.txt

02202020
02101011
02101011

output2.txt

2022002
1022002
1022002

output30.txt

0202
0202
0202

output3.txt

220111
220000
220000

output4.txt

2020002
2020012
2020012
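
In the input above, the extra trailing 0 was removed by hand. Below is a minimal shell/awk sketch that does this automatically. It assumes, as that note implies, that any surplus value always sits at the end of a row. Each data row is truncated to the header's field count and re-joined with single spaces. The file names raw.txt and trimmed.txt and the sample data are hypothetical.

```shell
#!/bin/sh
# Sketch: truncate every data row to the header's field count.
# Assumption: surplus fields are always at the end of the row.
dir=$(mktemp -d) && cd "$dir" || exit 1
printf '1 1 2 30\n0 2 1 0 0\n0 2 1 0\n' > raw.txt

awk '
    NR == 1 { n = NF; print; next }          # header: remember width, copy as-is
    {
        line = $1
        for (i = 2; i <= n && i <= NF; i++)  # keep at most n fields
            line = line " " $i
        print line
    }
' raw.txt > trimmed.txt

cat trimmed.txt
```

Rows that are shorter than the header are passed through unchanged, so it is still worth running a field-count check on the trimmed file before splitting it.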

Answer