블록에 누락된 데이터 삽입

Question

더 쉬운 방법으로 할 수 있을 것 같지만, 한 시간 동안 고민한 끝에 제가 생각해 낼 수 있는 최선의 방법은 다음과 같습니다.이 Python 스크립트:

#! /usr/bin/env python3

import sys, os

class Block:
    block_id = ''
    source1 = ''
    source2 = ''
    mixtures = []
    def __init__(self, block_id = '', source1 = '', source2 = '', mixtures = []):
        self.block_id = block_id
        self.mixtures = mixtures
        self.source1 = source1
        self.source2 = source2

        # Convert mixtures to a set of characters. For example, 
        #     ''.join(["AT", "TT"]) 
        # creates the string "ATTT". set() then converts that string to 
        # a set of characters {'A', 'T'}
        sources = set(''.join(mixtures))

        # If a source is empty, we take from the set the first element (pop()) 
        # after removing the other source (difference()). Since the set 
        # contains single characters, we double it to get "AA", "TT", etc.
        if self.source1 == '':
            self.source1 = sources.difference(set(self.source2)).pop()*2
        sources.remove (self.source1[0])
        if self.source2 == '':
            self.source2 = sources.pop()*2

    def print (self):
        print (self.block_id, "source1", self.source1)
        print (self.block_id, "source2", self.source2)
        for mix in self.mixtures:
            print (self.block_id, "mixture", mix)

if len(sys.argv) == 1:
    files = [os.stdin]
else:
    files = (open(f) for f in sys.argv[1:])

for f in files:
    # Read in all the lines
    data = [line for rawline in f for line in [rawline.strip().split(' ')]]
    # Get the unique block IDs
    blocks = set (lines[0] for line in data)
    # For each block ID
    for b in blocks:
        # The corresponding mixtures
        mix = [line[2] for line in data if line[0] == b and "mixture" == line[1]]

        # If "source1 XX" is present, we will get the list ['XX'], and [] if 
        # source1 is not present. ''.join() allows us to flatten ['XX']  to 
        # just 'XX' (and doesn't affect []). Similarly for source2.
        source1 = ''.join(d[2] for line in data if line[0] == b and "source1" == line[1])
        source2 = ''.join(d[2] for line in data if line[0] == b and "source2" == line[1])

        # Create an object of the class defined above, and print it.
        # Initialization fills up the blank values.
        Block(b, source1, source2, mix).print()

그렇더라도 이는 순서가 지정되지 않은 무작위 출력을 제공합니다(즉, block3데이터가 앞에 올 수 있음 block1등).

이를 스크립트(예: insert.py)에 저장하고 다음을 실행합니다.

python3 insert.py inputfile

나는 이것을 다시 썼다앗:

#! /usr/bin/awk -f

function build (block, source1, source2, sources, mixtures)
{       
    if (! source1)
    {
        for (char in sources)
        {
            if (source2 != char char)
            {
                source1 = char char
                delete sources[char]
                break
            }
        }
    }
    if (! source2)
    {   
        for (char in sources)
        {
            if (source1 != char char)
            {
                source2 = char char
                delete sources[char]
                break
            }
        }
    }
    printf "%s %s %s\n", block, "source1", source1
    printf "%s %s %s\n", block, "source2", source2
    for (m in mixtures)
    {
        for (i = 0; i < mixtures[m]; i++)
        {
            printf "%s %s %s\n", block, "mixture", m
        }
    }

}

{
    if (prev != $1)
    {
        if (prev in data)
        {
            build(prev, source1, source2, sources, mixtures)
        }

        prev = $1
        source1 = ""
        source2 = ""
        delete sources
        delete mixtures
    }

    data[$1]++
    if ($2 == "source1") {source1 = $3; next}
    if ($2 == "source2") {source2 = $3; next}
    if ($2 == "mixture")
    {
        mixtures[$3]++ 
        split ($3, chars, "")
        for (i=1; i <= length($3); i++)
        {
            sources[chars[i]]++
        }
    }
}

END { build(prev, source1, source2, sources, mixtures) }

스크립트(예 insert.awk: ) 에 저장 chmod +x하고 실행합니다.

./insert.awk inputfile

이제 명령도 유지해야 합니다. 를 사용했는데 delete일부 awk에는 없을 수도 있습니다(그러나 GNU awk 및 mawk에는 있어야 함).

Answer 1