Answer 1

sed -e '
   s/::/\n/;s//\n/
   s/^\([^_]*\)_.*\n\(.*\)\n.*/\2[\1]/
   ;#  |--1---|      |-2-|
' ID.data

Place markers around the ID string, grab the part before the first _, and replace the whole line with these values. Output:

TRINITY_DN120587_c0_g1_i1[ID1]
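The \n in the replacements above is a GNU sed extension. Below is a minimal sketch of a more portable variant, assuming a POSIX sed. In the replacement, a backslash followed by a real newline inserts the newline, and \n in the regex still matches an embedded newline. The function name to_bracket_portable is hypothetical, and this was only reasoned about against GNU sed, so other implementations may behave differently.

```sh
# Hypothetical helper: the same three substitutions as the answer above,
# but each replacement inserts a newline with backslash + real newline
# (the POSIX form) instead of GNU sed's \n.
to_bracket_portable() {
  sed -e 's/::/\
/' -e 's//\
/' -e 's/^\([^_]*\)_.*\n\(.*\)\n.*/\2[\1]/' "$@"
}

# Usage: to_bracket_portable ID.data   (reads stdin when no file is given)
printf '%s\n' 'ID1_TRINITY_DN120587_c0_g1::TRINITY_DN120587_c0_g1_i1::g.8298::m.8298' |
  to_bracket_portable
# prints TRINITY_DN120587_c0_g1_i1[ID1]
```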

Explanation

              ID1_TRINITY_DN120587_c0_g1::TRINITY_DN120587_c0_g1_i1::g.8298::m.8298
              |-|                         |-----------------------|

Say we want to extract the ID that sits between the first and second occurrences of ::, along with the ID1 prefix before the first _.

Step 1: Place markers (usually \n) around the region of interest.

       s/::/\n/;s//\n/

   This is how the pattern space looks after the above transformation:

              ID1_TRINITY_DN120587_c0_g1\nTRINITY_DN120587_c0_g1_i1\ng.8298::m.8298
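You can see the markers for yourself by running step 1 on its own. This is a minimal sketch, assuming GNU sed (which accepts \n in the replacement). The helper name mark_fields is hypothetical. When sed prints the pattern space, each embedded \n shows up as a line break.

```sh
# Hypothetical helper: applies only step 1, replacing the first two :: with \n.
mark_fields() {
  sed 's/::/\n/;s//\n/'
}

printf '%s\n' 'ID1_TRINITY_DN120587_c0_g1::TRINITY_DN120587_c0_g1_i1::g.8298::m.8298' |
  mark_fields
# prints:
# ID1_TRINITY_DN120587_c0_g1
# TRINITY_DN120587_c0_g1_i1
# g.8298::m.8298
```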

Step 2: Extract the ID between the two \ns and the string to the left of the first _.

                    s/^\([^_]*\)_.*\n\(.*\)\n.*/\2[\1]/
                    ;#  |------|      |---|
                    ;#     \1           \2

   [^_]       => matches any char but an underscore

   [^_]*      => matches 0 or more non-underscore chars

   \([^_]*\)  => stores what was matched in a memory, recallable as \1

   ^\([^_]*\) => anchors the match at the start of the string

   .*\n       => the greedy .* runs as far as it can, up to the rightmost \n in the string

   \n\(.*\)\n => Oops! The pattern needs yet another \n after that one, so the engine
                 backtracks to the previous \n (the first marker) and from there moves
                 right again, stopping at the rightmost \n (the second marker). Whatever
                 lies between these two positions is the string ID, recallable as \2.
                 Since the \ns fall outside the \(...\), they are not stored in \2.

   .*         => a catch-all that runs from the rightmost \n to the end of the string;
                 nothing is done with what it matches.
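The backtracking described above can be observed by comparing the full pattern with a shortened one that has only a single \n after the greedy .*. This is a minimal sketch, assuming GNU sed. The helper names one_marker and two_markers are hypothetical.

```sh
# Only one \n follows the greedy .*, so .* settles on the rightmost marker
# and \2 captures just the last segment.
one_marker() {
  sed 's/::/\n/;s//\n/;s/^\([^_]*\)_.*\n\(.*\)/\2/'
}

# Two \n follow the greedy .*, so the engine must give the second marker back
# and \2 captures the text between the two markers.
two_markers() {
  sed 's/::/\n/;s//\n/;s/^\([^_]*\)_.*\n\(.*\)\n.*/\2/'
}

line='ID1_TRINITY_DN120587_c0_g1::TRINITY_DN120587_c0_g1_i1::g.8298::m.8298'
printf '%s\n' "$line" | one_marker    # prints g.8298::m.8298
printf '%s\n' "$line" | two_markers   # prints TRINITY_DN120587_c0_g1_i1
```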

 So the regex engine has matched the input string in the pattern space and
 stored what it gathered in two memory locations:

 \1 => stores the portion of the string between the beginning of the pattern
       space and the 1st occurrence of the underscore.

 \2 => stores the portion of the string between the 1st and 2nd
       occurrences of :: in the pattern space.

                      \1 = ID1
                      \2 = TRINITY_DN120587_c0_g1_i1

 Now comes the replacement part. Remember that the regex engine matched the
 whole pattern space from beginning to end, hence the replacement affects
 the whole pattern space.

 \2[\1] => We replace the matched portion of the pattern space (in our case,
           the entire string) with the contents of memory \2, a literal [,
           the contents of memory \1, and a literal ], which gives:

                  TRINITY_DN120587_c0_g1_i1[ID1]

In other words, you have just managed to turn the pattern space from:

              ID1_TRINITY_DN120587_c0_g1::TRINITY_DN120587_c0_g1_i1::g.8298::m.8298

into the following:

                  TRINITY_DN120587_c0_g1_i1[ID1]
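One caveat: if a line has exactly one ::, step 1 still turns it into a \n while step 2 fails, so that line is printed split across two lines. Below is a minimal sketch of a guard, assuming GNU sed. The address /::.*::/ restricts the script to lines that contain at least two ::, and all other lines pass through unchanged. The helper name to_bracket_guarded and the sample line ID2_foo::bar are made up for illustration.

```sh
# Hypothetical helper: runs the original steps only on lines that contain
# at least two :: separators (GNU sed assumed for \n in the replacement).
to_bracket_guarded() {
  sed -e '/::.*::/{
    s/::/\n/;s//\n/
    s/^\([^_]*\)_.*\n\(.*\)\n.*/\2[\1]/
  }' "$@"
}

printf '%s\n' \
  'ID1_TRINITY_DN120587_c0_g1::TRINITY_DN120587_c0_g1_i1::g.8298::m.8298' \
  'ID2_foo::bar' |
  to_bracket_guarded
# prints:
# TRINITY_DN120587_c0_g1_i1[ID1]
# ID2_foo::bar
```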

Answer 2

awk solution:

awk -F'::' '{ print $2"[" substr($1,1,index($1,"_")-1) "]"}' file

Output:

TRINITY_DN120587_c0_g1_i1[ID1]

-F'::' - field separator
substr($1,1,index($1,"_")-1) - extracts the substring of the first field starting at position 1 and ending just before the first _ (e.g. ID1)
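An alternative sketch of the same idea uses awk's sub() to delete everything from the first _ onward, instead of computing a length with index(). The helper name to_bracket_awk is hypothetical. One behavioral difference: if the first field contains no _, index() returns 0 and the substr() version prints an empty prefix, whereas this version keeps the whole first field.

```sh
# Hypothetical helper: same output as the substr()/index() one-liner for the
# sample data, but it trims the first field with sub() instead.
to_bracket_awk() {
  awk -F'::' '{
    id = $1              # first field, e.g. ID1_TRINITY_DN120587_c0_g1
    sub(/_.*/, "", id)   # delete from the first _ to the end, leaving ID1
    print $2 "[" id "]"
  }' "$@"
}

printf '%s\n' 'ID1_TRINITY_DN120587_c0_g1::TRINITY_DN120587_c0_g1_i1::g.8298::m.8298' |
  to_bracket_awk
# prints TRINITY_DN120587_c0_g1_i1[ID1]
```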

Answer 3

Assuming your pattern stays the same, this single sed solution should work:

sed -n 's/^\([^_]*\)_[^:]*::\([^:]*\)::.*/\2[\1]/p' filename

Output for the sample input:

TRINITY_DN120587_c0_g1_i1[ID1]

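Explanation: starting from the beginning of the line, [^_]* matches everything up to the first underscore and stores it in the first group; then [^:]* matches the text between the first and second double colons as the second group. The line is replaced with the desired output format, and when the substitution succeeds, the p flag prints the modified line. Because -n suppresses sed's automatic printing, lines that do not match are not printed at all.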
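Because of -n, lines that do not fit the pattern are dropped rather than passed through. This is a minimal sketch showing that on a two-line sample. The helper name to_bracket_filtered and the second sample line are made up for illustration.

```sh
# Hypothetical helper wrapping the single-sed answer: -n turns off automatic
# printing, so only lines where the substitution succeeded are printed by p.
to_bracket_filtered() {
  sed -n 's/^\([^_]*\)_[^:]*::\([^:]*\)::.*/\2[\1]/p' "$@"
}

# The second sample line has no _ and no :: fields, so it is dropped.
printf '%s\n' \
  'ID1_TRINITY_DN120587_c0_g1::TRINITY_DN120587_c0_g1_i1::g.8298::m.8298' \
  'header line without separators' |
  to_bracket_filtered
# prints only: TRINITY_DN120587_c0_g1_i1[ID1]
```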
