Extracting information from a difficult .log file

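I have a large .log file containing a lot of information, and I want to extract only a few small parts of it, saving each part to a different output file.

An excerpt from the .log file: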

.....
New Water Solv 104: solv=  1.635

Saved WATERFLAP_REFINED2_SCORED_SOLV_H2O.pdb


Water:   1 AtId: 3021 ResId: 316    OH2n: -8.922    OH2s: -6.900    CRY: -0.640 ENTR: -0.321    SOLV:  0.000    DG:  6.636  CLASS: 1
Water:   2 AtId: 3013 ResId: 308    OH2n: -8.331    OH2s: -7.364    CRY: -0.885 ENTR: -0.321    SOLV:  0.000    DG:  6.453  CLASS: 1
Water:   3 AtId: 3009 ResId: 304    OH2n: -7.424    OH2s: -7.321    CRY:  5.000 ENTR: -0.036    SOLV:  0.577    DG:  5.450  CLASS: 1
Water:   4 AtId: 3064 ResId: 359    OH2n: -9.779    OH2s: -8.778    CRY: -1.187 ENTR: -0.804    SOLV:  0.000    DG:  3.279  CLASS: 1
Water: 103 AtId: 2996 ResId: 291    OH2n: -14.725   OH2s: -10.556   CRY: -1.060 ENTR: -0.607    SOLV:  0.962    DG: -0.849  CLASS: 5
Water: 104 AtId: 3004 ResId: 299    OH2n: -14.237   OH2s: -11.215   CRY: -1.197 ENTR: -0.500    SOLV:  1.635    DG: -1.185  CLASS: 5

Water Network Score Contributions:

Total       OH2n: -731.606  OH2s: -368.197  CRY: -30.908    ENTR: -94.714   DG:  28.882
Average     OH2n: -12.835   OH2s: -6.460    CRY: -0.542 ENTR: -1.662    DG:  0.507
Summary:     28.882 ( -10.345  39.228 )


Saved WATERFLAP_REFINED2_SCORED_OH2s_H2O.pdb

Saved WATERFLAP_REFINED2_SCORED_OH2n_H2O.pdb

Saved WATERFLAP_REFINED2_SCORED_DRY_H2O.pdb

Saved WATERFLAP_REFINED2_SCORED_CRY_H2O.pdb

Saved WATERFLAP_REFINED2_SCORED_ENTROPY_H2O.pdb

Saved WATERFLAP_REFINED2_SCORED_DG_WAT_H2O.pdb

Saved WATERFLAP_REFINED2_SCORED_CLASS_H2O.pdb

Saved WATERFLAP_REFINED2_SCORED_CLASS_COMPLEX.pdb

Saved WATERFLAP_REFINED2_SCORED.PDB

Saved WATERFLAP_REFINED2_SCORED_DG_WAT_H2O_ele.pdb

Saved WATERFLAP_REFINED2_SCORED_CLASS_H2O_ele.pdb

---------------------------
WaterFLAP summary of delta DG between apo and complex

Water: 1 AtId: 2994 ResId: 289  DG_APO: -6.921 DG_COMPLEX: -7.026 DDG: -0.105 CLASS: 3
Water: 2 AtId: 2995 ResId: 290  DG_APO: -1.789 DG_COMPLEX: -2.014 DDG: -0.225 CLASS: 3
Water: 3 AtId: 2996 ResId: 291  DG_APO: -0.841 DG_COMPLEX: -0.849 DDG: -0.008 CLASS: 3
Water: 121 AtId: 3138 ResId: 433    DG_APO:  0.000 DG_COMPLEX:  0.000 DDG:  0.000 CLASS: 3
Water: 122 AtId: 3143 ResId: 438    DG_APO:  0.000 DG_COMPLEX:  0.000 DDG:  0.000 CLASS: 3
Water_USED: 1 AtId: 2994 ResId: 289 OH2n: -15.983   OH2s: -15.953   CRY: -1.934 ENTR: -0.250    DG: -7.026  CLASS: 4
Water_USED: 2 AtId: 2995 ResId: 290 OH2n: -12.808   OH2s: -11.344   CRY: -0.291 ENTR: -0.411    DG: -2.014  CLASS: 4
Water_BOUNDARY: 3 AtId: 2996 ResId: 291 OH2n: -14.725   OH2s: -10.556   CRY: -1.060 ENTR: -0.607    DG: -0.849  CLASS: 5
Water_USED: 4 AtId: 2997 ResId: 292 OH2n: -14.971   OH2s: -14.678   CRY: -2.085 ENTR: -0.375    DG: -4.170  CLASS: 4
Water_BOUNDARY: 122 AtId: 3143 ResId: 438   OH2n:  5.000    OH2s: -0.110    CRY: -0.064 ENTR: -4.875    DG:  0.000  CLASS: 5

Saved WATERFLAP_Delta_DG_CLASS_H2O_ele.pdb

Saved WATERFLAP_Delta_DG_DG_WAT_H2O_ele.pdb

Saved WATERFLAP_Delta_DG_CLASS_H2O.pdb

Saved WATERFLAP_Delta_DG_DG_WAT_H2O.pdb


---------------------------

Apo:        DG:  35.441 DH: -3.791  -TDS:  39.232
Complex:    DG:  28.882 DH: -10.345 -TDS:  39.228

-------------
Net:        DG: -6.559  DH: -6.555  -TDS: -0.004


---------------------------
DG Displaced:   13.760
DDG Perturbed:  6.605
DG Disp-Pert:   7.155

---------------------------

WARNING: Setting ATOM parms from HETATM table
Atm:   CA  Q: 0.08
WARNING: Setting ATOM parms from HETATM table
Atm:   CA  Q: 0.08
....

The structure of the file is always the same, but the number of lines, the IDs, and the numbers can change.

From this I would like to obtain 4 different outputs. (Perhaps constant strings that are always present could serve as markers, such as "Saved WATERFLAP_REFINED2_SCORED_SOLV_H2O.pdb", "Water Network Score Contributions", or "WaterFLAP summary of delta DG between apo and complex"?)

Output 1 (between "Saved WATERFLAP_REFINED2_SCORED_SOLV_H2O.pdb" and "Water Network Score Contributions"):

Water:   1 AtId: 3021 ResId: 316    OH2n: -8.922    OH2s: -6.900    CRY: -0.640 ENTR: -0.321    SOLV:  0.000    DG:  6.636  CLASS: 1
Water:   2 AtId: 3013 ResId: 308    OH2n: -8.331    OH2s: -7.364    CRY: -0.885 ENTR: -0.321    SOLV:  0.000    DG:  6.453  CLASS: 1
Water:   3 AtId: 3009 ResId: 304    OH2n: -7.424    OH2s: -7.321    CRY:  5.000 ENTR: -0.036    SOLV:  0.577    DG:  5.450  CLASS: 1
Water:   4 AtId: 3064 ResId: 359    OH2n: -9.779    OH2s: -8.778    CRY: -1.187 ENTR: -0.804    SOLV:  0.000    DG:  3.279  CLASS: 1
Water: 103 AtId: 2996 ResId: 291    OH2n: -14.725   OH2s: -10.556   CRY: -1.060 ENTR: -0.607    SOLV:  0.962    DG: -0.849  CLASS: 5
Water: 104 AtId: 3004 ResId: 299    OH2n: -14.237   OH2s: -11.215   CRY: -1.197 ENTR: -0.500    SOLV:  1.635    DG: -1.185  CLASS: 5

Output 2 (between "WaterFLAP summary of delta DG between apo and complex" and the first line starting with Water_USED or Water_BOUNDARY):

Water: 1 AtId: 2994 ResId: 289  DG_APO: -6.921 DG_COMPLEX: -7.026 DDG: -0.105 CLASS: 3
Water: 2 AtId: 2995 ResId: 290  DG_APO: -1.789 DG_COMPLEX: -2.014 DDG: -0.225 CLASS: 3
Water: 3 AtId: 2996 ResId: 291  DG_APO: -0.841 DG_COMPLEX: -0.849 DDG: -0.008 CLASS: 3
Water: 121 AtId: 3138 ResId: 433    DG_APO:  0.000 DG_COMPLEX:  0.000 DDG:  0.000 CLASS: 3
Water: 122 AtId: 3143 ResId: 438    DG_APO:  0.000 DG_COMPLEX:  0.000 DDG:  0.000 CLASS: 3

Output 3 (starting at the first line containing Water_USED or Water_BOUNDARY and ending before "Saved WATERFLAP_Delta_DG_CLASS_H2O_ele.pdb"):

Water_USED: 1 AtId: 2994 ResId: 289 OH2n: -15.983   OH2s: -15.953   CRY: -1.934 ENTR: -0.250    DG: -7.026  CLASS: 4
Water_USED: 2 AtId: 2995 ResId: 290 OH2n: -12.808   OH2s: -11.344   CRY: -0.291 ENTR: -0.411    DG: -2.014  CLASS: 4
Water_BOUNDARY: 3 AtId: 2996 ResId: 291 OH2n: -14.725   OH2s: -10.556   CRY: -1.060 ENTR: -0.607    DG: -0.849  CLASS: 5
Water_USED: 4 AtId: 2997 ResId: 292 OH2n: -14.971   OH2s: -14.678   CRY: -2.085 ENTR: -0.375    DG: -4.170  CLASS: 4
Water_BOUNDARY: 122 AtId: 3143 ResId: 438   OH2n:  5.000    OH2s: -0.110    CRY: -0.064 ENTR: -4.875    DG:  0.000  CLASS: 5
   

Output 4:

Apo:        DG:  35.441 DH: -3.791  -TDS:  39.232
Complex:    DG:  28.882 DH: -10.345 -TDS:  39.228
Net:        DG: -6.559  DH: -6.555  -TDS: -0.004
DG Displaced:   13.760
DDG Perturbed:  6.605
DG Disp-Pert:   7.155

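Every output should be a .txt file, and the separator between columns (a space in the input file) should be either a plain "," or a space as in the input.

I have no idea how to do this. Can anyone help me with this very hard challenge?

Answer 1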

$ cat tst.awk
/^Saved WATERFLAP_REFINED2_SCORED_SOLV_H2O.pdb/ { out = "output" 1; next }
/^Water Network Score Contributions/            { out = ""; next }
/^WaterFLAP summary of delta DG/                { out = "output" 2; next }
/^Water_(USED|BOUNDARY)/                        { out = ""; print > ("output" 3) }
/^(Apo|Complex|Net|DD?G)/                       { print > ("output" 4) }
out && NF { print > out }

$ awk -f tst.awk file

$ head out*
==> output1 <==
Water:   1 AtId: 3021 ResId: 316    OH2n: -8.922    OH2s: -6.900    CRY: -0.640 ENTR: -0.321    SOLV:  0.000    DG:  6.636  CLASS: 1
Water:   2 AtId: 3013 ResId: 308    OH2n: -8.331    OH2s: -7.364    CRY: -0.885 ENTR: -0.321    SOLV:  0.000    DG:  6.453  CLASS: 1
Water:   3 AtId: 3009 ResId: 304    OH2n: -7.424    OH2s: -7.321    CRY:  5.000 ENTR: -0.036    SOLV:  0.577    DG:  5.450  CLASS: 1
Water:   4 AtId: 3064 ResId: 359    OH2n: -9.779    OH2s: -8.778    CRY: -1.187 ENTR: -0.804    SOLV:  0.000    DG:  3.279  CLASS: 1
Water: 103 AtId: 2996 ResId: 291    OH2n: -14.725   OH2s: -10.556   CRY: -1.060 ENTR: -0.607    SOLV:  0.962    DG: -0.849  CLASS: 5
Water: 104 AtId: 3004 ResId: 299    OH2n: -14.237   OH2s: -11.215   CRY: -1.197 ENTR: -0.500    SOLV:  1.635    DG: -1.185  CLASS: 5

==> output2 <==
Water: 1 AtId: 2994 ResId: 289  DG_APO: -6.921 DG_COMPLEX: -7.026 DDG: -0.105 CLASS: 3
Water: 2 AtId: 2995 ResId: 290  DG_APO: -1.789 DG_COMPLEX: -2.014 DDG: -0.225 CLASS: 3
Water: 3 AtId: 2996 ResId: 291  DG_APO: -0.841 DG_COMPLEX: -0.849 DDG: -0.008 CLASS: 3
Water: 121 AtId: 3138 ResId: 433    DG_APO:  0.000 DG_COMPLEX:  0.000 DDG:  0.000 CLASS: 3
Water: 122 AtId: 3143 ResId: 438    DG_APO:  0.000 DG_COMPLEX:  0.000 DDG:  0.000 CLASS: 3

==> output3 <==
Water_USED: 1 AtId: 2994 ResId: 289 OH2n: -15.983   OH2s: -15.953   CRY: -1.934 ENTR: -0.250    DG: -7.026  CLASS: 4
Water_USED: 2 AtId: 2995 ResId: 290 OH2n: -12.808   OH2s: -11.344   CRY: -0.291 ENTR: -0.411    DG: -2.014  CLASS: 4
Water_BOUNDARY: 3 AtId: 2996 ResId: 291 OH2n: -14.725   OH2s: -10.556   CRY: -1.060 ENTR: -0.607    DG: -0.849  CLASS: 5
Water_USED: 4 AtId: 2997 ResId: 292 OH2n: -14.971   OH2s: -14.678   CRY: -2.085 ENTR: -0.375    DG: -4.170  CLASS: 4
Water_BOUNDARY: 122 AtId: 3143 ResId: 438   OH2n:  5.000    OH2s: -0.110    CRY: -0.064 ENTR: -4.875    DG:  0.000  CLASS: 5

==> output4 <==
Apo:        DG:  35.441 DH: -3.791  -TDS:  39.232
Complex:    DG:  28.882 DH: -10.345 -TDS:  39.228
Net:        DG: -6.559  DH: -6.555  -TDS: -0.004
DG Displaced:   13.760
DDG Perturbed:  6.605
DG Disp-Pert:   7.155
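
The script above writes files named output1 through output4 and leaves the original spacing untouched. The question also asks for .txt files with comma-separated columns; a minimal sketch of that variant follows, assuming a POSIX awk. The file name sample.log, the abbreviated log lines in it, the .txt output names, and the helper function emit are made up for illustration and are not part of the original answer.

```shell
# Build a tiny stand-in log (hypothetical sample.log, abbreviated from the question).
cat > sample.log <<'EOF'
Saved WATERFLAP_REFINED2_SCORED_SOLV_H2O.pdb

Water:   1 AtId: 3021 ResId: 316    OH2n: -8.922    DG:  6.636  CLASS: 1

Water Network Score Contributions:

Total       OH2n: -731.606  DG:  28.882
WaterFLAP summary of delta DG between apo and complex

Water: 1 AtId: 2994 ResId: 289  DG_APO: -6.921 DG_COMPLEX: -7.026 DDG: -0.105 CLASS: 3
Water_USED: 1 AtId: 2994 ResId: 289 OH2n: -15.983   DG: -7.026  CLASS: 4

Saved WATERFLAP_Delta_DG_CLASS_H2O_ele.pdb

Apo:        DG:  35.441 DH: -3.791  -TDS:  39.232
DG Displaced:   13.760
EOF

awk '
# Rebuild the current line with OFS between fields and write it to file f.
function emit(f) { $1 = $1; print > f }
BEGIN { OFS = "," }
/^Saved WATERFLAP_REFINED2_SCORED_SOLV_H2O\.pdb/ { out = "output1.txt"; next }
/^Water Network Score Contributions/             { out = ""; next }
/^WaterFLAP summary of delta DG/                 { out = "output2.txt"; next }
/^Water_(USED|BOUNDARY)/                         { out = ""; emit("output3.txt"); next }
/^(Apo|Complex|Net|DD?G)/                        { emit("output4.txt"); next }
out && NF                                        { emit(out) }
' sample.log
```

Assigning $1 to itself makes awk rebuild the line with OFS between fields, so every run of spaces becomes a single comma. One side effect is that a two-word label such as "DG Displaced:" ends up split across two columns; if that matters, output 4 would need its own formatting rule.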

Answer 2

grep "string here" file.log

For example:

grep "Saved WATERFLAP_REFINED2_SCORED_SOLV_H2O.pdb" big.log

If you also want lines above or below the keyword:

-A NUM prints NUM lines after each match, -B NUM prints NUM lines before, and -C NUM prints NUM lines both before and after.

To send the result to a file, use ">" to redirect stdout to a .txt file.

For example:

grep -A 5 "Saved WATERFLAP_REFINED2_SCORED_SOLV_H2O.pdb" big.log > text.txt
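
A fixed -A count only fits when the number of lines after the marker is known, while the question notes that the line count can change. One possible workaround, sketched here under the assumption that the start and end marker strings are always present, is to print the range between the two markers with sed and then keep only the data rows. The file names big.log, fixed.txt and text.txt and the shortened log lines are illustrative only.

```shell
# Small stand-in for the real log (hypothetical content, shortened from the question).
cat > big.log <<'EOF'
New Water Solv 104: solv=  1.635
Saved WATERFLAP_REFINED2_SCORED_SOLV_H2O.pdb

Water:   1 AtId: 3021 ResId: 316    DG:  6.636  CLASS: 1
Water:   2 AtId: 3013 ResId: 308    DG:  6.453  CLASS: 1

Water Network Score Contributions:
Total       OH2n: -731.606
EOF

# Fixed-size context: the marker line plus the 3 lines that follow it.
grep -A 3 "Saved WATERFLAP_REFINED2_SCORED_SOLV_H2O.pdb" big.log > fixed.txt

# Variable-size block: print everything from the start marker to the end marker,
# then keep only the rows that begin with "Water:".
sed -n '/^Saved WATERFLAP_REFINED2_SCORED_SOLV_H2O\.pdb/,/^Water Network Score Contributions/p' big.log |
  grep '^Water:' > text.txt
```

With this sample, fixed.txt also contains the marker line and a blank line, whereas text.txt holds just the two Water rows regardless of how many there are.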
