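I have a large .log file containing a lot of information, and I would like to extract only a few small parts of it and save each part in a different output file.
An excerpt from the .log file: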
.....
New Water Solv 104: solv= 1.635
Saved WATERFLAP_REFINED2_SCORED_SOLV_H2O.pdb
Water: 1 AtId: 3021 ResId: 316 OH2n: -8.922 OH2s: -6.900 CRY: -0.640 ENTR: -0.321 SOLV: 0.000 DG: 6.636 CLASS: 1
Water: 2 AtId: 3013 ResId: 308 OH2n: -8.331 OH2s: -7.364 CRY: -0.885 ENTR: -0.321 SOLV: 0.000 DG: 6.453 CLASS: 1
Water: 3 AtId: 3009 ResId: 304 OH2n: -7.424 OH2s: -7.321 CRY: 5.000 ENTR: -0.036 SOLV: 0.577 DG: 5.450 CLASS: 1
Water: 4 AtId: 3064 ResId: 359 OH2n: -9.779 OH2s: -8.778 CRY: -1.187 ENTR: -0.804 SOLV: 0.000 DG: 3.279 CLASS: 1
Water: 103 AtId: 2996 ResId: 291 OH2n: -14.725 OH2s: -10.556 CRY: -1.060 ENTR: -0.607 SOLV: 0.962 DG: -0.849 CLASS: 5
Water: 104 AtId: 3004 ResId: 299 OH2n: -14.237 OH2s: -11.215 CRY: -1.197 ENTR: -0.500 SOLV: 1.635 DG: -1.185 CLASS: 5
Water Network Score Contributions:
Total OH2n: -731.606 OH2s: -368.197 CRY: -30.908 ENTR: -94.714 DG: 28.882
Average OH2n: -12.835 OH2s: -6.460 CRY: -0.542 ENTR: -1.662 DG: 0.507
Summary: 28.882 ( -10.345 39.228 )
Saved WATERFLAP_REFINED2_SCORED_OH2s_H2O.pdb
Saved WATERFLAP_REFINED2_SCORED_OH2n_H2O.pdb
Saved WATERFLAP_REFINED2_SCORED_DRY_H2O.pdb
Saved WATERFLAP_REFINED2_SCORED_CRY_H2O.pdb
Saved WATERFLAP_REFINED2_SCORED_ENTROPY_H2O.pdb
Saved WATERFLAP_REFINED2_SCORED_DG_WAT_H2O.pdb
Saved WATERFLAP_REFINED2_SCORED_CLASS_H2O.pdb
Saved WATERFLAP_REFINED2_SCORED_CLASS_COMPLEX.pdb
Saved WATERFLAP_REFINED2_SCORED.PDB
Saved WATERFLAP_REFINED2_SCORED_DG_WAT_H2O_ele.pdb
Saved WATERFLAP_REFINED2_SCORED_CLASS_H2O_ele.pdb
---------------------------
WaterFLAP summary of delta DG between apo and complex
Water: 1 AtId: 2994 ResId: 289 DG_APO: -6.921 DG_COMPLEX: -7.026 DDG: -0.105 CLASS: 3
Water: 2 AtId: 2995 ResId: 290 DG_APO: -1.789 DG_COMPLEX: -2.014 DDG: -0.225 CLASS: 3
Water: 3 AtId: 2996 ResId: 291 DG_APO: -0.841 DG_COMPLEX: -0.849 DDG: -0.008 CLASS: 3
Water: 121 AtId: 3138 ResId: 433 DG_APO: 0.000 DG_COMPLEX: 0.000 DDG: 0.000 CLASS: 3
Water: 122 AtId: 3143 ResId: 438 DG_APO: 0.000 DG_COMPLEX: 0.000 DDG: 0.000 CLASS: 3
Water_USED: 1 AtId: 2994 ResId: 289 OH2n: -15.983 OH2s: -15.953 CRY: -1.934 ENTR: -0.250 DG: -7.026 CLASS: 4
Water_USED: 2 AtId: 2995 ResId: 290 OH2n: -12.808 OH2s: -11.344 CRY: -0.291 ENTR: -0.411 DG: -2.014 CLASS: 4
Water_BOUNDARY: 3 AtId: 2996 ResId: 291 OH2n: -14.725 OH2s: -10.556 CRY: -1.060 ENTR: -0.607 DG: -0.849 CLASS: 5
Water_USED: 4 AtId: 2997 ResId: 292 OH2n: -14.971 OH2s: -14.678 CRY: -2.085 ENTR: -0.375 DG: -4.170 CLASS: 4
Water_BOUNDARY: 122 AtId: 3143 ResId: 438 OH2n: 5.000 OH2s: -0.110 CRY: -0.064 ENTR: -4.875 DG: 0.000 CLASS: 5
Saved WATERFLAP_Delta_DG_CLASS_H2O_ele.pdb
Saved WATERFLAP_Delta_DG_DG_WAT_H2O_ele.pdb
Saved WATERFLAP_Delta_DG_CLASS_H2O.pdb
Saved WATERFLAP_Delta_DG_DG_WAT_H2O.pdb
---------------------------
Apo: DG: 35.441 DH: -3.791 -TDS: 39.232
Complex: DG: 28.882 DH: -10.345 -TDS: 39.228
-------------
Net: DG: -6.559 DH: -6.555 -TDS: -0.004
---------------------------
DG Displaced: 13.760
DDG Perturbed: 6.605
DG Disp-Pert: 7.155
---------------------------
WARNING: Setting ATOM parms from HETATM table
Atm: CA Q: 0.08
WARNING: Setting ATOM parms from HETATM table
Atm: CA Q: 0.08
....
The structure of the file is always the same, but the number of lines, the IDs, and the numbers can change.
From this I would like to get 4 different outputs, perhaps by using constant strings that are always present, such as "Saved WATERFLAP_REFINED2_SCORED_SOLV_H2O.pdb", "Water Network Score Contributions:", and "WaterFLAP summary of delta DG between apo and complex".
Output 1 (between "Saved WATERFLAP_REFINED2_SCORED_SOLV_H2O.pdb" and "Water Network Score Contributions:"):
Water: 1 AtId: 3021 ResId: 316 OH2n: -8.922 OH2s: -6.900 CRY: -0.640 ENTR: -0.321 SOLV: 0.000 DG: 6.636 CLASS: 1
Water: 2 AtId: 3013 ResId: 308 OH2n: -8.331 OH2s: -7.364 CRY: -0.885 ENTR: -0.321 SOLV: 0.000 DG: 6.453 CLASS: 1
Water: 3 AtId: 3009 ResId: 304 OH2n: -7.424 OH2s: -7.321 CRY: 5.000 ENTR: -0.036 SOLV: 0.577 DG: 5.450 CLASS: 1
Water: 4 AtId: 3064 ResId: 359 OH2n: -9.779 OH2s: -8.778 CRY: -1.187 ENTR: -0.804 SOLV: 0.000 DG: 3.279 CLASS: 1
Water: 103 AtId: 2996 ResId: 291 OH2n: -14.725 OH2s: -10.556 CRY: -1.060 ENTR: -0.607 SOLV: 0.962 DG: -0.849 CLASS: 5
Water: 104 AtId: 3004 ResId: 299 OH2n: -14.237 OH2s: -11.215 CRY: -1.197 ENTR: -0.500 SOLV: 1.635 DG: -1.185 CLASS: 5
Output 2 (between "WaterFLAP summary of delta DG between apo and complex" and the first line starting with Water_USED or Water_BOUNDARY):
Water: 1 AtId: 2994 ResId: 289 DG_APO: -6.921 DG_COMPLEX: -7.026 DDG: -0.105 CLASS: 3
Water: 2 AtId: 2995 ResId: 290 DG_APO: -1.789 DG_COMPLEX: -2.014 DDG: -0.225 CLASS: 3
Water: 3 AtId: 2996 ResId: 291 DG_APO: -0.841 DG_COMPLEX: -0.849 DDG: -0.008 CLASS: 3
Water: 121 AtId: 3138 ResId: 433 DG_APO: 0.000 DG_COMPLEX: 0.000 DDG: 0.000 CLASS: 3
Water: 122 AtId: 3143 ResId: 438 DG_APO: 0.000 DG_COMPLEX: 0.000 DDG: 0.000 CLASS: 3
Output 3 (starting at the first line containing Water_USED or Water_BOUNDARY and ending before "Saved WATERFLAP_Delta_DG_CLASS_H2O_ele.pdb"):
Water_USED: 1 AtId: 2994 ResId: 289 OH2n: -15.983 OH2s: -15.953 CRY: -1.934 ENTR: -0.250 DG: -7.026 CLASS: 4
Water_USED: 2 AtId: 2995 ResId: 290 OH2n: -12.808 OH2s: -11.344 CRY: -0.291 ENTR: -0.411 DG: -2.014 CLASS: 4
Water_BOUNDARY: 3 AtId: 2996 ResId: 291 OH2n: -14.725 OH2s: -10.556 CRY: -1.060 ENTR: -0.607 DG: -0.849 CLASS: 5
Water_USED: 4 AtId: 2997 ResId: 292 OH2n: -14.971 OH2s: -14.678 CRY: -2.085 ENTR: -0.375 DG: -4.170 CLASS: 4
Water_BOUNDARY: 122 AtId: 3143 ResId: 438 OH2n: 5.000 OH2s: -0.110 CRY: -0.064 ENTR: -4.875 DG: 0.000 CLASS: 5
Output 4:
Apo: DG: 35.441 DH: -3.791 -TDS: 39.232
Complex: DG: 28.882 DH: -10.345 -TDS: 39.228
Net: DG: -6.559 DH: -6.555 -TDS: -0.004
DG Displaced: 13.760
DDG Perturbed: 6.605
DG Disp-Pert: 7.155
Every output should be a .txt file, and the separator between columns (a space in the input file) should remain a single space, as in the input.
I have no idea how to do this. Can anyone help with this very hard challenge?
Answer 1
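$ cat tst.awk
# Start of block 1: the per-water scores follow this line.
/^Saved WATERFLAP_REFINED2_SCORED_SOLV_H2O\.pdb/ { out = "output" 1; next }
# End of block 1: stop copying.
/^Water Network Score Contributions/ { out = ""; next }
# Start of block 2: the apo vs. complex delta DG table.
/^WaterFLAP summary of delta DG/ { out = "output" 2; next }
# Block 3: Water_USED / Water_BOUNDARY lines; the first one also ends block 2.
/^Water_(USED|BOUNDARY)/ { out = ""; print > ("output" 3) }
# Block 4: the energy summary lines.
/^(Apo|Complex|Net|DD?G)/ { print > ("output" 4) }
# Copy non-empty lines to whichever block is currently open.
out && NF { print > out }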
$ awk -f tst.awk file
$ head out*
==> output1 <==
Water: 1 AtId: 3021 ResId: 316 OH2n: -8.922 OH2s: -6.900 CRY: -0.640 ENTR: -0.321 SOLV: 0.000 DG: 6.636 CLASS: 1
Water: 2 AtId: 3013 ResId: 308 OH2n: -8.331 OH2s: -7.364 CRY: -0.885 ENTR: -0.321 SOLV: 0.000 DG: 6.453 CLASS: 1
Water: 3 AtId: 3009 ResId: 304 OH2n: -7.424 OH2s: -7.321 CRY: 5.000 ENTR: -0.036 SOLV: 0.577 DG: 5.450 CLASS: 1
Water: 4 AtId: 3064 ResId: 359 OH2n: -9.779 OH2s: -8.778 CRY: -1.187 ENTR: -0.804 SOLV: 0.000 DG: 3.279 CLASS: 1
Water: 103 AtId: 2996 ResId: 291 OH2n: -14.725 OH2s: -10.556 CRY: -1.060 ENTR: -0.607 SOLV: 0.962 DG: -0.849 CLASS: 5
Water: 104 AtId: 3004 ResId: 299 OH2n: -14.237 OH2s: -11.215 CRY: -1.197 ENTR: -0.500 SOLV: 1.635 DG: -1.185 CLASS: 5
==> output2 <==
Water: 1 AtId: 2994 ResId: 289 DG_APO: -6.921 DG_COMPLEX: -7.026 DDG: -0.105 CLASS: 3
Water: 2 AtId: 2995 ResId: 290 DG_APO: -1.789 DG_COMPLEX: -2.014 DDG: -0.225 CLASS: 3
Water: 3 AtId: 2996 ResId: 291 DG_APO: -0.841 DG_COMPLEX: -0.849 DDG: -0.008 CLASS: 3
Water: 121 AtId: 3138 ResId: 433 DG_APO: 0.000 DG_COMPLEX: 0.000 DDG: 0.000 CLASS: 3
Water: 122 AtId: 3143 ResId: 438 DG_APO: 0.000 DG_COMPLEX: 0.000 DDG: 0.000 CLASS: 3
==> output3 <==
Water_USED: 1 AtId: 2994 ResId: 289 OH2n: -15.983 OH2s: -15.953 CRY: -1.934 ENTR: -0.250 DG: -7.026 CLASS: 4
Water_USED: 2 AtId: 2995 ResId: 290 OH2n: -12.808 OH2s: -11.344 CRY: -0.291 ENTR: -0.411 DG: -2.014 CLASS: 4
Water_BOUNDARY: 3 AtId: 2996 ResId: 291 OH2n: -14.725 OH2s: -10.556 CRY: -1.060 ENTR: -0.607 DG: -0.849 CLASS: 5
Water_USED: 4 AtId: 2997 ResId: 292 OH2n: -14.971 OH2s: -14.678 CRY: -2.085 ENTR: -0.375 DG: -4.170 CLASS: 4
Water_BOUNDARY: 122 AtId: 3143 ResId: 438 OH2n: 5.000 OH2s: -0.110 CRY: -0.064 ENTR: -4.875 DG: 0.000 CLASS: 5
==> output4 <==
Apo: DG: 35.441 DH: -3.791 -TDS: 39.232
Complex: DG: 28.882 DH: -10.345 -TDS: 39.228
Net: DG: -6.559 DH: -6.555 -TDS: -0.004
DG Displaced: 13.760
DDG Perturbed: 6.605
DG Disp-Pert: 7.155
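The question asked for .txt files, whereas the script above writes output1 through output4 without an extension. The following is a minimal sketch of the same approach with a .txt suffix. It assumes the marker lines start in column 1, as in the sample; the file name sample.log, its shortened contents, and the outputN.txt names are illustrative only.

```shell
#!/bin/sh
# Build a small stand-in log (hypothetical sample.log) with the same markers.
cat > sample.log <<'EOF'
New Water Solv 104: solv= 1.635
Saved WATERFLAP_REFINED2_SCORED_SOLV_H2O.pdb
Water: 1 AtId: 3021 ResId: 316 OH2n: -8.922 OH2s: -6.900 CRY: -0.640 ENTR: -0.321 SOLV: 0.000 DG: 6.636 CLASS: 1
Water Network Score Contributions:
Total OH2n: -731.606 OH2s: -368.197 CRY: -30.908 ENTR: -94.714 DG: 28.882
WaterFLAP summary of delta DG between apo and complex
Water: 1 AtId: 2994 ResId: 289 DG_APO: -6.921 DG_COMPLEX: -7.026 DDG: -0.105 CLASS: 3
Water_USED: 1 AtId: 2994 ResId: 289 OH2n: -15.983 OH2s: -15.953 CRY: -1.934 ENTR: -0.250 DG: -7.026 CLASS: 4
Saved WATERFLAP_Delta_DG_CLASS_H2O_ele.pdb
---------------------------
Apo: DG: 35.441 DH: -3.791 -TDS: 39.232
DG Displaced: 13.760
EOF

# Same rules as tst.awk, but each block goes to a file ending in .txt.
# Lines are printed unchanged, so the single-space separators are preserved.
awk '
/^Saved WATERFLAP_REFINED2_SCORED_SOLV_H2O\.pdb/ { out = "output1.txt"; next }
/^Water Network Score Contributions/             { out = ""; next }
/^WaterFLAP summary of delta DG/                 { out = "output2.txt"; next }
/^Water_(USED|BOUNDARY)/                         { out = ""; print > "output3.txt" }
/^(Apo|Complex|Net|DD?G)/                        { print > "output4.txt" }
out && NF                                        { print > out }
' sample.log
```

Because awk prints each matching line as-is, no separator conversion is needed; only the output file names differ from the original script.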
Answer 2
grep "string here" file.log
For example:
grep "Saved WATERFLAP_REFINED2_SCORED_SOLV_H2O.pdb" big.log
To include lines above or below the match, use -A NUM (lines after), -B NUM (lines before), or -C NUM (lines before and after).
To write the result to a file, redirect stdout to a .txt file with ">".
For example:
grep -A 5 "Saved WATERFLAP_REFINED2_SCORED_SOLV_H2O.pdb" big.log > text.txt
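Since the question says the number of lines can change, a fixed -A count may capture too few or too many lines. One possible alternative, sketched here on the assumption that both marker lines are always present, is a sed address range that prints everything from the start marker through the end marker and then drops the two marker lines. The shortened big.log contents below are illustrative only.

```shell
#!/bin/sh
# Small stand-in for big.log with a variable-length block between two markers.
cat > big.log <<'EOF'
New Water Solv 104: solv= 1.635
Saved WATERFLAP_REFINED2_SCORED_SOLV_H2O.pdb
Water: 1 AtId: 3021 ResId: 316 OH2n: -8.922 OH2s: -6.900 CRY: -0.640 ENTR: -0.321 SOLV: 0.000 DG: 6.636 CLASS: 1
Water: 2 AtId: 3013 ResId: 308 OH2n: -8.331 OH2s: -7.364 CRY: -0.885 ENTR: -0.321 SOLV: 0.000 DG: 6.453 CLASS: 1
Water Network Score Contributions:
Total OH2n: -731.606 OH2s: -368.197 CRY: -30.908 ENTR: -94.714 DG: 28.882
EOF

# Print the range from the start marker to the end marker (inclusive),
# then delete the first and last line of that range, i.e. the markers.
sed -n '/^Saved WATERFLAP_REFINED2_SCORED_SOLV_H2O\.pdb/,/^Water Network Score Contributions/p' big.log |
  sed '1d;$d' > text.txt
```

This keeps however many Water: lines sit between the two markers, without having to know the count in advance.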