Extracting multiple fields

I am trying to extract fields from a file that contains multiple lines, for example:

alert tcp $HOME_NET any -> $EXTERNAL_NET 8074 (msg:"ET CHAT GaduGadu Chat Client Login Packet"; flowbits:isset,ET.gadu.welcome; flow:established,to_server; dsize:<50; content:"|15 00 00 00|"; depth:4; flowbits:set,ET.gadu.loginsent; reference:url,piotr.trzcionkowski.pl/default.asp?load=/programy/pppgg_protokol.html; reference:url,doc.emergingthreats.net/2008298; classtype:policy-violation; sid:2008298; rev:3; metadata:created_at 2010_07_30, updated_at 2010_07_30;)

alert tcp any [21,25,110,143,443,465,587,636,989:995,5061,5222] -> $HOME_NET any (msg:"ET EXPLOIT FREAK Weak Export Suite From Server (CVE-2015-0204)"; flow:established,from_server; content:"|16 03|"; depth:2; byte_test:1,<,4,0,relative; content:"|02|"; distance:3; within:1; byte_jump:1,37,relative; content:"|00 19|"; within:2; fast_pattern; threshold:type limit,track by_dst,count 1,seconds 1200; reference:url,blog.cryptographyengineering.com/2015/03/attack-of-week-freak-or-factoring-nsa.html; reference:cve,2015-0204; reference:cve,2015-1637; classtype:bad-unknown; sid:2020661; rev:3; metadata:created_at 2015_03_10, updated_at 2015_03_10;)

alert tcp $HOME_NET any -> $EXTERNAL_NET 8074 (msg:"ET CHAT GaduGadu Chat Send Message"; flowbits:isset,ET.gadu.loggedin; flow:established,to_server; content:"|0b 00 00 00|"; depth:4; reference:url,piotr.trzcionkowski.pl/default.asp?load=/programy/pppgg_protokol.html; reference:url,doc.emergingthreats.net/2008302; classtype:policy-violation; sid:2008302; rev:3; metadata:created_at 2010_07_30, updated_at 2010_07_30;)

alert tcp $EXTERNAL_NET 8074 -> $HOME_NET any (msg:"ET CHAT GaduGadu Chat Receive Message"; flowbits:isset,ET.gadu.loggedin; flow:established,from_server; content:"|0a 00 00 00|"; depth:4; reference:url,piotr.trzcionkowski.pl/default.asp?load=/programy/pppgg_protokol.html; reference:url,doc.emergingthreats.net/2008303; classtype:policy-violation; sid:2008303; rev:3; metadata:created_at 2010_07_30, updated_at 2010_07_30;)

alert tcp $HOME_NET any -> $EXTERNAL_NET 8074 (msg:"ET CHAT GaduGadu Chat Keepalive PING"; flowbits:isset,ET.gadu.loggedin; flow:established,to_server; content:"|08 00 00 00|"; depth:4; reference:url,piotr.trzcionkowski.pl/default.asp?load=/programy/pppgg_protokol.html; reference:url,doc.emergingthreats.net/2008304; classtype:policy-violation; sid:2008304; rev:3; metadata:created_at 2010_07_30, updated_at 2010_07_30;)

alert http $EXTERNAL_NET any -> $HOME_NET any (msg:"ET EXPLOIT CVE-2016-0189 Common Construct M2"; flow:established,from_server; file_data; content:"triggerBug"; nocase; content:"Dim "; nocase; distance:0; content:".resize"; nocase; pcre:"/^\s*\x28/Rs";  content:"Mid"; pcre:"/^\s*?\(x\s*,\s*1,\s*24000\s*\x29/Rs"; reference:url,theori.io/research/cve-2016-0189; reference:cve,2016-0189; classtype:attempted-user; sid:2022972; rev:2; metadata:affected_product Windows_XP_Vista_7_8_10_Server_32_64_Bit, attack_target Client_Endpoint, deployment Perimeter, signature_severity Major, created_at 2016_07_15, performance_impact Low, updated_at 2016_07_15;)

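I can extract individual fields, but I don't know how to extract, for example, the contents of sid, msg, classtype, and metadata:created_at and updated_at, list them on a single comma-separated line, and then do the same for the other lines in the file.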

Expected output for the first entry:

2008298,ET CHAT GaduGadu Chat Client Login Packet,policy-violation,2010_07_30,2010_07_30

metadata always contains created_at and updated_at, but they may appear at a different position or in a different order within it.

This runs in Bash on GNU/Linux.

Answer 1

A simple script that produces the desired output:

#!/usr/bin/env bash

# Assumptions: the file name is always passed, and points to a valid file,
# hence no error handling has been implemented. (for script simplicity)

# let the first argument to the script be the file name.
filename="$1"

# read one line at a time, extracting the required fields
while read -r line
do
    # skip blank lines
    if [[ ${#line} -gt 0 ]]; then
        sid=$(echo "$line"|grep -o 'sid[^;]*'| awk -F ':' '{print $2}')
        msg=$(echo "$line"|grep -o 'msg:[^;]*'| awk -F '"' '{print $2}')
        classType=$(echo "$line"|grep -o 'classtype:[^;]*'| awk -F ':' '{print $2}')
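        # created_at/updated_at may be followed by "," or ";" depending on
        # where they sit inside metadata, so stop at either character.
        cDate=$(echo "$line"|grep -o 'created_at[^,;]*'|awk '{print $2}')
        uDate=$(echo "$line"|grep -o 'updated_at[^,;]*'|awk '{print $2}')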

        echo "$sid,$msg,$classType,$cDate,$uDate"
    fi
done < "$filename"

Run the script:

./scriptName fileName

Output:

2008298,ET CHAT GaduGadu Chat Client Login Packet,policy-violation,2010_07_30,2010_07_30
2020661,ET EXPLOIT FREAK Weak Export Suite From Server (CVE-2015-0204),bad-unknown,2015_03_10,2015_03_10
2008302,ET CHAT GaduGadu Chat Send Message,policy-violation,2010_07_30,2010_07_30
2008303,ET CHAT GaduGadu Chat Receive Message,policy-violation,2010_07_30,2010_07_30
2008304,ET CHAT GaduGadu Chat Keepalive PING,policy-violation,2010_07_30,2010_07_30
2022972,ET EXPLOIT CVE-2016-0189 Common Construct M2,attempted-user,2016_07_15,2016_07_15
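
The script above starts several grep and awk processes for every input line, which may become slow on large rule files. As a rough sketch of the same extraction using only Bash's built-in regex matching ([[ =~ ]] and BASH_REMATCH): the function name extract_rule_fields and the exact patterns are illustrative assumptions, not part of the answer, and the patterns assume that msg contains no embedded double quote.

```shell
#!/usr/bin/env bash
# Sketch only: extract sid, msg, classtype, created_at and updated_at from
# each rule read on stdin, without spawning external processes.
# extract_rule_fields is a hypothetical helper name.

extract_rule_fields() {
    local line sid msg classtype created updated
    # Patterns are kept in variables so Bash treats them as regexes.
    local re_sid='[;(] *sid:([0-9]+)'
    local re_msg='msg:"([^"]*)"'
    local re_class='classtype:([^;]+)'
    local re_created='created_at ([^,;]+)'
    local re_updated='updated_at ([^,;]+)'

    # "|| [[ -n $line ]]" also processes a last line that lacks a newline.
    while IFS= read -r line || [[ -n $line ]]; do
        [[ -z $line ]] && continue          # skip blank lines
        sid='' msg='' classtype='' created='' updated=''
        [[ $line =~ $re_sid ]]     && sid=${BASH_REMATCH[1]}
        [[ $line =~ $re_msg ]]     && msg=${BASH_REMATCH[1]}
        [[ $line =~ $re_class ]]   && classtype=${BASH_REMATCH[1]}
        [[ $line =~ $re_created ]] && created=${BASH_REMATCH[1]}
        [[ $line =~ $re_updated ]] && updated=${BASH_REMATCH[1]}
        printf '%s,%s,%s,%s,%s\n' "$sid" "$msg" "$classtype" "$created" "$updated"
    done
}

# Usage: extract_rule_fields < fileName
```

Because each field is matched independently, the order of created_at and updated_at inside metadata does not matter.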

Answer 2

Here is a general approach using GNU awk, which provides FPAT:

$ cat tst.awk
BEGIN {
    FPAT="[[:alnum:]_]+:(\"[^\"]+\"|[^;]+)"
    OFS = ","
}
{
    delete f
    for (i=1; i<=NF; i++) {
        tag = val = $i
        sub(/:.*/,"",tag)
        sub(/[^:]+:/,"",val)
        gsub(/"/,"",val)
        f[tag] = val
        if ( tag == "metadata" ) {
            numSubFlds = split(val,md,/, */)
            for (j=1; j<=numSubFlds; j++) {
                subTag = subVal = md[j]
                sub(/ .*/,"",subTag)
                sub(/[^ ]+ /,"",subVal)
                f[tag":"subTag] = subVal
            }
        }
    }

    # uncomment this to see all tags and values
    # for (idx in f) { print idx "=" f[idx] }

    # print
    print f["sid"], f["msg"], f["classtype"], f["metadata:created_at"], f["metadata:updated_at"]
}


$ gawk -f tst.awk file
2008298,ET CHAT GaduGadu Chat Client Login Packet,policy-violation,2010_07_30,2010_07_30
2020661,,bad-unknown,2015_03_10,2015_03_10
2008302,ET CHAT GaduGadu Chat Send Message,policy-violation,2010_07_30,2010_07_30
2008303,ET CHAT GaduGadu Chat Receive Message,policy-violation,2010_07_30,2010_07_30
2008304,ET CHAT GaduGadu Chat Keepalive PING,policy-violation,2010_07_30,2010_07_30
2022972,ET EXPLOIT CVE-2016-0189 Common Construct M2,attempted-user,2016_07_15,2016_07_15

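The second input line is formatted differently from the others, which is why its output differs: its port list contains 989:995, which FPAT matches as a tag:value pair. That match runs up to the first semicolon and swallows the msg option, so msg comes out empty.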
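
One way to keep anything in the rule header (such as the 989:995 port range in the second rule) from being parsed as an option is to drop everything up to the opening parenthesis first. The sketch below does that in portable awk, which has no FPAT, so it splits the options on ";" instead. This is a different technique from the FPAT script above, not a fix of it. The function name extract_options is hypothetical, and the sketch assumes that no option value contains a literal ";" and that the header contains no "(".

```shell
#!/usr/bin/env bash
# Sketch only: print sid,msg,classtype,created_at,updated_at for each rule.
# extract_options is a hypothetical helper name; it reads the files given as
# arguments, or stdin when called without arguments.

extract_options() {
    awk '
    BEGIN { OFS = "," }
    NF == 0 { next }                      # skip blank lines
    {
        split("", f)                      # portable way to clear the array
        opts = $0
        sub(/^[^(]*\(/, "", opts)         # drop the header up to the first "("
        sub(/\)[ \t]*$/, "", opts)        # drop the closing ")"
        n = split(opts, parts, /;[ \t]*/)
        for (i = 1; i <= n; i++) {
            p = index(parts[i], ":")
            if (p == 0) continue          # options without a value, e.g. file_data
            tag = substr(parts[i], 1, p - 1)
            val = substr(parts[i], p + 1)
            gsub(/"/, "", val)
            f[tag] = val
            if (tag == "metadata") {
                m = split(val, md, /,[ \t]*/)
                for (j = 1; j <= m; j++) {
                    q = index(md[j], " ")
                    if (q > 0) f[tag ":" substr(md[j], 1, q - 1)] = substr(md[j], q + 1)
                }
            }
        }
        print f["sid"], f["msg"], f["classtype"], f["metadata:created_at"], f["metadata:updated_at"]
    }' "$@"
}

# Usage: extract_options file
```

As in the FPAT script, options that occur more than once (content, reference) keep only their last value, which does not affect the five fields printed here.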
