How can I view a log file larger than 10GB?

Answer 1

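This script splits a text file into a given number of sections without splitting any text line across sections. It can be used when there is only enough free space to hold one section at a time. It works by copying sections of the source file, starting from the end, and then truncating the source file to free the space again. So if you have a 1.8GB file and 0.5GB of free space, you need to use 4 sections (or more, if you want smaller output files). The last section to be processed (section 1, the start of the file) does not need to be copied, so it is simply renamed. After splitting, the source file no longer exists (there would be no room for it anyway).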
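The arithmetic behind the 1.8GB / 0.5GB example can be sketched as below. This is only an illustration: `min_sections` is a hypothetical helper, not part of the script. Because every boundary is moved forward to the next newline, the real sections are slightly uneven, so it is probably wise to leave some headroom rather than use the exact minimum.

```shell
# min_sections FILE_SIZE FREE_SPACE (same unit for both; hypothetical helper)
# Prints the smallest section count for which one section fits in the
# free space, i.e. the integer ceiling of FILE_SIZE / FREE_SPACE.
min_sections () {
    echo $(( ($1 + $2 - 1) / $2 ))
}

min_sections 1800 500    # sizes in MB: prints 4, so each section is ~450MB
```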

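The main part is an awk script (wrapped in Bash) that only works out the section sizes, including adjusting each section boundary to coincide with a newline. It uses the system() function to call dd, truncate and mv, which do all the heavy lifting.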

$ bash --version
GNU bash, version 4.4.20(1)-release (x86_64-pc-linux-gnu)
$ awk --version
GNU Awk 4.1.4, API: 1.1 (GNU MPFR 4.0.1, GNU MP 6.1.2)
$ dd --version
dd (coreutils) 8.28
$ truncate --version
truncate (GNU coreutils) 8.28
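A single copy-then-truncate step, done by hand with the same two tools, might look like the sketch below. `peel_tail` is a name made up for this illustration, and it assumes GNU dd (for `iflag=skip_bytes`) and GNU truncate. The offset is assumed to already sit on a line boundary.

```shell
# peel_tail SRC OFFSET DEST (hypothetical helper)
# Copy bytes OFFSET..EOF of SRC into DEST, then shrink SRC to OFFSET bytes.
# The && means SRC is only truncated if the copy succeeded.
peel_tail () {
    dd bs=1M if="$1" skip="$2" iflag=skip_bytes of="$3" status=none &&
        truncate -s "$2" "$1"
}

# Demo on a throwaway 12-byte file with three lines.
demo=$(mktemp)
printf 'aaa\nbbb\nccc\n' > "$demo"
peel_tail "$demo" 8 "$demo.2"
cat "$demo"      # the first two lines stay in the source
cat "$demo.2"    # the third line has moved to the new section
rm -f "$demo" "$demo.2"
```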

The script takes between one and four arguments:

./splitBig Source nSect Dest Debug

Source: is the filename of the file to be split into sections.

nSect: is the number of sections required (default 10).

Dest: is a printf() format used to generate the names of the sections.
Default is Source.%.3d, which appends serial numbers (from .001 up) to the source name.
Section numbers correspond to the original order of the source file.

Debug: generates some diagnostics (default is none).
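To preview the names a given Dest format would produce, the shell's own printf can be used, since it accepts the same conversions. `section_name` below is a hypothetical helper for illustration only. One caveat that follows from the default format: a source filename that itself contains a `%` would presumably confuse it, so an explicit Dest is safer in that case.

```shell
# section_name FORMAT N (hypothetical helper)
# Expand a Dest-style printf format for section number N.
section_name () {
    printf "$1\n" "$2"
}

section_name 'leipzig1M.txt.%.3d' 1    # default style: leipzig1M.txt.001
section_name 'Tuesday.%1d.log' 3       # custom style:  Tuesday.3.log
```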

Test results:

$ mkdir TestDir
$ cd TestDir
$ 
$ cp /home/paul/leipzig1M.txt ./
$ ls -s -l
total 126608
126608 -rw-rw-r-- 1 paul paul 129644797 Aug 27 15:54 leipzig1M.txt
$ 
$ time ../splitBig leipzig1M.txt 5

real    0m0.780s
user    0m0.045s
sys 0m0.727s
$ ls -s -l
total 126620
25324 -rw-rw-r-- 1 paul paul 25928991 Aug 27 15:56 leipzig1M.txt.001
25324 -rw-rw-r-- 1 paul paul 25929019 Aug 27 15:56 leipzig1M.txt.002
25324 -rw-rw-r-- 1 paul paul 25928954 Aug 27 15:56 leipzig1M.txt.003
25324 -rw-rw-r-- 1 paul paul 25928977 Aug 27 15:56 leipzig1M.txt.004
25324 -rw-rw-r-- 1 paul paul 25928856 Aug 27 15:56 leipzig1M.txt.005
$ 
$ rm lei*
$ cp /home/paul/leipzig1M.txt ./
$ ls -s -l
total 126608
126608 -rw-rw-r-- 1 paul paul 129644797 Aug 27 15:57 leipzig1M.txt
$ time ../splitBig leipzig1M.txt 3 "Tuesday.%1d.log" 1
.... Section   3 ....
#.. findNl: dd bs=8192 count=1 if="leipzig1M.txt" skip=86429864 iflag=skip_bytes status=none
#.. system: dd bs=128M if="leipzig1M.txt" skip=86430023 iflag=skip_bytes of="Tuesday.3.log" status=none
#.. system: truncate -s 86430023 "leipzig1M.txt"
.... Section   2 ....
#.. findNl: dd bs=8192 count=1 if="leipzig1M.txt" skip=43214932 iflag=skip_bytes status=none
#.. system: dd bs=128M if="leipzig1M.txt" skip=43214997 iflag=skip_bytes of="Tuesday.2.log" status=none
#.. system: truncate -s 43214997 "leipzig1M.txt"
.... Section   1 ....
#.. system: mv "leipzig1M.txt" "Tuesday.1.log"

real    0m0.628s
user    0m0.025s
sys 0m0.591s
$ ls -s -l
total 126612
42204 -rw-rw-r-- 1 paul paul 43214997 Aug 27 15:58 Tuesday.1.log
42204 -rw-rw-r-- 1 paul paul 43215026 Aug 27 15:58 Tuesday.2.log
42204 -rw-rw-r-- 1 paul paul 43214774 Aug 27 15:58 Tuesday.3.log
$
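Since the source file is gone after the split, it may be worth recording a checksum beforehand and comparing it with the checksum of the concatenated sections afterwards. The helpers below are a sketch with invented names (`checksum_of`, `ends_with_newline`) and assume `sha256sum` from coreutils. The first line simply confirms that the five section sizes listed above add up to the original size.

```shell
# The section sizes from the first test run sum to the original size.
echo $(( 25928991 + 25929019 + 25928954 + 25928977 + 25928856 ))   # 129644797

# checksum_of FILE...: SHA-256 of the concatenation of the given files.
checksum_of () { cat -- "$@" | sha256sum | cut -d' ' -f1; }

# ends_with_newline FILE: succeeds if FILE is non-empty and its last byte
# is a newline (command substitution strips it, leaving an empty string).
ends_with_newline () { [ -s "$1" ] && [ -z "$(tail -c 1 -- "$1")" ]; }

# Intended use (commented out because it needs the real files):
#   before=$(checksum_of leipzig1M.txt)
#   ../splitBig leipzig1M.txt 5
#   after=$(checksum_of leipzig1M.txt.0??)   # glob sorts .001 to .005 in order
#   [ "$before" = "$after" ] && echo "sections reassemble to the original"
```

The zero-padded default names matter here: they make the shell glob return the sections in their original order.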

The script:

#! /bin/bash --

# Exported so that awk measures line lengths in bytes, not characters.
export LC_ALL="C"

splitFile () {  #:: (inFile, Pieces, outFmt, Debug)

    local inFile="${1}" Pieces="${2}" outFmt="${3}" Debug="${4}"

    local Awk='
BEGIN {
    SQ = "\042"; szLine = 8192; szFile = "128M";
    fmtLine = "dd bs=%d count=1 if=%s skip=%d iflag=skip_bytes status=none";
    fmtFile = "dd bs=%s if=%s skip=%d iflag=skip_bytes of=%s status=none";
    fmtClip = "truncate -s %d %s";
    fmtName = "mv %s %s";
}

# Return the offset of the line start that follows byte offset Seek.
# Names after "Local" are awk-style local variables, not real arguments.
function findNl (fIn, Seek, Local, cmd, lth, txt) {

    cmd = sprintf (fmtLine, szLine, SQ fIn SQ, Seek);
    if (Db) printf ("#.. findNl: %s\n", cmd);
    cmd | getline txt; close (cmd);
    lth = length (txt);
    # No newline within szLine bytes: warn and split at Seek itself.
    if (lth == szLine) printf ("#### Line at %d will be split\n", Seek);
    return ((lth == szLine) ? Seek : Seek + lth + 1);
}

# Peel sections off the end of fIn, highest number first. Section 1 is
# whatever remains of the source and is simply renamed.
function Split (fIn, Size, Pieces, fmtOut, Local, n, seek, cmd) {

    for (n = Pieces; n > 1; n--) {
        if (Db) printf (".... Section %3d ....\n", n);
        seek = int (Size * ((n - 1) / Pieces));
        seek = findNl( fIn, seek);
        cmd = sprintf (fmtFile, szFile, SQ fIn SQ, seek,
            SQ sprintf (fmtOut, n) SQ);
        if (Db) printf ("#.. system: %s\n", cmd);
        # Never truncate the source unless the copy succeeded.
        if (system (cmd) != 0) { print "#### copy failed, source kept"; exit 1; }
        cmd = sprintf (fmtClip, seek, SQ fIn SQ);
        if (Db) printf ("#.. system: %s\n", cmd);
        system (cmd);
    }
    if (Db) printf (".... Section %3d ....\n", n);
    cmd = sprintf (fmtName, SQ fIn SQ, SQ sprintf (fmtOut, n) SQ);
    if (Db) printf ("#.. system: %s\n", cmd);
    system (cmd);
}

{ Split( inFile, $1, Pieces, outFmt); }
'
    stat -L -c "%s" "${inFile}" | awk -v inFile="${inFile}" \
        -v Pieces="${Pieces}" -v outFmt="${outFmt}" \
        -v Db="${Debug}" -f <( printf '%s' "${Awk}" )
}

#### Script body starts here.

    splitFile "${1}" "${2:-10}" "${3:-${1}.%.3d}" "${4}"
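The newline alignment that findNl performs can also be reproduced with ordinary shell tools, which may help when checking an offset by hand. The sketch below uses a hypothetical `next_line_start` helper. Unlike findNl, it does not stop after 8192 bytes, so a very long line is read in full instead of being reported as split.

```shell
# next_line_start FILE OFFSET (hypothetical helper)
# Print the byte offset of the first line that begins after OFFSET.
next_line_start () {
    # tail -c +K starts output at byte K (1-based), hence OFFSET + 1.
    # head keeps the rest of the current line; wc counts it with its newline.
    rest=$(tail -c +"$(( $2 + 1 ))" -- "$1" | head -n 1 | wc -c)
    echo $(( $2 + rest ))
}

# Demo: offset 5 lies inside the second line, which ends at byte 7,
# so the next line starts at offset 8.
demo=$(mktemp)
printf 'aaa\nbbb\nccc\n' > "$demo"
next_line_start "$demo" 5
rm -f "$demo"
```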

Answer 2

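It is not at all clear what you want to achieve. As far as I can tell, the only question you ask is the one in the title, "How can I view a log file larger than 10GB?", and the question body contains only your own thoughts and ideas about what you consider reasonable.

So, to answer the only question I could find: one option is to use a pager such as less.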

less 10GBlogfile

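The documentation (man less) lists the keys that are available, and once less is running you can press h for help (a list of keys and their associated actions). For a start: G jumps to the last line, the cursor keys (including PageUp and PageDown) move around, / searches for a regular expression, n and N go to the next and previous match, and q quits the pager.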
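A few ways of starting less directly at the interesting place are sketched below, together with a non-interactive alternative for printing a line range. The function names are invented for this example. `-n`, `+G` and `+/pattern` are standard less options, and `-n` in particular can help on very large files because it stops less from counting line numbers.

```shell
# view_end FILE: open FILE in less at its last line.
# -n skips line-number bookkeeping; +G runs the G command on startup.
view_end () { less -n +G -- "$1"; }

# view_at PATTERN FILE: open FILE at the first match of PATTERN.
view_at () { less -n "+/$1" -- "$2"; }

# show_lines FILE FIRST LAST: print a line range without a pager.
# The trailing q makes sed stop reading once LAST has been printed.
show_lines () { sed -n "${2},${3}p;${3}q" -- "$1"; }

# Intended use:
#   view_end 10GBlogfile
#   view_at 'ERROR' 10GBlogfile
#   show_lines 10GBlogfile 1000 1010
```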

Answer 3

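You have probably tried this already, but have you thought about split -n 20? Or split -n 20 --filter 'grep <whatever> or something'? That splits the original file into its component parts and pipes each one separately to whatever command you want.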
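As a rough sketch of the --filter idea, the function below counts matching lines chunk by chunk and adds up the results. `count_matches` and the `PATTERN` variable are names chosen for this example. It uses `-n l/N` rather than plain `-n N`, because plain `-n N` cuts at byte positions and can split a line in two, whereas the `l/N` form keeps lines whole. GNU split is assumed.

```shell
# count_matches PATTERN FILE CHUNKS (hypothetical helper)
# Split FILE into CHUNKS pieces on line boundaries, run grep on each piece,
# and add up the per-piece counts. No chunk files are written to disk.
count_matches () {
    # PATTERN is passed through the environment to the filter command.
    # "|| true" keeps a chunk with no matches from counting as a failure.
    PATTERN=$1 split -n "l/$3" --filter='grep -c -e "$PATTERN" || true' -- "$2" |
        awk '{ total += $1 } END { print total + 0 }'
}

# Intended use:
#   count_matches 'ERROR' 10GBlogfile 20
```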

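The cost of splitting should not be very high, especially if you split with split --bytes=100M, since that is basically just one seek followed by a read and a write. However, I am not sure how it handles variable-length encodings such as UTF-8. If you know your data is ASCII, it is perfectly safe. Otherwise you would probably be better off with something like split --line-bytes=<size>, although that may be more expensive because more of the data has to be parsed.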
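A small sketch of the --line-bytes variant follows; `split_lines` is a hypothetical wrapper and GNU split is assumed. On the UTF-8 concern: the newline byte never occurs inside a multi-byte UTF-8 sequence, so pieces that end at a newline cannot cut a character in half, whereas --bytes can.

```shell
# split_lines FILE SIZE PREFIX (hypothetical helper)
# Write pieces PREFIXaa, PREFIXab, ... of at most SIZE bytes each.
# Pieces break only at newlines, unless a single line is longer than SIZE.
split_lines () { split --line-bytes="$2" -- "$1" "$3"; }

# Demo: three 5-byte lines with a 12-byte limit give two pieces,
# one holding two whole lines and one holding the last line.
demo=$(mktemp)
printf 'aaaa\nbbbb\ncccc\n' > "$demo"
split_lines "$demo" 12 "$demo.part."
wc -c "$demo".part.*
rm -f "$demo" "$demo".part.*
```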

Answer 4

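It sounds like you are trying to extract and analyze errors from the log. There is no universal answer to this question: how you isolate the events associated with a particular pattern in a log file depends entirely on the structure of the log file and the nature of whatever generated it.

There is no way of knowing how long the log for a particular user goes on.

Is there an explicit (username) or implicit (session ID, process ID, IP address) identifier? If not, it sounds like you need one, and you will have to make several passes over the log file to:

identify the error instances, their timestamps and the user identifiers
capture the surrounding non-error events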
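The two passes listed above could be sketched as follows. The log layout here is purely an assumption made for the example: each line is taken to contain a `session=ID` field, and the word ERROR is taken to mark a failure. Both patterns, and the name `events_for_failed_sessions`, would need adapting to the real log.

```shell
# events_for_failed_sessions LOG (hypothetical helper, assumed log format)
# Print every line of each session that logged at least one ERROR.
# The file is named twice so that awk reads it in two passes.
events_for_failed_sessions () {
    awk '
        # Pass 1 (NR == FNR only while reading the first copy of the file):
        # remember the identifier of every session that logged an error.
        NR == FNR {
            if (/ERROR/ && match($0, /session=[^ ]+/))
                bad[substr($0, RSTART, RLENGTH)] = 1
            next
        }
        # Pass 2: print all lines, error or not, for those sessions.
        match($0, /session=[^ ]+/) && (substr($0, RSTART, RLENGTH) in bad)
    ' "$1" "$1"
}

# Intended use:
#   events_for_failed_sessions 10GBlogfile > failed-sessions.log
```

Only the set of failing identifiers is held in memory, so the size of the log itself should not matter much beyond the time needed to read it twice.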
