Recursively search zipped archives for files of a particular size

Question 1

First, install AVFS, a filesystem that provides transparent access inside archives, and run the command mountavfs. See "How do I recursively grep through compressed archives?" for background.

After that, if /path/to/archive.zip is a recognized archive, then ~/.avfs/path/to/archive.zip# is a directory that appears to contain the contents of that archive.
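
The mapping from a real archive path to its AVFS view is plain string manipulation, so it can be wrapped in a small helper. A minimal sketch, assuming AVFS is mounted at its default location ~/.avfs; the function name avfs_path is made up for illustration and is not part of AVFS:

```shell
#!/bin/sh
# avfs_path: hypothetical helper (not part of AVFS) that prints the virtual
# directory under which AVFS exposes the contents of an archive.
# Assumes the default mount point ~/.avfs used by mountavfs.
avfs_path() {
    case $1 in
        /*) abs=$1 ;;          # already absolute
        *)  abs=$PWD/$1 ;;     # make a relative path absolute
    esac
    # AVFS mirrors the real tree under ~/.avfs; the trailing '#' asks it
    # to present the archive as a directory.
    printf '%s\n' "$HOME/.avfs$abs#"
}
```

Once AVFS is mounted, something like `ls "$(avfs_path /path/to/archive.zip)"` should list the contents of the archive.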

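Write a helper script called has_large_file_rec that looks for a large XML file in the zip file passed as its argument and calls itself recursively on each embedded zip file. The script produces some output if it finds a large XML file inside. The output is truncated for efficiency, since the search can stop as soon as one large XML file has been found.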

#!/bin/sh
## auxiliary script has_large_file_rec
find "$1#" -name '*.zip' -type f -exec has_large_file_rec {} \; \
        -o -name '*.xml' -type f -size +1024k -print | head -n 1
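
To see the size test and the early-exit idea in isolation, without AVFS or nested archives, the same find predicates can be tried on an ordinary directory. This is only a sketch: the function name first_large_xml and the scratch files are invented for the demonstration.

```shell
#!/bin/sh
# first_large_xml: illustrative helper (not from the answer above) that prints
# at most one regular *.xml file larger than 1024 KiB found under directory $1.
first_large_xml() {
    # -size +1024k matches files larger than 1024 KiB (1 MiB);
    # head -n 1 keeps only the first match and then closes the pipe.
    find "$1" -name '*.xml' -type f -size +1024k -print | head -n 1
}

# Demonstration on a scratch directory.
demo=$(mktemp -d)
dd if=/dev/zero of="$demo/big.xml" bs=1024 count=2048 2>/dev/null    # 2 MiB
dd if=/dev/zero of="$demo/small.xml" bs=1024 count=4 2>/dev/null     # 4 KiB
first_large_xml "$demo"    # prints the path of big.xml only
rm -rf "$demo"
```

Because head exits after the first line, find is typically stopped by SIGPIPE the next time it tries to write, which is what keeps the search short once a match exists.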

At the top level, if you find a large file, move it to the big files directory.

find "$HOME/.avfs$PWD" \
  -name '*.zip' -exec sh -c '
      a=$(has_large_file_rec "$0")
      if [ -n "$a" ]; then mv "$0" ~/big-files/; fi
                       ' {} \; -o \
  -name '*.xml' -type f -size +1024k -exec mv {} ~/big-files/ \;
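
The inline sh -c construct may look odd: the argument that follows the script text becomes $0 inside it, which is how the snippet refers to the file that find is currently visiting. A small sketch on a scratch directory (the function name list_zip_names and the file names are invented for the demonstration) shows the mechanism on its own:

```shell
#!/bin/sh
# Illustrative only: the first argument after the inline script is available
# as $0, here the path of each file matched by find.
list_zip_names() {
    # The inline script strips the directory part of $0 and prints the rest.
    find "$1" -name '*.zip' -type f -exec sh -c '
        printf "found: %s\n" "${0##*/}"
    ' {} \;
}

demo=$(mktemp -d)
: > "$demo/a.zip"
: > "$demo/b.txt"
list_zip_names "$demo"    # prints: found: a.zip
rm -rf "$demo"
```

Passing the file name as an argument, rather than embedding the braces in the script text, keeps names with spaces or quotes from being interpreted as shell code.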

Answer

Question 2

One way is to use perl.

Content of script.pl:

use warnings;
use strict;
use Archive::Extract;
use List::Util qw|first|;
use File::Copy qw|move|;
use File::Spec;
use File::Path qw|remove_tree|;

## Path to save 'xml' and 'zip' files.
my $big_files_dir = qq|$ENV{HOME}/big_files/|;

## Temp dir to extract files of 'zips'.
my $zips_path = qq|/tmp/zips$$/|;

## Size in bytes to check 'xml' files.
my $file_max_size_bytes = 100 * 1024 * 1024;

my (@zips_to_move, $orig_zip);

## Get files to process.
my @files = <*.xml *.zip>;

## From previous list, move 'xml' files bigger than size limit.
for my $file ( @files ) {
        if ( substr( $file, -4 ) eq q|.xml| and -s $file > $file_max_size_bytes ) {
                move $file, $big_files_dir;
        }
}

## Process now 'zip' files. For each one remove temp dir to avoid mixing files
## from different 'zip' files.
for ( grep { m/\.zip\Z/ } @files ) {
        remove_tree $zips_path;
        $orig_zip = $_;
        handle_zip_file( $orig_zip );
}

## Move 'zip' files collected until now.
for my $zip_file ( @zips_to_move ) {
        move $zip_file, $big_files_dir;
}

## Traverse recursively each 'zip' file. It will look for a 'zip' file in the
## subtree and will extract all 'xml' files to a temp dir. Base case is when
## a 'zip' file only contains 'xml' files, then I will read size of all 'xmls'
## and will move the 'zip' if at least one of them is bigger than the size limit.
## To avoid an infinite loop searching into 'zip' files, I delete them just after
## the extraction of their content.
sub handle_zip_file {
        my ($file) = @_;

        my $ae = Archive::Extract->new(
                archive => $file,
                type => q|zip|,
        );

        $ae->extract(
                to => $zips_path,
        );

        ## Failures are not checked here. I don't worry about them; perhaps I should?
        unlink( File::Spec->catfile( 
                                (File::Spec->splitpath( $zips_path ))[1], 
                                (File::Spec->splitpath( $file ))[2],
                        )
        );

        my $zip = first { substr( $_, -4 ) eq q|.zip| } <$zips_path/*>;
        if ( ! $zip ) {
                for my $f ( <$zips_path/*.xml> ) {
                        if ( substr( $f, -4 ) eq q|.xml| and -s $f > $file_max_size_bytes ) {
                                push @zips_to_move, $orig_zip;
                                last;
                        }
                }
                return;
        }

        handle_zip_file( $zip );
}

Some problems:

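xml files with the same name in the subtree of a zip file will overwrite each other when copied to the temp directory.
The program extracts the contents of all zip files in the same tree and only then checks whether an xml file is bigger than 100MB. It would be faster to check each time a zip file is extracted. This can be improved.
zip files that are processed more than once are not cached.
~/big_files must exist and be writable.
The script does not accept arguments. It must be run from the same directory as the zip and xml files.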
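
The last two points are easy to work around from the shell. The following sketch, which is not part of the script above, creates the destination directory if needed, checks that it is writable, and changes into a directory given as an argument before running the script. The function names ensure_big_files_dir and run_in_dir and the SCRIPT variable are assumptions made for illustration; the destination matches the $HOME/big_files path hard-coded in script.pl.

```shell
#!/bin/sh
# Hypothetical wrapper around script.pl (all names here are illustrative).
# SCRIPT should hold an absolute path, because the wrapper changes directory.
SCRIPT=${SCRIPT:-$PWD/script.pl}

# ensure_big_files_dir: create ~/big_files if missing and verify it is writable.
ensure_big_files_dir() {
    dir=$HOME/big_files
    mkdir -p "$dir" || return 1
    [ -d "$dir" ] && [ -w "$dir" ]
}

# run_in_dir DIR: run the script from inside DIR (default: current directory),
# since it only looks at zip and xml files in its working directory.
run_in_dir() {
    ensure_big_files_dir || { echo "cannot use $HOME/big_files" >&2; return 1; }
    # The subshell keeps the caller's working directory unchanged.
    ( cd "${1:-.}" && perl "$SCRIPT" )
}
```

With these definitions loaded, `run_in_dir /some/dir` would process the zip and xml files in /some/dir without having to cd there first.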

As the points above show, it is not perfect, but it worked in my test. I hope you find it useful.

Run it like this:

perl script.pl

Answer