How can I be notified when all "aws s3" downloads have completed?

Question 1

In the end, I wrote a Python script that makes sure only 10 concurrent S3 downloads run at any given time.

#!/usr/bin/env python3
import os
import sys
import boto3
from multiprocessing import Pool


BUCKET = "my-bucket"

s3 = boto3.client("s3")


def download_s3_file(params):
    """Download one (src, dest) pair, skipping files that already exist.

    An existing file is assumed to be a completed download.
    """
    src, dest = params
    if os.path.isfile(dest):
        print(f"The file {dest} is already downloaded")
        return
    print("Downloading", BUCKET, src, dest)
    print("process id:", os.getpid())
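    try:
        s3.download_file(BUCKET, src, dest)
    except Exception as e:
        # Report which object failed instead of printing a bare error
        print(f"Failed to download {src} to {dest}: {e}")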


def main():
    filelist = sys.argv[1]
    print("parent process:", os.getpid())
    print("Working on ", filelist)
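    jobs = []
    # Use a context manager so the file list is closed after reading
    with open(filelist, "r") as f:
        for line in f:
            line = line.strip()
            # Ignore blank and commented lines
            if not line or line.startswith("#"):
                continue
            src, dest = line.split(",")
            jobs.append((src, dest))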
    with Pool(10) as p:
        p.map(download_s3_file, jobs)


if __name__ == "__main__":
    main()
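
Note that `p.map` in the script above only returns once every job has finished, so the line after it is the point where all downloads are done. Because the worker swallows exceptions, the parent cannot tell which files failed. Here is a minimal sketch of one way to collect that information. It swaps in a thread pool for the process pool, which is usually adequate for I/O-bound downloads. `run_all` and `fake_download` are hypothetical names used for illustration; in a real run you would pass a function that calls `s3.download_file(BUCKET, src, dest)`.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def run_all(jobs, download, max_workers=10):
    """Run download(src, dest) for every job, at most max_workers at a time.

    Returns (succeeded, failed): a list of dest paths and a list of
    (dest, error message) pairs. The function returns only after every
    job has either finished or raised.
    """
    succeeded, failed = [], []
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Map each future back to its destination so results can be labeled
        futures = {pool.submit(download, src, dest): dest for src, dest in jobs}
        for future in as_completed(futures):
            dest = futures[future]
            try:
                future.result()
            except Exception as exc:
                failed.append((dest, str(exc)))
            else:
                succeeded.append(dest)
    return succeeded, failed


def fake_download(src, dest):
    # Hypothetical stand-in for s3.download_file(BUCKET, src, dest)
    if src.endswith(".missing"):
        raise FileNotFoundError(src)


if __name__ == "__main__":
    jobs = [("a.txt", "/tmp/a.txt"), ("b.missing", "/tmp/b"), ("c.txt", "/tmp/c.txt")]
    ok, bad = run_all(jobs, fake_download)
    # This line is reached only when all downloads have completed
    print(f"All downloads finished: {len(ok)} ok, {len(bad)} failed")
```

The final `print` is where you would hook in whatever notification you prefer.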

Answer

Question 2

If you do that with many of them, your local box will quickly become overloaded, because it starts many processes at the same time.

You are best off doing one of the following:

If the files share a common prefix, do a recursive copy:
```
aws s3 cp --recursive s3://my-bucket/path/ .
```

Use the aws s3 cp --exclude and --include options creatively, i.e. exclude everything except what is specified in the include list:

aws s3 cp --recursive --exclude '*' \
          --include 'path1/file1.txt' --include 'path2/file2.txt' \
          s3://my-bucket/ .
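
If the list of files is long, typing the `--include` options by hand gets tedious. Below is a rough sketch of building the same command from a list of keys in Python. `build_cp_command` is a hypothetical helper name, and the bucket and keys are just the ones from the example above.

```python
import shlex


def build_cp_command(bucket, keys, dest="."):
    """Return the argv list for one recursive `aws s3 cp` limited to `keys`."""
    # Exclude everything first, then re-include only the wanted keys
    cmd = ["aws", "s3", "cp", "--recursive", "--exclude", "*"]
    for key in keys:
        cmd += ["--include", key]
    cmd += [f"s3://{bucket}/", dest]
    return cmd


if __name__ == "__main__":
    cmd = build_cp_command("my-bucket", ["path1/file1.txt", "path2/file2.txt"])
    # Print a shell-quoted version for inspection before running it
    print(shlex.join(cmd))
```

Passing the returned list to `subprocess.run(cmd, check=True)` blocks until the single copy command exits, so you know all files are done when that call returns.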

Use s3cmd --include-from file.txt, which lets you put the file names you want into an input file:

~ $ cat include-filenames.txt
path1/file1.txt
path2/file2.txt

~ $ s3cmd get --recursive --exclude '*' \
              --include-from include-filenames.txt \
              s3://my-bucket/ .

No, AWS does not provide a way to monitor this. The downloads run on your local laptop/server, so that is where you have to monitor them.

Hope that helps :)
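
Monitoring locally can be as simple as waiting for the child processes to exit. The following is a sketch under the assumption that each download is a separate command. `run_commands` is a hypothetical helper, and the placeholder commands just start a Python interpreter that exits immediately; you would substitute your own `aws s3 cp` argument lists.

```python
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor


def run_commands(commands, max_parallel=10):
    """Run each argv list as a child process, at most max_parallel at once.

    Blocks until every command has exited and returns the exit codes
    in the same order as `commands`.
    """
    def run_one(argv):
        # subprocess.run waits for the child process to finish
        return subprocess.run(argv).returncode

    with ThreadPoolExecutor(max_workers=max_parallel) as pool:
        return list(pool.map(run_one, commands))


if __name__ == "__main__":
    # Placeholder commands; replace with e.g. ["aws", "s3", "cp", "s3://my-bucket/key", "."]
    commands = [[sys.executable, "-c", "pass"] for _ in range(3)]
    codes = run_commands(commands)
    failures = sum(1 for code in codes if code != 0)
    print(f"All {len(codes)} commands finished, {failures} failed")
```

Capping `max_parallel` also avoids the overload problem of starting every download at once.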

Answer