How can I use wget with file names taken from a multi-line list?

Answer 1

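"I'm open to alternatives if what I'm doing is not best practice."

Best practice is not to use bash or sed for this! Use Python instead, which is a much better way to parse the XML you need to parse. I have just written and tested the script below with python3.6, and it does exactly what you asked for.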

#!/usr/bin/python3
# Let's import the modules we need
import wget
import os
import requests
from bs4 import BeautifulSoup as bs

# Assign the url to a variable (not essential as we 
# only use it once, but it's pythonic)
url = 'https://librivox.org/api/feed/audiobooks/?offset=0&limit=3&fields=%7Blanguage,authors,title,url_zip_file%7B'

# Use requests to fetch the raw xml
r = requests.get(url)

# Use BeautifulSoup and lxml to parse the raw xml so 
# we can do stuff with it
s = bs(r.text, 'lxml')

# We need to find the data we need. This will find it and create some 
# python lists for us to loop through later

# Find all xml tags named 'url_zip_file' and assign them to variable
links = s.find_all('url_zip_file')

# Find all xml tags named 'last_name' and assign them to variable
last_names = s.find_all('last_name')

# Find all xml tags named 'first_name' and assign them to variable
first_names = s.find_all('first_name')

# Find all xml tags named 'language' and assign them to variable
language = s.find_all('language')

# Assign the language to a variable
english = language[0].text

# Make our new language directory
os.mkdir(english)

# cd into our new language directory
os.chdir(str(english))

# Loop through the last names (ln), first names (fn) and links so we can
# make each author directory, download the file into it, rename the file,
# then go back up a directory and loop again
for ln, fn, link in zip(last_names, first_names, links):
    author_dir = 'Author{}{}'.format(ln.text, fn.text)
    os.mkdir(author_dir)
    os.chdir(author_dir)
    filename = wget.download(link.text)
    os.rename(filename, 'File.zip')
    os.chdir('..')

You can save it to a file, or paste or type it into the python3 interpreter CLI. The choice is yours.

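You will need to install python3-wget and beautifulsoup4 using pip, easy_install or similar. The script also imports requests and uses the lxml parser, so those must be installed as well.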
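
The script above builds its directory tree with os.mkdir and os.chdir, so a second run stops with FileExistsError, and a failed download leaves the working directory changed. As a minimal sketch of one alternative, the target path could be computed with pathlib instead. The helper name target_path and its parameters are hypothetical, not part of the original script; the layout (language/Author<last><first>/File.zip) is the one the script creates.

```python
import tempfile
from pathlib import Path


def target_path(base, language, last_name, first_name, filename="File.zip"):
    """Return base/<language>/Author<last><first>/<filename>.

    The directories are created if they are missing, so calling this
    twice for the same author does not raise an error.
    """
    author_dir = Path(base) / language / "Author{}{}".format(last_name, first_name)
    author_dir.mkdir(parents=True, exist_ok=True)
    return author_dir / filename


# Usage example in a throwaway directory (no network access needed)
with tempfile.TemporaryDirectory() as base:
    first = target_path(base, "English", "Dumas", "Alexandre")
    second = target_path(base, "English", "Dumas", "Alexandre")  # repeat call is fine
    created = first.parent.is_dir()
    relative = first.relative_to(base).as_posix()

print(relative)  # English/AuthorDumasAlexandre/File.zip
```

The returned path could then be handed to the downloader as its destination (for the wget module, presumably through the out argument of wget.download), which would make the os.chdir and os.rename calls unnecessary.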

Answer 2

The Librivox API also provides JSON output, and if you can use it, parsing JSON with jq is easier than parsing XML with proper XML tools.

u='https://librivox.org/api/feed/audiobooks/?offset=0&limit=3&fields=%7Blanguage,authors,title,url_zip_file%7B&format=json'
curl "$u" -sL |
  jq -r '.books[] | "\(.language).\(.authors[0].last_name + .authors[0].first_name).\(.title).zip", .url_zip_file'

This gives output like the following:

English.DumasAlexandre.Count of Monte Cristo.zip
http://www.archive.org/download/count_monte_cristo_0711_librivox/count_monte_cristo_0711_librivox_64kb_mp3.zip
English.BalzacHonoré de.Letters of Two Brides.zip
http://www.archive.org/download/letters_brides_0709_librivox/letters_brides_0709_librivox_64kb_mp3.zip
English.DickensCharles.Bleak House.zip
http://www.archive.org/download/bleak_house_cl_librivox/bleak_house_cl_librivox_64kb_mp3.zip

After that, it is relatively easy to use xargs:

curl "$u" -sL |
  jq -r '.books[] | "\(.language).\(.authors[0].last_name + .authors[0].first_name).\(.title).zip", .url_zip_file' |
  xargs -d '\n' -n2 wget -O

xargs passes two lines at a time as arguments to wget: the first line becomes the parameter of the -O option, and the second line becomes the URL.

That said, I would recommend a Python-based solution like Jamie's, except using JSON and Python's built-in JSON capabilities instead of bs4.
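
A minimal sketch of what such a JSON-based Python version might look like, using only the standard library. The function names name_url_pairs and download_all are hypothetical, and the JSON layout (a books list whose entries carry language, title, url_zip_file and an authors list) is assumed from the jq filter above rather than taken from the API documentation. The sample record is hand-written from the output shown earlier.

```python
import json
import urllib.request

# Same URL as in the jq example above
API_URL = ('https://librivox.org/api/feed/audiobooks/?offset=0&limit=3'
           '&fields=%7Blanguage,authors,title,url_zip_file%7B&format=json')


def name_url_pairs(payload):
    """Yield (file name, URL) for every book in the decoded JSON payload."""
    for book in payload["books"]:
        author = book["authors"][0]  # first listed author, as in the jq filter
        name = "{}.{}{}.{}.zip".format(
            book["language"], author["last_name"], author["first_name"], book["title"])
        yield name, book["url_zip_file"]


def download_all(api_url=API_URL):
    """Fetch the feed and save each zip file under its generated name."""
    with urllib.request.urlopen(api_url) as response:
        payload = json.load(response)
    for name, url in name_url_pairs(payload):
        urllib.request.urlretrieve(url, name)


# Offline usage example with a hand-written sample record (assumed layout)
sample = {"books": [{
    "language": "English",
    "title": "Count of Monte Cristo",
    "url_zip_file": "http://www.archive.org/download/count_monte_cristo_0711_librivox/"
                    "count_monte_cristo_0711_librivox_64kb_mp3.zip",
    "authors": [{"first_name": "Alexandre", "last_name": "Dumas"}],
}]}
pairs = list(name_url_pairs(sample))
print(pairs[0][0])  # English.DumasAlexandre.Count of Monte Cristo.zip
```

Calling download_all() would perform the actual downloads. Keeping the name-building step separate from the network calls makes it easy to check the generated file names before anything is fetched.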

Answer 3

Brute force.

If your parsed XML is in a file named books:

while read -r a; read -r b; read -r c; read -r d; read -r e; do wget "$c" -O "$b$e$d$a"; echo "$c"; done < books

Just regroup the lines into variables, and pad each record block to 5 lines, since the loop reads five lines per iteration.
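
The same regrouping can be sketched in Python, where a short final block is padded automatically instead of by hand. The function names records and wget_command are hypothetical, and the field order (third line is the URL, output name is b + e + d + a) is copied from the shell loop above; what each line actually contains depends on how the XML was parsed.

```python
from itertools import zip_longest


def records(lines, size=5, fill=""):
    """Group lines into tuples of `size`, padding the last tuple with `fill`."""
    stripped = (line.rstrip("\n") for line in lines)
    # Repeating the same iterator makes zip_longest consume `size` lines per tuple
    return zip_longest(*[stripped] * size, fillvalue=fill)


def wget_command(record):
    """Build the wget argument list for one 5-line record, as in the shell loop."""
    a, b, c, d, e = record
    return ["wget", c, "-O", b + e + d + a]


# Usage example: eight lines give one full record and one padded record
lines = ["A1\n", "B1\n", "C1\n", "D1\n", "E1\n", "A2\n", "B2\n", "C2\n"]
grouped = list(records(lines))
print(grouped[1])  # ('A2', 'B2', 'C2', '', '')
```

Each argument list could then be passed to subprocess.run, which avoids the shell's word splitting on names that contain spaces.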
