How do I download this page correctly?

Question

Answer 1

wget does not work on its own here because of the way JavaScript handles the URLs. You need to parse the page with xmllint and then convert the URLs into a form that wget can handle.

First, extract the JavaScript-wrapped URLs, clean them up, and write them to urls.txt.

wget -O - 'https://bcs.wiley.com/he-bcs/Books?action=resource&bcsId=10685&itemId=1119299160&resourceId=42647' | \
xmllint --html --xpath "//li[@class='resourceColumn']//a/@href" - 2>/dev/null | \
sed -e 's# href.*Books#https://bcs.wiley.com/he-bcs/Books#' -e 's/amp;//g' -e 's/&newwindow.*$//' > urls.txt
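To see what the sed step does, here is a minimal sketch that runs the same three expressions on a single made-up href attribute. The sample string (including the `newwindow(...)` wrapper and the `action=mininav` parameter) is only an assumption about what xmllint might print for this page, not output captured from the site.

```shell
# Hypothetical href attribute, shaped the way xmllint might print it (an assumption).
sample=" href=\"javascript:newwindow('/he-bcs/Books?action=mininav&amp;bcsId=10685&amp;newwindow=true')\""

# 1) replace everything up to "Books" with the absolute URL prefix,
# 2) turn "&amp;" back into "&",
# 3) drop the trailing "&newwindow..." parameter and the JavaScript closing characters.
cleaned=$(printf '%s\n' "$sample" | sed \
  -e 's# href.*Books#https://bcs.wiley.com/he-bcs/Books#' \
  -e 's/amp;//g' \
  -e 's/&newwindow.*$//')

printf '%s\n' "$cleaned"
```

If the real attributes look different from the sample, the expressions may need adjusting, so it is worth inspecting urls.txt before moving on.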

Now open each URL in urls.txt and download the PDF files found there.

wget -O - -i urls.txt | grep -o 'https.*pdf' | wget -i -
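The link extraction in that pipeline can be tried in isolation. The sketch below feeds made-up HTML (the example.com URLs are placeholders, not links from the actual page) through the same `grep -o 'https.*pdf'`. Because `.*` is greedy, the pattern would swallow everything between the first `https` and the last `pdf` when a line holds more than one link, so a stricter variant is shown as one possible alternative.

```shell
# Made-up HTML with one PDF link on the line (placeholder URL).
html='<a href="https://example.com/files/ch01.pdf">Chapter 1</a>'
link=$(printf '%s\n' "$html" | grep -o 'https.*pdf')
printf '%s\n' "$link"

# Two links on one line: a stricter pattern stops at the closing quote,
# so each link is printed on its own line.
html2='<a href="https://example.com/a.pdf">A</a> <a href="https://example.com/b.pdf">B</a>'
links=$(printf '%s\n' "$html2" | grep -oE 'https://[^"]+\.pdf')
count=$(printf '%s\n' "$links" | wc -l | tr -d ' ')
printf '%s\n' "$links"
```

Whether the stricter pattern is needed depends on how the resource pages are laid out; the original pattern is enough when each link sits on its own line.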

curl alternative:

curl 'https://bcs.wiley.com/he-bcs/Books?action=resource&bcsId=10685&itemId=1119299160&resourceId=42647' | \
xmllint --html --xpath "//li[@class='resourceColumn']//a/@href" - 2>/dev/null | \
sed -e 's# href.*Books#https://bcs.wiley.com/he-bcs/Books#' -e 's/amp;//g' -e 's/&newwindow.*$//' > urls.txt

curl -s $(cat urls.txt) | grep -o 'https.*pdf' | xargs -L 1 curl -O
