Why can't I mirror a website (using wget)?

I tried "wget --mirror http://tshepang.net/", but only one page is retrieved (tshepang.net/index.html). Is this a bug in wget?

Here is the output when run with the --debug option:

DEBUG output created by Wget 1.12 on linux-gnu.

Enqueuing http://tshepang.net/ at depth 0
Queue count 1, maxcount 1.
[IRI Enqueuing `http://tshepang.net/' with None
Dequeuing http://tshepang.net/ at depth 0
Queue count 0, maxcount 1.
--2011-01-15 12:32:51--  http://tshepang.net/
Resolving tshepang.net... 66.216.125.32
Caching tshepang.net => 66.216.125.32
Connecting to tshepang.net|66.216.125.32|:80... connected.
Created socket 4.
Releasing 0x089e2be0 (new refcount 1).

---request begin---
GET / HTTP/1.0

User-Agent: Wget/1.12 (linux-gnu)

Accept: */*

Host: tshepang.net

Connection: Keep-Alive



---request end---
HTTP request sent, awaiting response... 
---response begin---
HTTP/1.1 302 Found

Server: nginx/0.7.65

Date: Sat, 15 Jan 2011 10:33:45 GMT

Content-Type: text/html; charset=utf-8

Connection: keep-alive

Status: 302 Found

Location: http://posterous.com/sso/verify/2d35d71b1e728dc99f3c153eaf6f8fa0?jumpto=%2F

X-Runtime: 3

Set-Cookie: cookies_enabled=true; path=/

Cache-Control: no-cache

Content-Length: 141

X-Varnish: 419207385

Age: 0

Via: 1.1 varnish

X-Cache: MISS



---response end---
302 Found

Stored cookie tshepang.net -1 (ANY) / <session> <insecure> [expiry none] cookies_enabled true
Registered socket 4 for persistent reuse.
Location: http://posterous.com/sso/verify/2d35d71b1e728dc99f3c153eaf6f8fa0?jumpto=%2F [following]
Skipping 141 bytes of body: [<html><body>You are being <a href="http://posterous.com/sso/verify/2d35d71b1e728dc99f3c153eaf6f8fa0?jumpto=%2F">redirected</a>.</body></html>] done.
--2011-01-15 12:32:52--  http://posterous.com/sso/verify/2d35d71b1e728dc99f3c153eaf6f8fa0?jumpto=%2F
conaddr is: 66.216.125.32
Resolving posterous.com... 184.106.20.99
Caching posterous.com => 184.106.20.99
Releasing 0x089e3e20 (new refcount 1).
Found posterous.com in host_name_addresses_map (0x89e3e20)
Connecting to posterous.com|184.106.20.99|:80... connected.
Created socket 5.
Releasing 0x089e3e20 (new refcount 1).

---request begin---
GET /sso/verify/2d35d71b1e728dc99f3c153eaf6f8fa0?jumpto=%2F HTTP/1.0

User-Agent: Wget/1.12 (linux-gnu)

Accept: */*

Host: posterous.com

Connection: Keep-Alive



---request end---
HTTP request sent, awaiting response... 
---response begin---
HTTP/1.1 302 Found

Server: nginx/0.7.65

Date: Sat, 15 Jan 2011 10:33:46 GMT

Content-Type: text/html; charset=utf-8

Connection: close

Status: 302 Found

Location: http://tshepang.net/sso/recovery/2d35d71b1e728dc99f3c153eaf6f8fa0?jumpto=%2F

X-Runtime: 7

Set-Cookie: _sharebymail_session_id=296a636c8ed3cb6e4e7cabb10256008a; domain=.posterous.com; path=/; HttpOnly

Cache-Control: no-cache

Content-Length: 142

X-Varnish: 2019529137

Age: 0

Via: 1.1 varnish

X-Cache: MISS



---response end---
302 Found
cdm: 1 2
Stored cookie posterous.com -1 (ANY) / <session> <insecure> [expiry none] _sharebymail_session_id 296a636c8ed3cb6e4e7cabb10256008a
Location: http://tshepang.net/sso/recovery/2d35d71b1e728dc99f3c153eaf6f8fa0?jumpto=%2F [following]
Closed fd 5
--2011-01-15 12:32:53--  http://tshepang.net/sso/recovery/2d35d71b1e728dc99f3c153eaf6f8fa0?jumpto=%2F
Reusing existing connection to tshepang.net:80.
Reusing fd 4.

---request begin---
GET /sso/recovery/2d35d71b1e728dc99f3c153eaf6f8fa0?jumpto=%2F HTTP/1.0

User-Agent: Wget/1.12 (linux-gnu)

Accept: */*

Host: tshepang.net

Connection: Keep-Alive

Cookie: cookies_enabled=true



---request end---
HTTP request sent, awaiting response... 
---response begin---
HTTP/1.1 302 Found

Server: nginx/0.7.65

Date: Sat, 15 Jan 2011 10:33:46 GMT

Content-Type: text/html; charset=utf-8

Connection: keep-alive

Status: 302 Found

Location: http://tshepang.net/

X-Runtime: 5

Set-Cookie: _sharebymail_session_id=cab0227db8c38f17e572984ee188dc5e; domain=tshepang.net; path=/; HttpOnly

Cache-Control: no-cache

Content-Length: 86

X-Varnish: 419207606

Age: 0

Via: 1.1 varnish

X-Cache: MISS



---response end---
302 Found
cdm: 1 2
Stored cookie tshepang.net -1 (ANY) / <session> <insecure> [expiry none] _sharebymail_session_id cab0227db8c38f17e572984ee188dc5e
Location: http://tshepang.net/ [following]
Skipping 86 bytes of body: [<html><body>You are being <a href="http://tshepang.net/">redirected</a>.</body></html>] done.
--2011-01-15 12:32:54--  http://tshepang.net/
Reusing existing connection to tshepang.net:80.
Reusing fd 4.

---request begin---
GET / HTTP/1.0

User-Agent: Wget/1.12 (linux-gnu)

Accept: */*

Host: tshepang.net

Connection: Keep-Alive

Cookie: _sharebymail_session_id=cab0227db8c38f17e572984ee188dc5e; cookies_enabled=true



---request end---
HTTP request sent, awaiting response... 
---response begin---
HTTP/1.1 200 OK

Server: nginx/0.7.65

Date: Sat, 15 Jan 2011 10:33:49 GMT

Content-Type: text/html; charset=utf-8

Connection: keep-alive

Status: 200 OK

ETag: "6ec7aeb4e15e3a80e733f7c2b5e00d6f"

X-Runtime: 1680

Cache-Control: private, max-age=0, must-revalidate

Content-Length: 66513

X-Varnish: 419207692

Age: 0

Via: 1.1 varnish

X-Cache: MISS



---response end---
200 OK
Length: 66513 (65K) [text/html]
Saving to: `tshepang.net/index.html'

     0K .......... .......... .......... .......... .......... 76% 25.7K 1s
    50K .......... ....                                       100% 39.3K=2.3s

2011-01-15 12:32:58 (27.9 KB/s) - `tshepang.net/index.html' saved [66513/66513]

Deciding whether to enqueue "http://tshepang.net/".
Already on the black list.
Decided NOT to load it.
Redirection "http://tshepang.net/" failed the test.
FINISHED --2011-01-15 12:32:58--
Downloaded: 1 files, 65K in 2.3s (27.9 KB/s)

Answer 1

The --no-cookies option helps (thanks, 그네):

It looks like all the redirects are causing wget to abort the request. Try --no-cookies.

This was worked out by reading the attached log.
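
A minimal sketch of that suggestion, assuming a POSIX shell. The function names mirror_no_cookies and redirect_chain and the log file name wget-debug.log are made up for illustration; --mirror, --no-cookies, --debug and -o are regular wget options. The second helper only lists the redirect targets that wget reports as "Location: ... [following]", which is how the bounce back to http://tshepang.net/ shows up in the log above.

```shell
#!/bin/sh
# Hypothetical wrapper: mirror a site with cookies disabled and write
# the debug log to a file (the default file name is an assumption).
mirror_no_cookies() {
    url=$1
    log=${2:-wget-debug.log}
    wget --mirror --no-cookies --debug -o "$log" "$url"
}

# Hypothetical helper: print each redirect target that wget followed,
# one per line, based on the "Location: URL [following]" log lines.
redirect_chain() {
    sed -n 's/^Location: \(.*\) \[following]$/\1/p' "$1"
}

# Example usage (needs network access):
#   mirror_no_cookies http://tshepang.net/
#   redirect_chain wget-debug.log
```

If the same URL appears both as the starting point and as a later redirect target, that matches the "Already on the black list" message in the question's log.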

Answer 2

Assuming wget is on your path (if it is not, you need to type its full path), run the following commands:

mkdir wget_files
cd wget_files
wget --mirror --wait=2 --page-requisites --html-extension --convert-links --directory-prefix wget_files/example1 http://www.yourdomain.com
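
The PATH assumption can be checked before running the commands above. This is only a sketch for a POSIX shell; have_command is a made-up helper name, while "command -v" is the standard way to ask the shell where a command lives.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the shell can find the named command.
have_command() {
    command -v "$1" >/dev/null 2>&1
}

if have_command wget; then
    echo "wget found at: $(command -v wget)"
else
    echo "wget is not on PATH; use its full path instead" >&2
fi
```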

Answer 3

You also need to set -r for recursion and the link depth with -l X, where X is an integer. It is also a good idea to set -A with a list of the file types to keep (otherwise you will only get HTML files).
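
As a rough illustration of those options, assuming a POSIX shell: fetch_limited is a hypothetical wrapper, and the default depth of 2 and the suffix list are arbitrary example values, not recommendations from this answer. Note that --mirror already turns on recursion, so -r and -l matter mostly when --mirror is not used.

```shell
#!/bin/sh
# Hypothetical wrapper: recursive download limited to a link depth and
# a comma-separated list of accepted file suffixes.
fetch_limited() {
    url=$1
    depth=${2:-2}                   # assumed default depth
    accept=${3:-html,css,png,jpg}   # assumed default accept list
    wget -r -l "$depth" -A "$accept" "$url"
}

# Example usage (needs network access):
#   fetch_limited http://www.yourdomain.com 3 html,pdf
```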
