Can't fetch web content containing Unicode characters with Python 3

2024-6-2


I'm trying to read specific tags from a web page with Python 3, but the script can't handle Unicode characters and fails with this error:

UnicodeEncodeError: 'latin-1' codec can't encode character '\u201c' in position 145: ordinal not in range(256)

What is the correct way to extract these tags?
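A minimal offline reproduction may help narrow this down. It assumes the error comes from encoding text as Latin-1 (for example, `print` writing to a console whose encoding is Latin-1) rather than from fetching or parsing the page. The sample string and the helper `try_encode` are made up for illustration.

```python
# Minimal reproduction, assuming the failure is an encode step to Latin-1
# (e.g. print() writing to a Latin-1 console), not the HTTP request or parsing.
text = "\u201cIn the beginning\u201d"  # U+201C / U+201D are curly double quotes


def try_encode(s: str, encoding: str) -> str:
    """Return "ok" if s encodes cleanly, otherwise the UnicodeEncodeError message."""
    try:
        s.encode(encoding)
        return "ok"
    except UnicodeEncodeError as exc:
        return str(exc)


print(try_encode(text, "latin-1"))  # 'latin-1' codec can't encode character '\u201c' in position 0: ordinal not in range(256)
print(try_encode(text, "utf-8"))    # ok
```

Latin-1 only covers code points 0 to 255, so U+201C has no byte in that encoding, which matches the "ordinal not in range(256)" part of the message.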

Here is the MWE I have tried so far:

import requests
from bs4 import BeautifulSoup

page = requests.get("https://www.biblegateway.com/passage/?search=Genesis+35&version=NIV")

soup = BeautifulSoup(page.content, 'html.parser')
story = soup.find_all('p')  # find every <p> element on the page
periods = [pt.get_text() for pt in story]  # extract the text of each <p> element
print(periods)
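A hedged sketch of two common workarounds, assuming the failure happens when `print` encodes the extracted text for a Latin-1 console. It writes into an in-memory `io.TextIOWrapper` instead of the real console, and the `periods` list here is a hypothetical sample, not actual output from the page.

```python
import io

# Hypothetical stand-in for the `periods` list built in the MWE.
periods = ["\u201cThis is a quote,\u201d he said."]


def render(items, encoding, errors="strict"):
    """Print items into an in-memory text stream and return the encoded bytes."""
    buffer = io.BytesIO()
    # The wrapper mimics a console with the given encoding; newline="\n"
    # turns off platform newline translation so the bytes are predictable.
    stream = io.TextIOWrapper(buffer, encoding=encoding, errors=errors, newline="\n")
    print(*items, file=stream)
    stream.flush()
    data = buffer.getvalue()
    stream.detach()  # release the buffer without closing it
    return data


# Option 1: a UTF-8 stream can encode every Unicode character.
print(render(periods, "utf-8"))
# Option 2: keep Latin-1 but replace unencodable characters with "?".
print(render(periods, "latin-1", errors="replace"))  # b'?This is a quote,? he said.\n'
```

On a real console, the counterpart of option 1 is calling `sys.stdout.reconfigure(encoding="utf-8")` (Python 3.7+) before printing, or running the script with the environment variable `PYTHONIOENCODING=utf-8`. Whether the terminal then shows the curly quotes correctly still depends on its own encoding and font settings.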
