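I am trying to use Python 3 to read the content of specific tags from a web page, but it cannot handle Unicode characters and fails with this error:
UnicodeEncodeError: 'latin-1' codec can't encode character '\u201c' in position 145: ordinal not in range(256)
What is the correct syntax to get the tags?
Here is the MWE I have tried so far: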
import requests
from bs4 import BeautifulSoup

# Download the passage page (Genesis 35, NIV)
page = requests.get("https://www.biblegateway.com/passage/?search=Genesis+35&version=NIV")
soup = BeautifulSoup(page.content, 'html.parser')
story = soup.find_all('p')  # collect every <p> tag on the page
periods = [pt.get_text() for pt in story]  # keep only the text inside each <p> tag
print(periods)
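
The same error can be reproduced without the network. This is a minimal sketch, assuming the failure comes from print writing to a stdout that uses latin-1 rather than from requests or BeautifulSoup. The names can_encode, make_utf8_writer, and force_utf8_stdout are hypothetical helpers, not part of any library.

```python
import io
import sys

# Curly quotes (U+201C, U+201D), the kind of character named in the error
sample = "\u201cJacob\u201d"


def can_encode(text, encoding):
    """Return True if text can be encoded with the given codec."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


def make_utf8_writer(binary_stream):
    """Wrap a binary stream so that text written to it is encoded as UTF-8."""
    return io.TextIOWrapper(binary_stream, encoding="utf-8", errors="replace")


def force_utf8_stdout():
    """Switch stdout to UTF-8 when the stream supports it (Python 3.7+)."""
    if hasattr(sys.stdout, "reconfigure"):
        sys.stdout.reconfigure(encoding="utf-8", errors="replace")


print(can_encode(sample, "latin-1"))
print(can_encode(sample, "utf-8"))
```

If that assumption holds, calling force_utf8_stdout() before print(periods), or running the script with the environment variable PYTHONIOENCODING=utf-8, may be enough to avoid the error.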