I am using this code to extract data from a website:
import csv
import requests
from bs4 import BeautifulSoup

url = 'https://www.dailythanthi.com/News/Puducherry/2018/06/09020308/Puducherry-budget-will-soon-get-approval--Narayanasamy.vpf'
# url_headers is a dict of request headers defined elsewhere in the script
details = requests.get(url, headers=url_headers)
# Only pass an encoding to BeautifulSoup if the server declared a charset
encoding = details.encoding if 'charset' in details.headers.get('content-type', '').lower() else None
details_data = BeautifulSoup(details.content, "lxml", from_encoding=encoding)
# Remove all script tags
[s.extract() for s in details_data('script')]
details_div = details_data.find("div", {"id": "ArticleDetailContent"})
# Note: details_image_src is not defined in the code as posted
details_image = details_image_src.img['src']
# Collect the text of every paragraph and join it into one string
details_text = []
paras = details_div.find_all("p")
for para in paras:
    details_text.append(para.get_text())
details_text = ''.join(details_text)
I then write the results to a file using:
with open("thanthi.csv", "w") as out_file:
    writer = csv.writer(out_file)
    writer.writerow(["Description"])
    writer.writerows([details_text])
    writer.writerows(articles)
UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-4: ordinal not in range(128)
I am not sure why this fails. Even when I try to encode the strings with the .encode('utf8') method, I still get the same error.
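For reference, here is a minimal sketch of writing non-ASCII text to a CSV file with an explicit encoding. It assumes Python 3; the helper name write_description_csv, the sample Tamil string, and the temporary output path are hypothetical stand-ins rather than part of the code above. Under Python 2 the csv module does not accept unicode objects directly, so this sketch would not apply as written.

```python
import csv
import os
import tempfile


def write_description_csv(path, description, encoding="utf-8"):
    """Write a header row and a single description row to `path`.

    Hypothetical helper for illustration; not part of the original script.
    """
    # An explicit encoding avoids falling back to an ASCII locale default;
    # newline="" lets the csv module manage line endings itself.
    with open(path, "w", encoding=encoding, newline="") as out_file:
        writer = csv.writer(out_file)
        writer.writerow(["Description"])
        # writerow takes one row (a list of fields); writerows expects a
        # list of rows, so passing [some_string] to it would split the
        # string into one character per column.
        writer.writerow([description])


# Hypothetical stand-in for details_text (any non-ASCII text will do).
sample_text = "புதுச்சேரி பட்ஜெட்"

# The ASCII codec cannot represent these characters, which is the kind of
# failure the traceback reports.
try:
    sample_text.encode("ascii")
    ascii_ok = True
except UnicodeEncodeError:
    ascii_ok = False

# Write to a temporary directory, then read the file back to check it.
csv_path = os.path.join(tempfile.mkdtemp(), "thanthi.csv")
write_description_csv(csv_path, sample_text)

with open(csv_path, encoding="utf-8", newline="") as in_file:
    rows = list(csv.reader(in_file))
```

If the file is opened without an encoding argument, Python 3 uses the locale's preferred encoding, which may be ASCII in some environments; that is one plausible way to end up with this error, though the post does not say which Python version or locale is in use.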