Error while scraping website data using BS4

Time: 2018-06-09 13:11:16

Tags: python-2.7 web-scraping beautifulsoup export-to-csv python-unicode

I am using this code to extract data from a website.

import requests
from bs4 import BeautifulSoup

# url_headers (the request headers) is assumed to be defined earlier; it is not shown here
url = 'https://www.dailythanthi.com/News/Puducherry/2018/06/09020308/Puducherry-budget-will-soon-get-approval--Narayanasamy.vpf'
details = requests.get(url, headers=url_headers)
# Only pass an encoding to BeautifulSoup if the server declared a charset
encoding = details.encoding if 'charset' in details.headers.get('content-type', '').lower() else None
details_data = BeautifulSoup(details.content, "lxml", from_encoding=encoding)
# Remove all script tags
for s in details_data('script'):
    s.extract()
details_div = details_data.find("div", {"id": "ArticleDetailContent"})
# Note: details_image_src is not defined anywhere in this snippet as posted
details_image = details_image_src.img['src']
# Collect the text of every paragraph in the article body
details_text = []
paras = details_div.find_all("p")
for para in paras:
    details_text.append(para.get_text())

details_text = ''.join(details_text)

I write them to a file using:
with open("thanthi.csv", "w") as out_file:
    writer = csv.writer(out_file)
    writer.writerow(["Description"])
    writer.writerows([details_text])
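
A side note on the last line above: `writerows` expects an iterable of rows, so each element of `[details_text]` is treated as a row, and a string used as a row is split into one column per character. The sketch below illustrates the difference with `writerow`. It is written for Python 3 (it uses `io.StringIO` as an in-memory file), and `rows_written` is a hypothetical helper name, not something from the code above.

```python
import csv
import io


def rows_written(text, use_writerows):
    """Return the CSV output produced for a single text value."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    if use_writerows:
        # The list holds one "row", and that row is the string itself,
        # so csv iterates over it character by character.
        writer.writerows([text])
    else:
        # The list is the row, and the string is its single column.
        writer.writerow([text])
    return buf.getvalue()


print(repr(rows_written("abc", use_writerows=True)))   # 'a,b,c\r\n'
print(repr(rows_written("abc", use_writerows=False)))  # 'abc\r\n'
```

If the goal is one "Description" cell per article, `writer.writerow([details_text])` is probably what was intended.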

This is the error:

    writer.writerows(articles)
UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-4: ordinal not in range(128)

I am not sure why this fails. Even when I tried encoding the values with the .encode('utf8') method, I still get the same error.
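
A likely cause, offered as an assumption rather than a confirmed diagnosis: in Python 2.7 the `csv` module works on byte strings, so a `unicode` value returned by `get_text()` is implicitly encoded with the `ascii` codec when it is written, which fails for Tamil text. The minimal sketch below encodes to UTF-8 explicitly on Python 2 and uses a UTF-8 text file on Python 3. `write_description_csv` is a hypothetical helper name, and the sample text is a placeholder, not data from the site.

```python
import csv
import io
import sys

PY2 = sys.version_info[0] == 2


def write_description_csv(path, description):
    """Write a one-column CSV: a header row and one Unicode text row."""
    if PY2:
        # Python 2: csv writes byte strings, so open the file in binary
        # mode and encode the unicode value to UTF-8 ourselves.
        with open(path, "wb") as out_file:
            writer = csv.writer(out_file)
            writer.writerow(["Description"])
            writer.writerow([description.encode("utf-8")])
    else:
        # Python 3: csv writes text, so let the file object do the encoding.
        with io.open(path, "w", encoding="utf-8", newline="") as out_file:
            writer = csv.writer(out_file)
            writer.writerow(["Description"])
            writer.writerow([description])


# Placeholder Tamil text, written with escapes so the file stays ASCII
write_description_csv("thanthi.csv", u"\u0bb5\u0ba3\u0b95\u0bcd\u0b95\u0bae\u0bcd")
```

Calling `.encode('utf8')` on a value that is already a byte string can trigger the same error on Python 2, because the value is first implicitly decoded as ASCII; encoding once, right before writing, avoids that.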

0 answers:

No answers