So I'm trying to scrape data about motherboards from a local website.
import bs4
import os
import requests
from bs4 import BeautifulSoup as soup
os.chdir('E://')
os.makedirs('E://scrappy', exist_ok=True)
myurl = "https://www.example.com"
res = requests.get(myurl)
page = soup(res.content, 'html.parser')
containers = page.findAll("div", {"class": "content-product"})
filename = 'AM4.csv'
f = open(filename, 'w')
headers = 'Motherboard_Name, Price\n'
f.write(headers)
for container in containers:
    Product = container.findAll("div", {"class": "product-title"})
    Motherboard_Name = Product[0].text.strip()
    Kimat = container.findAll("span", {"class": "price"})
    Price = Kimat[0].text
    print('Motherboard_Name: ' + Motherboard_Name)
    print('Price: ' + Price)
    f.write(Motherboard_Name + "," + Price.replace(",", "") + "\n")
f.close()
print("done")
But when I run this code I get an error:
UnicodeEncodeError: 'charmap' codec can't encode character '\u20b9' in position 45: character maps to <undefined>
How do I fix this?
EDIT: So I fixed the unicode error by adding encoding="utf-8" (as mentioned in python 3.2 UnicodeEncodeError: 'charmap' codec can't encode character '\u2013' in position 9629: character maps to <undefined>), i.e. open(filename, 'w', encoding="utf-8"). It seems to do the job, but now I get the characters â‚¹ before the price in the csv file... How do I fix this?
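(I suspect those stray characters are just the UTF-8 bytes of the rupee sign '\u20b9' being decoded as cp1252 — this snippet reproduces them:)

# Reproduce the stray characters: encode the rupee sign as UTF-8,
# then decode the same bytes as cp1252 (what Excel assumes by default).
rupee = '\u20b9'
print(rupee.encode('utf-8').decode('cp1252'))  # prints â‚¹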
Answer 0 (score: 0)
Use the csv module to manage CSV files, and use utf-8-sig so that Excel correctly recognizes the UTF-8 encoding. When opening the file, make sure to pass newline='', as the csv documentation requires.
Example:
import csv

filename = 'AM4.csv'
with open(filename, 'w', newline='', encoding='utf-8-sig') as f:
    w = csv.writer(f)
    w.writerow(['Motherboard_Name', 'Price'])
    name = 'some name'
    price = '\u20b95,99'
    w.writerow([name, price.replace(',', '')])
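The utf-8-sig encoding writes a byte order mark (BOM) at the start of the file, which is what tells Excel to read it as UTF-8 instead of cp1252 — that mismatch is why ₹ showed up as â‚¹. Applied to the scraping loop from your question, it would look something like this (an untested sketch, assuming the same URL and class names as in your code):

import csv
import requests
from bs4 import BeautifulSoup as soup

myurl = "https://www.example.com"
res = requests.get(myurl)
page = soup(res.content, 'html.parser')
containers = page.findAll("div", {"class": "content-product"})

# utf-8-sig adds the BOM so Excel detects UTF-8; newline='' is required by csv
with open('AM4.csv', 'w', newline='', encoding='utf-8-sig') as f:
    w = csv.writer(f)
    w.writerow(['Motherboard_Name', 'Price'])
    for container in containers:
        name = container.findAll("div", {"class": "product-title"})[0].text.strip()
        price = container.findAll("span", {"class": "price"})[0].text
        w.writerow([name, price.replace(',', '')])
print("done")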