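I am trying to scrape a website using BeautifulSoup. It mostly works, but I have two problems:

1. After fetching the data from the website, I print it to the screen and write it to a CSV file. The site has a price field in which the amount is prefixed with the rupee symbol (sample price field: ₹10000). Printing the amount to the console works fine. When I try to write it to the Excel sheet, I get the error "UnicodeEncodeError: 'charmap' codec can't encode character '\u20b9' in position 28". The other fields print to the console and write to Excel without problems; only two fields fail, one containing the currency symbol and another containing the * symbol.

2. I have a loop that fetches every page of a particular search. The search returns about 344 pages of results, but the loop stops at around page 43 with only an HTTP error 500 as the error message.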

    import bs4
    from urllib.request import urlopen as uReq
    from bs4 import BeautifulSoup as Soup

    filename = "data.csv"
    f = open(filename,"w")
    headers = "phone_name, phone_price, phone_rating,number_of_ratings, memory, display, camera, battery, processor, Warrenty, security, OS\n"
    f.write(headers)
    for i in range(2): # Number of pages minus one
        my_url = 'https://www.flipkart.com/search?as=off&as-show=on&otracker=start&page={}&q=cell+phones&viewType=list'.format(i+1)
        print(my_url)
        uClient=uReq(my_url)
        page_html=uClient.read()
        page_soup = Soup(page_html,"html.parser")
        containers=page_soup.findAll("a", {"class":"_1UoZlX"})
        for container in containers:
            phone_name = container.find("div",{"class":"_3wU53n"}).text
            try:
                phone_price = container.find("div",{"class":"_1vC4OE _2rQ-NK"}).text
            except:
                phone_price = 'No Data'

Thank you very much for your help!

Answer 0 (score: 0)
When writing a .CSV file for Excel, the utf-8-sig encoding should be used to properly support any Unicode character. If only utf8 is used, Excel will assume the localized ANSI encoding on Windows and characters will display incorrectly.

    #!python3
    import csv
    from urllib.request import urlopen as uReq
    from bs4 import BeautifulSoup as Soup

    filename = "data.csv"
    # utf-8-sig writes a byte order mark so Excel detects UTF-8;
    # newline='' is what the csv module expects for files it writes to.
    with open(filename, 'w', newline='', encoding='utf-8-sig') as f:
        w = csv.writer(f)
        headers = 'phone_name phone_price phone_rating number_of_ratings memory display camera battery processor Warrenty security OS'
        w.writerow(headers.split())
        for i in range(2):  # number of pages to fetch (pages 1 and 2)
            my_url = 'https://www.flipkart.com/search?as=off&as-show=on&otracker=start&page={}&q=cell+phones&viewType=list'.format(i+1)
            print(my_url)
            uClient = uReq(my_url)
            page_html = uClient.read()
            page_soup = Soup(page_html, "html.parser")
            containers = page_soup.findAll("a", {"class": "_1UoZlX"})
            for container in containers:
                phone_name = container.find("div", {"class": "_3wU53n"}).text
                try:
                    phone_price = container.find("div", {"class": "_1vC4OE _2rQ-NK"}).text
                except AttributeError:  # find() returned None: no price element
                    phone_price = 'No Data'
                w.writerow([phone_name, phone_price])

Output:

    phone_name,phone_price,phone_rating,number_of_ratings,memory,display,camera,battery,processor,Warrenty,security,OS
    "Asus Zenfone 3 Laser (Gold, 32 GB)","₹9,999"
    "Intex Aqua Style III (Champagne/Champ, 16 GB)","₹3,999"
    "iVooMi i1s (Platinum Gold, 32 GB)","₹7,499"
    "Xolo ERA 3X (Posh Black, 16 GB)","₹6,999"
    "iVooMi Me1 (Sunshine Gold, 8 GB)","₹3,599"
    "Panasonic Eluga A4 (Mocha Gold, 32 GB)","₹9,790"
    Samsung Metro 313 Dual Sim,"₹2,025"
    "Samsung Galaxy J3 Pro (Gold, 16 GB)","₹6,990"
    Samsung Guru Music 2,"₹1,625"
    "Panasonic Eluga A4 (Marine Blue, 32 GB)","₹9,640"
    "Asus Zenfone 4 Selfie (Black, 32 GB)","₹9,999"
    Swipe Elite 3- 4G with VoLTE,"₹3,999"
    "Asus Zenfone Max (Black, 16 GB)","₹7,486"
    Swipe Elite 3- 4G with VoLTE,"₹3,999"
    "Swipe Elite Power (Space Grey, 16 GB)","₹5,499"
    "Celkon Diamond Mega (Grey, 16 GB)","₹5,499"
    "Asus Zenfone Max (Black, 32 GB)","₹7,999"
    "Swipe Elite Power (Champagne Gold, 16 GB)","₹5,499"
    "Asus Zenfone 4 Selfie (Gold, 32 GB)","₹9,999"
    "Karbonn Aura (Champagne, 8 GB)","₹3,199"
    "Infinix Note 4 (Ice Blue, 32 GB)","₹8,999"
    "Infinix Note 4 (Milan Black, 32 GB)","₹8,999"
    "Moto G5s Plus (Blush Gold, 64 GB)","₹15,990"
    "Moto G5s Plus (Lunar Grey, 64 GB)","₹15,940"

In Excel: [image not available]
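
The encoding behaviour behind the error in the question can be checked without touching the website. The following is a minimal sketch, not part of the original answer. It assumes cp1252 as a stand-in for the Windows ANSI code page (the real code page depends on the system locale), and the sample price string is made up for illustration.

```python
import codecs

# Hypothetical sample value: the rupee sign (U+20B9) followed by an amount.
price = "\u20b9" + "9,999"

# Assumption: cp1252 stands in for the localized ANSI code page that
# open(filename, "w") would pick by default on a Western Windows system.
try:
    price.encode("cp1252")
    cp1252_ok = True
except UnicodeEncodeError:
    # Same failure as in the question: the code page has no rupee sign.
    cp1252_ok = False

# Plain UTF-8 can encode the character but writes no byte order mark.
plain = price.encode("utf-8")

# utf-8-sig produces the same bytes, prefixed with the UTF-8 BOM.
with_bom = price.encode("utf-8-sig")
has_bom = with_bom.startswith(codecs.BOM_UTF8)
same_payload = with_bom[len(codecs.BOM_UTF8):] == plain

print(cp1252_ok, has_bom, same_payload)
```

The only difference between the two UTF-8 variants is the three-byte prefix, which is the marker Excel relies on to recognise the file as UTF-8 rather than falling back to the ANSI code page.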