I have created a spider that collects data as I expected. The only problem I am facing now is that the results contain many duplicates. However, I want to filter out the duplicates when writing the results to the csv file. Here is the code:
import csv
import requests
from lxml import html

def Startpoint():
    global writer
    outfile = open('Data.csv', 'w', newline='')
    writer = csv.writer(outfile)
    writer.writerow(["Name", "Price"])
    address = "https://www.sephora.ae/en/stores/"
    page = requests.get(address)
    tree = html.fromstring(page.text)
    titles = tree.xpath('//li[contains(@class,"level0")]')
    for title in titles:
        href = title.xpath('.//a[contains(@class,"level0")]/@href')[0]
        Layer2(href)

def Layer2(address):
    global writer
    page = requests.get(address)
    tree = html.fromstring(page.text)
    titles = tree.xpath('//li[contains(@class,"amshopby-cat")]')
    for title in titles:
        href = title.xpath('.//a/@href')[0]
        Endpoint(href)

def Endpoint(address):
    global writer
    page = requests.get(address)
    tree = html.fromstring(page.text)
    titles = tree.xpath('//div[@class="product-info"]')
    for title in titles:
        Name = title.xpath('.//div[contains(@class,"h3")]/a[@title]/text()')[0]
        Price = title.xpath('.//span[@class="price"]/text()')[0]
        metco = (Name, Price)
        print(metco)
        writer.writerow(metco)

Startpoint()
Answer 0 (score: 1)

You don't need .encode('utf8') to get a csv file; specifying the extension when opening the file is enough, and encoding by hand is usually what triggers a UnicodeEncodeError in the first place. Also, note the parameters 'w' and 'a' used in the open function: while the first means "write" (the file is truncated on every run), the second means "append", so 'w' prevents you from resuming an earlier writing process where 'a' would continue it. However, even though this code works, heuristically speaking it is far from "good" design.
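As for the deduplication the question actually asks about: a straightforward approach is to keep a set of the rows already written and skip any row that has been seen before, right before the writerow call. Below is a minimal sketch of that idea applied to (Name, Price) tuples; the product names and prices are invented for illustration, and the helper name write_unique is not from the original code.

```python
import csv
import io

def write_unique(rows, outfile):
    """Write (Name, Price) rows to outfile, skipping exact duplicates."""
    seen = set()  # rows already written
    writer = csv.writer(outfile)
    writer.writerow(["Name", "Price"])
    for row in rows:
        key = tuple(row)   # tuples are hashable, so they can go in a set
        if key in seen:
            continue       # duplicate row: skip it
        seen.add(key)
        writer.writerow(row)

# Demo with made-up product data (not real Sephora listings).
buf = io.StringIO()
write_unique(
    [("Lipstick", "AED 95"), ("Mascara", "AED 120"), ("Lipstick", "AED 95")],
    buf,
)
print(buf.getvalue())
```

In the original spider, the same set could be created once in Startpoint and consulted in Endpoint before writer.writerow(metco), which removes duplicates without any post-processing of the csv file.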