Getting duplicates in the csv from a spider written in Python

Time: 2017-04-19 22:47:50

Tags: python csv web-crawler

I've created a spider that collects data just as I expected. The only problem I'm facing now is that the results contain a lot of duplicates; I'd like to filter those duplicates out while writing the results to the csv file.

Here is the code:

import csv
import requests
from lxml import html

# Open the output csv, write the header row, then follow each top-level category link.
def Startpoint():
    global writer
    outfile=open('Data.csv','w',newline='')
    writer=csv.writer(outfile)
    writer.writerow(["Name","Price"])
    address = "https://www.sephora.ae/en/stores/"
    page = requests.get(address)
    tree = html.fromstring(page.text)
    titles=tree.xpath('//li[contains(@class,"level0")]')
    for title in titles:
        href = title.xpath('.//a[contains(@class,"level0")]/@href')[0]
        Layer2(href)

# Fetch a category page and follow each sub-category link.
def Layer2(address):
    global writer
    page = requests.get(address)
    tree = html.fromstring(page.text)
    titles=tree.xpath('//li[contains(@class,"amshopby-cat")]')
    for title in titles:
        href = title.xpath('.//a/@href')[0]
        Endpoint(href)

# Fetch a product listing page and write every (Name, Price) pair to the csv.
def Endpoint(address):
    global writer
    page = requests.get(address)
    tree = html.fromstring(page.text)
    titles=tree.xpath('//div[@class="product-info"]')
    for title in titles:
        Name = title.xpath('.//div[contains(@class,"h3")]/a[@title]/text()')[0]
        Price = title.xpath('.//span[@class="price"]/text()')[0]
        metco=(Name,Price)
        print(metco)
        writer.writerow(metco)

Startpoint()

1 Answer:

Answer 0 (score: 1)

You don't need the csv module just to get a csv file; specifying the extension is enough, so rewriting the code along those lines should do the trick. Notice the part that keeps duplicates out of the writing process. Also, be aware of the 'w' and 'a' arguments used in the open function: the first means "write" (each run starts the file from scratch), the second means "append". However, even though this code works, it is far from being "good" design.
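The code block the answer refers to is not shown above, so what follows is only a minimal sketch of the dedup-on-write idea, assuming the question's csv.writer approach is kept: a module-level seen set records every (Name, Price) pair already written, and Endpoint skips any pair it has seen before. The seen set and the explicit writer parameter are names introduced here for illustration; everything else mirrors the question's code.

    import csv
    import requests
    from lxml import html

    seen = set()  # (Name, Price) pairs that have already been written to the csv

    def Endpoint(address, writer):
        # Same scraping logic as in the question, plus a membership check
        # so that each distinct (Name, Price) row is written only once.
        page = requests.get(address)
        tree = html.fromstring(page.text)
        for title in tree.xpath('//div[@class="product-info"]'):
            Name = title.xpath('.//div[contains(@class,"h3")]/a[@title]/text()')[0]
            Price = title.xpath('.//span[@class="price"]/text()')[0]
            row = (Name, Price)
            if row not in seen:
                seen.add(row)
                writer.writerow(row)

    if __name__ == '__main__':
        with open('Data.csv', 'w', newline='') as outfile:
            writer = csv.writer(outfile)
            writer.writerow(["Name", "Price"])
            # Startpoint() and Layer2() from the question would run here unchanged,
            # except that they pass the writer down: Endpoint(href, writer)

A set gives constant-time membership checks, so the extra lookup costs next to nothing even when the result file grows large.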