Getting duplicates in the csv from a spider written in Python

Time: 2017-04-19 22:47:50

Tags: python csv web-crawler

I've created a spider that collects data just as I expected. The only problem I'm facing now is that the results contain a lot of duplicates; I'd like to filter those duplicates out while writing the results to the csv file.

Here is the code:

import csv
import requests
from lxml import html

# Open the output csv, write the header row, then follow each top-level category link.
def Startpoint():
    global writer
    outfile=open('Data.csv','w',newline='')
    writer=csv.writer(outfile)
    writer.writerow(["Name","Price"])
    address = "https://www.sephora.ae/en/stores/"
    page = requests.get(address)
    tree = html.fromstring(page.text)
    titles=tree.xpath('//li[contains(@class,"level0")]')
    for title in titles:
        href = title.xpath('.//a[contains(@class,"level0")]/@href')[0]
        Layer2(href)

# Fetch a category page and follow each sub-category link.
def Layer2(address):
    global writer
    page = requests.get(address)
    tree = html.fromstring(page.text)
    titles=tree.xpath('//li[contains(@class,"amshopby-cat")]')
    for title in titles:
        href = title.xpath('.//a/@href')[0]
        Endpoint(href)

# Fetch a product listing page and write every (Name, Price) pair to the csv.
def Endpoint(address):
    global writer
    page = requests.get(address)
    tree = html.fromstring(page.text)
    titles=tree.xpath('//div[@class="product-info"]')
    for title in titles:
        Name = title.xpath('.//div[contains(@class,"h3")]/a[@title]/text()')[0]
        Price = title.xpath('.//span[@class="price"]/text()')[0]
        metco=(Name,Price)
        print(metco)
        writer.writerow(metco)

Startpoint()

1 Answer:

Answer 0 (score: 1)

You don't need the csv module just to get a csv file; specifying the extension is enough, so rewriting the code along those lines should do the trick. Notice the part that keeps duplicates out of the writing process. Also, be aware of the 'w' and 'a' arguments used in the open function: the first means "write" (each run starts the file from scratch), the second means "append". However, even though this code works, it is far from being "good" design.
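The code block the answer refers to is not shown above, so what follows is only a minimal sketch of the dedup-on-write idea, assuming the question's csv.writer approach is kept: a module-level seen set records every (Name, Price) pair already written, and Endpoint skips any pair it has seen before. The seen set and the explicit writer parameter are names introduced here for illustration; everything else mirrors the question's code.

    import csv
    import requests
    from lxml import html

    seen = set()  # (Name, Price) pairs that have already been written to the csv

    def Endpoint(address, writer):
        # Same scraping logic as in the question, plus a membership check
        # so that each distinct (Name, Price) row is written only once.
        page = requests.get(address)
        tree = html.fromstring(page.text)
        for title in tree.xpath('//div[@class="product-info"]'):
            Name = title.xpath('.//div[contains(@class,"h3")]/a[@title]/text()')[0]
            Price = title.xpath('.//span[@class="price"]/text()')[0]
            row = (Name, Price)
            if row not in seen:
                seen.add(row)
                writer.writerow(row)

    if __name__ == '__main__':
        with open('Data.csv', 'w', newline='') as outfile:
            writer = csv.writer(outfile)
            writer.writerow(["Name", "Price"])
            # Startpoint() and Layer2() from the question would run here unchanged,
            # except that they pass the writer down: Endpoint(href, writer)

A set gives constant-time membership checks, so the extra lookup costs next to nothing even when the result file grows large.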