Question

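I can't figure out what mistake I've made in my code. The XPaths are not the problem: each one works when I test it on its own. When I run the spider, I get the following error. It happens as soon as execution reaches the Layer2 function. When I checked the links produced by the first function, I could see that they are complete URLs. What should I do at this point to get results? Thanks in advance.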

Spider:

import requests
from lxml import html

def Startpoint():
    address = "https://www.sephora.ae/en/stores/"
    page = requests.get(address)
    tree = html.fromstring(page.text)
    titles=tree.xpath('//li[contains(@class,"level0")]')
    for title in titles:
        href = title.xpath('.//a[contains(@class,"level0")]/@href')[0]
        Layer2(href)

def Layer2(address):
    page = requests.get(address)
    tree = html.fromstring(page.text)
    titles=tree.xpath('//div[@class="product-manufacturer"]')
    for title in titles:
        href = title.xpath('.//a/@href')[0]
        Endpoint(href)

def Endpoint(address):
    page = requests.get(address)
    tree = html.fromstring(page.text)
    titles=tree.xpath('//div[@class="add-to-cart"]')
    for title in titles:
        Name = title.xpath('.//div[@class="h2"]/text()')[0]
        Price = title.xpath('.//span[@class="price"]/text()')[0]
        print('{}{}'.format(Name, Price))      

Startpoint()

I have modified the code above following Max Paymar's suggestion, and it is working now.
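
One caveat with the `[0]` indexing in the code above: it raises an IndexError whenever an XPath matches nothing on a page. Below is a minimal sketch of a guard, assuming that skipping such elements is acceptable. `first_or_none` is a hypothetical helper, not part of the original spider, and the URL is a made-up placeholder.

```python
def first_or_none(matches):
    """Return the first item of an XPath result list, or None if it is empty."""
    return matches[0] if matches else None


# Plain lists stand in here for what title.xpath('.//a/@href') would return.
found = first_or_none(["https://www.sephora.ae/en/example/"])  # placeholder URL
missing = first_or_none([])

for href in (found, missing):
    if href is None:
        continue  # skip elements without a link instead of raising IndexError
    print(href)
```

In the spider, this would mean wrapping each `title.xpath(...)` call in `first_or_none(...)` and checking for `None` before passing the value to the next function.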

Answer 1

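I have never used this library, so I may be wrong, but it looks like the url variable needs to be changed so that it is a string. The bracket '[' in the error message certainly looks out of place.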
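
To illustrate the point, here is a small sketch that runs on an inline HTML snippet rather than the live site. The markup and the URL are made-up placeholders that only mimic the class names in the question. It shows that an lxml `xpath()` call selecting attributes returns a list, so converting the result to text yields something starting with '[', while indexing with `[0]` yields a plain string suitable for `requests.get`.

```python
from lxml import html

# Hypothetical markup imitating the menu structure targeted in the question.
snippet = (
    '<ul><li class="level0">'
    '<a class="level0" href="https://www.sephora.ae/en/example/">Example</a>'
    '</li></ul>'
)
tree = html.fromstring(snippet)
item = tree.xpath('//li[contains(@class,"level0")]')[0]

# xpath() returns a list of matches, even when only one attribute matches.
hrefs = item.xpath('.//a[contains(@class,"level0")]/@href')
print(type(hrefs).__name__)  # the result is a list, not a string
print(str(hrefs))            # its text form begins with '['

# Taking the first element gives a string that can be used as a URL.
href = hrefs[0]
print(href)
```

Passing `hrefs` itself as the address would hand the request a list instead of a URL, which would be consistent with a stray '[' showing up in the error message.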

My web scraper throws an error instead of extracting data
