Question

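I have a very simple web spider that scrapes some football team data. So far I am only interested in some metadata at the top of the page. I tried to use scrapy's ItemLoader to retrieve the data from the page, but it doesn't work: I only get the first field. What am I missing?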

ft = FootballTeamItem()
sel = Selector(text=response.text)
headerLoader = ItemLoader(ft, selector=sel)
# headerLoader.add_xpath('name', '//h2[@class="team-logo"]/strong/text()')
headerLoader.add_xpath('league', '//div[contains(@class, "intro-con team-con")]/table/tbody/tr[1]/td[1]/a/text()')
headerLoader.add_xpath('coach', '//div[contains(@class, "intro-con team-con")]/table/tbody/tr[1]/td[2]/text()')
headerLoader.add_xpath('city', '//div[contains(@class, "intro-con team-con")]/table/tbody/tr[1]/td[3]/text()')
headerLoader.add_xpath('start', '//div[contains(@class, "intro-con team-con")]/table/tbody/tr[2]/td[1]/text()')
headerLoader.add_xpath('court', '//div[contains(@class, "intro-con team-con")]/table/tbody/tr[2]/td[2]/text()')
headerLoader.load_item()

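I tried different ways of constructing the ItemLoader, passing a Selector instance directly or, similarly, passing the response. It still didn't work. Interestingly, the code snippet runs fine outside of scrapy, but it always fails when run inside a scrapy project.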

Answer 1

You need to put a period (.) at the beginning of your XPath so that it is treated as relative to your selected 'sel' element.

Why does my scrapy ItemLoader fail?
