Python Scrapy code works on one website but not on another after adjusting the selectors

Posted: 2020-05-07 21:08:31

Tags: python xpath web-scraping scrapy

I'm learning Scrapy and Python and have run into this problem.

When scraping this site, http://www.laughfactory.com/jokes/family-jokes, the following code works fine:

import scrapy


class JokesSpider(scrapy.Spider):
    name = 'jokes'
    allowed_domains = ['www.laughfactory.com']
    start_urls = ["http://www.laughfactory.com/jokes/family-jokes"]

    def parse(self, response):
        for joke in response.xpath("//div[@class='jokes']"):
            # './/' searches only inside the current joke div
            yield {
                'joke_text': joke.xpath(".//div[@class='joke-text']").extract_first()
            }

But when I use similar code on another site, https://www.ekupi.hr/hr/Ra%C4%8Dunala/Ra%C4%8Dunala-i-periferija/Monitori/c/10077, it doesn't work. The code:

from time import sleep

import scrapy


class eKupiSingleCategoryXPath(scrapy.Spider):
    name = "monitor_xpath"
    allowed_domains = ["https://www.ekupi.hr/hr/"]
    start_urls = ["https://www.ekupi.hr/hr/Ra%C4%8Dunala/Ra%C4%8Dunala-i-periferija/Monitori/c/10077"]

    def parse(self, response):
        for monitorSelectXPath in response.xpath("//div[@class='details']"):
            sleep(1)

            yield {
                "name": monitorSelectXPath.xpath("//a[@class='name']/text()").extract_first()
            }

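I believe I'm using the correct selectors, and I believe the code would work with CSS selectors. With the XPath selector, however, every scraped item is the same.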

Output:

2020-05-07 23:04:17 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.ekupi.hr/hr/Ra%C4%8Dunala/Ra%C4%8Dunala-i-periferija/Monitori/c/10077>
{'name': '\n\t\t\t\t\tAcer monitor Nitro VG240Ybmiix UM.QV0EE.001, IPS, 1ms, AMD FreeSync, ZeroFrame, Zvučnici, HDMIx2, 23.8"'}
2020-05-07 23:04:18 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.ekupi.hr/hr/Ra%C4%8Dunala/Ra%C4%8Dunala-i-periferija/Monitori/c/10077>
{'name': '\n\t\t\t\t\tAcer monitor Nitro VG240Ybmiix UM.QV0EE.001, IPS, 1ms, AMD FreeSync, ZeroFrame, Zvučnici, HDMIx2, 23.8"'}
2020-05-07 23:04:19 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.ekupi.hr/hr/Ra%C4%8Dunala/Ra%C4%8Dunala-i-periferija/Monitori/c/10077>
{'name': '\n\t\t\t\t\tAcer monitor Nitro VG240Ybmiix UM.QV0EE.001, IPS, 1ms, AMD FreeSync, ZeroFrame, Zvučnici, HDMIx2, 23.8"'}
2020-05-07 23:04:20 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.ekupi.hr/hr/Ra%C4%8Dunala/Ra%C4%8Dunala-i-periferija/Monitori/c/10077>
{'name': '\n\t\t\t\t\tAcer monitor Nitro VG240Ybmiix UM.QV0EE.001, IPS, 1ms, AMD FreeSync, ZeroFrame, Zvučnici, HDMIx2, 23.8"'}
2020-05-07 23:04:21 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.ekupi.hr/hr/Ra%C4%8Dunala/Ra%C4%8Dunala-i-periferija/Monitori/c/10077>
{'name': '\n\t\t\t\t\tAcer monitor Nitro VG240Ybmiix UM.QV0EE.001, IPS, 1ms, AMD FreeSync, ZeroFrame, Zvučnici, HDMIx2, 23.8"'}
2020-05-07 23:04:22 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.ekupi.hr/hr/Ra%C4%8Dunala/Ra%C4%8Dunala-i-periferija/Monitori/c/10077>
{'name': '\n\t\t\t\t\tAcer monitor Nitro VG240Ybmiix UM.QV0EE.001, IPS, 1ms, AMD FreeSync, ZeroFrame, Zvučnici, HDMIx2, 23.8"'}
2020-05-07 23:04:23 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.ekupi.hr/hr/Ra%C4%8Dunala/Ra%C4%8Dunala-i-periferija/Monitori/c/10077>
{'name': '\n\t\t\t\t\tAcer monitor Nitro VG240Ybmiix UM.QV0EE.001, IPS, 1ms, AMD FreeSync, ZeroFrame, Zvučnici, HDMIx2, 23.8"'}
2020-05-07 23:04:24 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.ekupi.hr/hr/Ra%C4%8Dunala/Ra%C4%8Dunala-i-periferija/Monitori/c/10077>
{'name': '\n\t\t\t\t\tAcer monitor Nitro VG240Ybmiix UM.QV0EE.001, IPS, 1ms, AMD FreeSync, ZeroFrame, Zvučnici, HDMIx2, 23.8"'}
2020-05-07 23:04:25 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.ekupi.hr/hr/Ra%C4%8Dunala/Ra%C4%8Dunala-i-periferija/Monitori/c/10077>
{'name': '\n\t\t\t\t\tAcer monitor Nitro VG240Ybmiix UM.QV0EE.001, IPS, 1ms, AMD FreeSync, ZeroFrame, Zvučnici, HDMIx2, 23.8"'}
2020-05-07 23:04:26 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.ekupi.hr/hr/Ra%C4%8Dunala/Ra%C4%8Dunala-i-periferija/Monitori/c/10077>
{'name': '\n\t\t\t\t\tAcer monitor Nitro VG240Ybmiix UM.QV0EE.001, IPS, 1ms, AMD FreeSync, ZeroFrame, Zvučnici, HDMIx2, 23.8"'}
2020-05-07 23:04:27 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.ekupi.hr/hr/Ra%C4%8Dunala/Ra%C4%8Dunala-i-periferija/Monitori/c/10077>
{'name': '\n\t\t\t\t\tAcer monitor Nitro VG240Ybmiix UM.QV0EE.001, IPS, 1ms, AMD FreeSync, ZeroFrame, Zvučnici, HDMIx2, 23.8"'}
2020-05-07 23:04:28 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.ekupi.hr/hr/Ra%C4%8Dunala/Ra%C4%8Dunala-i-periferija/Monitori/c/10077>
{'name': '\n\t\t\t\t\tAcer monitor Nitro VG240Ybmiix UM.QV0EE.001, IPS, 1ms, AMD FreeSync, ZeroFrame, Zvučnici, HDMIx2, 23.8"'}
2020-05-07 23:04:29 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.ekupi.hr/hr/Ra%C4%8Dunala/Ra%C4%8Dunala-i-periferija/Monitori/c/10077>
{'name': '\n\t\t\t\t\tAcer monitor Nitro VG240Ybmiix UM.QV0EE.001, IPS, 1ms, AMD FreeSync, ZeroFrame, Zvučnici, HDMIx2, 23.8"'}
2020-05-07 23:04:30 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.ekupi.hr/hr/Ra%C4%8Dunala/Ra%C4%8Dunala-i-periferija/Monitori/c/10077>
{'name': '\n\t\t\t\t\tAcer monitor Nitro VG240Ybmiix UM.QV0EE.001, IPS, 1ms, AMD FreeSync, ZeroFrame, Zvučnici, HDMIx2, 23.8"'}
2020-05-07 23:04:31 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.ekupi.hr/hr/Ra%C4%8Dunala/Ra%C4%8Dunala-i-periferija/Monitori/c/10077>
{'name': '\n\t\t\t\t\tAcer monitor Nitro VG240Ybmiix UM.QV0EE.001, IPS, 1ms, AMD FreeSync, ZeroFrame, Zvučnici, HDMIx2, 23.8"'}
2020-05-07 23:04:32 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.ekupi.hr/hr/Ra%C4%8Dunala/Ra%C4%8Dunala-i-periferija/Monitori/c/10077>
{'name': '\n\t\t\t\t\tAcer monitor Nitro VG240Ybmiix UM.QV0EE.001, IPS, 1ms, AMD FreeSync, ZeroFrame, Zvučnici, HDMIx2, 23.8"'}
2020-05-07 23:04:33 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.ekupi.hr/hr/Ra%C4%8Dunala/Ra%C4%8Dunala-i-periferija/Monitori/c/10077>
{'name': '\n\t\t\t\t\tAcer monitor Nitro VG240Ybmiix UM.QV0EE.001, IPS, 1ms, AMD FreeSync, ZeroFrame, Zvučnici, HDMIx2, 23.8"'}
2020-05-07 23:04:34 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.ekupi.hr/hr/Ra%C4%8Dunala/Ra%C4%8Dunala-i-periferija/Monitori/c/10077>
{'name': '\n\t\t\t\t\tAcer monitor Nitro VG240Ybmiix UM.QV0EE.001, IPS, 1ms, AMD FreeSync, ZeroFrame, Zvučnici, HDMIx2, 23.8"'}
2020-05-07 23:04:35 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.ekupi.hr/hr/Ra%C4%8Dunala/Ra%C4%8Dunala-i-periferija/Monitori/c/10077>
{'name': '\n\t\t\t\t\tAcer monitor Nitro VG240Ybmiix UM.QV0EE.001, IPS, 1ms, AMD FreeSync, ZeroFrame, Zvučnici, HDMIx2, 23.8"'}
2020-05-07 23:04:36 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.ekupi.hr/hr/Ra%C4%8Dunala/Ra%C4%8Dunala-i-periferija/Monitori/c/10077>
{'name': '\n\t\t\t\t\tAcer monitor Nitro VG240Ybmiix UM.QV0EE.001, IPS, 1ms, AMD FreeSync, ZeroFrame, Zvučnici, HDMIx2, 23.8"'}
2020-05-07 23:04:37 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.ekupi.hr/hr/Ra%C4%8Dunala/Ra%C4%8Dunala-i-periferija/Monitori/c/10077>
{'name': '\n\t\t\t\t\tAcer monitor Nitro VG240Ybmiix UM.QV0EE.001, IPS, 1ms, AMD FreeSync, ZeroFrame, Zvučnici, HDMIx2, 23.8"'}
2020-05-07 23:04:38 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.ekupi.hr/hr/Ra%C4%8Dunala/Ra%C4%8Dunala-i-periferija/Monitori/c/10077>
{'name': '\n\t\t\t\t\tAcer monitor Nitro VG240Ybmiix UM.QV0EE.001, IPS, 1ms, AMD FreeSync, ZeroFrame, Zvučnici, HDMIx2, 23.8"'}
2020-05-07 23:04:39 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.ekupi.hr/hr/Ra%C4%8Dunala/Ra%C4%8Dunala-i-periferija/Monitori/c/10077>
{'name': '\n\t\t\t\t\tAcer monitor Nitro VG240Ybmiix UM.QV0EE.001, IPS, 1ms, AMD FreeSync, ZeroFrame, Zvučnici, HDMIx2, 23.8"'}
2020-05-07 23:04:40 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.ekupi.hr/hr/Ra%C4%8Dunala/Ra%C4%8Dunala-i-periferija/Monitori/c/10077>
{'name': '\n\t\t\t\t\tAcer monitor Nitro VG240Ybmiix UM.QV0EE.001, IPS, 1ms, AMD FreeSync, ZeroFrame, Zvučnici, HDMIx2, 23.8"'}
2020-05-07 23:04:41 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.ekupi.hr/hr/Ra%C4%8Dunala/Ra%C4%8Dunala-i-periferija/Monitori/c/10077>
{'name': '\n\t\t\t\t\tAcer monitor Nitro VG240Ybmiix UM.QV0EE.001, IPS, 1ms, AMD FreeSync, ZeroFrame, Zvučnici, HDMIx2, 23.8"'}
2020-05-07 23:04:41 [scrapy.core.engine] INFO: Closing spider (finished)

1 answer:

Answer 0 (score: 0)

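Remove the leading // from the inner XPath expression. An XPath that starts with // is absolute: even when it is called on monitorSelectXPath, it searches the whole document, so extract_first() returns the first product name on the page for every div. A relative path is evaluated from the current div instead. Update the yield statement as follows (if the link is nested deeper than a direct child of the div, use .//a[@class='name']/text() instead):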

yield {
    "name": monitorSelectXPath.xpath("a[@class='name']/text()").extract_first()
}

You can also test selectors interactively in the Scrapy shell. Run this in a terminal:

scrapy shell https://www.ekupi.hr/hr/Ra%C4%8Dunala/Ra%C4%8Dunala-i-periferija/Monitori/c/10077