I'm learning Scrapy and Python and have run into the following problem.
When scraping this site: http://www.laughfactory.com/jokes/family-jokes, the code below works correctly.
import scrapy


class JokesSpider(scrapy.Spider):
    name = 'jokes'
    allowed_domains = ['www.laughfactory.com']
    start_urls = ["http://www.laughfactory.com/jokes/family-jokes"]

    def parse(self, response):
        for joke in response.xpath("//div[@class='jokes']"):
            yield {
                'joke_text': joke.xpath(".//div[@class='joke-text']").extract_first()
            }
But when I use similar code on another site, https://www.ekupi.hr/hr/Ra%C4%8Dunala/Ra%C4%8Dunala-i-periferija/Monitori/c/10077, it does not behave the same way. Code:
from time import sleep

import scrapy


class eKupiSingleCategoryXPath(scrapy.Spider):
    name = "monitor_xpath"
    allowed_domains = ["https://www.ekupi.hr/hr/"]
    start_urls = ["https://www.ekupi.hr/hr/Ra%C4%8Dunala/Ra%C4%8Dunala-i-periferija/Monitori/c/10077"]

    def parse(self, response):
        for monitorSelectXPath in response.xpath("//div[@class='details']"):
            sleep(1)
            yield {
                "name": monitorSelectXPath.xpath("//a[@class='name']/text()").extract_first()
            }
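I believe I'm using the correct selectors, and I believe the code would work with CSS selectors. With the XPath selector, however, every scraped item is the same.
Output: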
2020-05-07 23:04:17 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.ekupi.hr/hr/Ra%C4%8Dunala/Ra%C4%8Dunala-i-periferija/Monitori/c/10077>
{'name': '\n\t\t\t\t\tAcer monitor Nitro VG240Ybmiix UM.QV0EE.001, IPS, 1ms, AMD FreeSync, ZeroFrame, Zvučnici, HDMIx2, 23.8"'}
2020-05-07 23:04:18 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.ekupi.hr/hr/Ra%C4%8Dunala/Ra%C4%8Dunala-i-periferija/Monitori/c/10077>
{'name': '\n\t\t\t\t\tAcer monitor Nitro VG240Ybmiix UM.QV0EE.001, IPS, 1ms, AMD FreeSync, ZeroFrame, Zvučnici, HDMIx2, 23.8"'}
[... 22 more identical items, one per second, omitted ...]
2020-05-07 23:04:41 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.ekupi.hr/hr/Ra%C4%8Dunala/Ra%C4%8Dunala-i-periferija/Monitori/c/10077>
{'name': '\n\t\t\t\t\tAcer monitor Nitro VG240Ybmiix UM.QV0EE.001, IPS, 1ms, AMD FreeSync, ZeroFrame, Zvučnici, HDMIx2, 23.8"'}
2020-05-07 23:04:41 [scrapy.core.engine] INFO: Closing spider (finished)
Answer:
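Remove the leading // from the XPath expression inside the loop. An expression that starts with // is absolute: it searches the whole document from the root rather than the current div, so extract_first() returns the first matching link on the page in every iteration. Without the //, the expression is evaluated relative to the current div (".//a[@class='name']" also works and matches the link at any depth inside the div). Update the yield statement as follows: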
yield {
    "name": monitorSelectXPath.xpath("a[@class='name']/text()").extract_first()
}
You can also test selectors interactively in the Scrapy shell. Start it from a terminal with:
scrapy shell https://www.ekupi.hr/hr/Ra%C4%8Dunala/Ra%C4%8Dunala-i-periferija/Monitori/c/10077