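I'm trying out Scrapy and have run into a problem: my script returns duplicate results. I'm trying to scrape the URLs from a parent page and then follow each URL to get the associated date. After each nested URL is scraped, the script seems to output the whole list of URLs from the parent page again.
Here is the script: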
import scrapy
from aeon.items import AeonItem
from scrapy.http.request import Request


class AeonSpider(scrapy.Spider):
    name = "aeon"
    allowed_domains = ["aeon.co"]
    start_urls = [
        "http://aeon.co/magazine/technology"
    ]

    def parse(self, response):
        items = []
        for sel in response.xpath('//*[@id="latestPosts"]'):
            item = AeonItem()
            item['primary_url'] = sel.xpath('div/div/div/a/@href').extract()
            for each in item['primary_url']:
                yield Request(each, callback=self.parse_next_page, meta={'item': item})

    def parse_next_page(self, response):
        for sel in response.xpath('//*[@id="top"]'):
            item = response.meta['item']
            item['comments'] = sel.xpath('div[5]/div[3]/div[2]/div/p/em/span[@class="instapaper_datepublished"]/text()').extract()
            return item
Here is the JSON output:
{"comments": ["13 February 2014"], "primary_url": ["http://aeon.co/magazine/science/the-search-for-quantum-gravity/", "http://aeon.co/magazine/philosophy/should-generation-ted-take-a-more-sceptical-view/", "http://aeon.co/magazine/technology/the-elon-musk-interview-on-mars/", "http://aeon.co/video/technology/analogue-people-in-a-digital-age-a-short-film-about-technology/", "http://aeon.co/video/technology/boxa-short-film-about-projection-mapping/", "http://aeon.co/video/technology/how-to-sharpen-pencils-a-short-film-about-a-master-artisan/", "http://aeon.co/magazine/technology/do-we-want-minority-report-policing/", "http://aeon.co/magazine/health/can-you-have-self-worth-without-self-love/", "http://aeon.co/magazine/technology/i-learnt-to-survive-like-an-11th-century-farmer/", "http://aeon.co/magazine/technology/can-tiny-plankton-help-reverse-climate-change/", "http://aeon.co/magazine/technology/are-halophytes-the-crop-of-the-future/", "http://aeon.co/magazine/technology/how-will-sexbots-change-human-relationships/", "http://aeon.co/video/technology/robotic-cheetah-a-short-film-about-biomimetic-robotics/", "http://aeon.co/video/technology/internet-archive-a-short-film-about-accessing-knowledge/", "http://aeon.co/video/technology/a-tiny-planet-a-short-film-about-wondrous-video-technology/", "http://aeon.co/magazine/culture/there-is-fortuitous-beauty-in-a-brute-force-attack/", "http://aeon.co/magazine/technology/can-we-design-systems-to-automate-ethics/", "http://aeon.co/magazine/technology/before-minecraft-or-snapchat-there-was-micromuse/", "http://aeon.co/magazine/technology/meet-darpas-new-generation-of-humanoid-robots/", "http://aeon.co/magazine/technology/the-problem-with-too-much-information/", "http://aeon.co/magazine/technology/can-nyc-be-completely-self-reliant/", "http://aeon.co/video/technology/theo-a-short-film-about-the-wind-eating-strandbeest/", "http://aeon.co/video/technology/terminal-a-short-film-about-the-mechanical-ballet-of-cargo/", 
"http://aeon.co/video/technology/metropolis-ii-a-short-film-about-the-city-of-tomorrow/", "http://aeon.co/magazine/culture/digital-art-should-be-about-possibilities-not-technicalities/", "http://aeon.co/magazine/society/can-sustainability-really-hope-to-beat-consumerism/", "http://aeon.co/magazine/technology/is-technology-making-the-world-too-complex/", "http://aeon.co/magazine/culture/creepypasta-is-how-the-internet-learns-our-fears/", "http://aeon.co/magazine/technology/virtual-afterlives-will-transform-humanity/", "http://aeon.co/magazine/technology/what-will-happen-to-my-online-identity-when-i-die/", "http://aeon.co/magazine/society/what-does-silicon-valley-tell-us-about-innovation/", "http://aeon.co/magazine/technology/the-rise-of-biotechnology-and-the-loss-of-scientific-neutrality/", "http://aeon.co/magazine/culture/why-i-gave-up-living-in-an-off-grid-commune/"]} {"comments": ["31 January 2014"], "primary_url": ["http://aeon.co/magazine/science/the-search-for-quantum-gravity/", "http://aeon.co/magazine/philosophy/should-generation-ted-take-a-more-sceptical-view/", "http://aeon.co/magazine/technology/the-elon-musk-interview-on-mars/", "http://aeon.co/video/technology/analogue-people-in-a-digital-age-a-short-film-about-technology/", "http://aeon.co/video/technology/boxa-short-film-about-projection-mapping/", "http://aeon.co/video/technology/how-to-sharpen-pencils-a-short-film-about-a-master-artisan/", "http://aeon.co/magazine/technology/do-we-want-minority-report-policing/", "http://aeon.co/magazine/health/can-you-have-self-worth-without-self-love/", "http://aeon.co/magazine/technology/i-learnt-to-survive-like-an-11th-century-farmer/", "http://aeon.co/magazine/technology/can-tiny-plankton-help-reverse-climate-change/", "http://aeon.co/magazine/technology/are-halophytes-the-crop-of-the-future/", "http://aeon.co/magazine/technology/how-will-sexbots-change-human-relationships/", 
"http://aeon.co/video/technology/robotic-cheetah-a-short-film-about-biomimetic-robotics/", "http://aeon.co/video/technology/internet-archive-a-short-film-about-accessing-knowledge/", "http://aeon.co/video/technology/a-tiny-planet-a-short-film-about-wondrous-video-technology/", "http://aeon.co/magazine/culture/there-is-fortuitous-beauty-in-a-brute-force-attack/", "http://aeon.co/magazine/technology/can-we-design-systems-to-automate-ethics/", "http://aeon.co/magazine/technology/before-minecraft-or-snapchat-there-was-micromuse/", "http://aeon.co/magazine/technology/meet-darpas-new-generation-of-humanoid-robots/", "http://aeon.co/magazine/technology/the-problem-with-too-much-information/", "http://aeon.co/magazine/technology/can-nyc-be-completely-self-reliant/", "http://aeon.co/video/technology/theo-a-short-film-about-the-wind-eating-strandbeest/", "http://aeon.co/video/technology/terminal-a-short-film-about-the-mechanical-ballet-of-cargo/", "http://aeon.co/video/technology/metropolis-ii-a-short-film-about-the-city-of-tomorrow/", "http://aeon.co/magazine/culture/digital-art-should-be-about-possibilities-not-technicalities/", "http://aeon.co/magazine/society/can-sustainability-really-hope-to-beat-consumerism/", "http://aeon.co/magazine/technology/is-technology-making-the-world-too-complex/", "http://aeon.co/magazine/culture/creepypasta-is-how-the-internet-learns-our-fears/", "http://aeon.co/magazine/technology/virtual-afterlives-will-transform-humanity/", "http://aeon.co/magazine/technology/what-will-happen-to-my-online-identity-when-i-die/", "http://aeon.co/magazine/society/what-does-silicon-valley-tell-us-about-innovation/", "http://aeon.co/magazine/technology/the-rise-of-biotechnology-and-the-loss-of-scientific-neutrality/", "http://aeon.co/magazine/culture/why-i-gave-up-living-in-an-off-grid-commune/"]} {"comments": ["12 March 2014"], "primary_url": ["http://aeon.co/magazine/science/the-search-for-quantum-gravity/", 
"http://aeon.co/magazine/philosophy/should-generation-ted-take-a-more-sceptical-view/", "http://aeon.co/magazine/technology/the-elon-musk-interview-on-mars/", "http://aeon.co/video/technology/analogue-people-in-a-digital-age-a-short-film-about-technology/", "http://aeon.co/video/technology/boxa-short-film-about-projection-mapping/", "http://aeon.co/video/technology/how-to-sharpen-pencils-a-short-film-about-a-master-artisan/", "http://aeon.co/magazine/technology/do-we-want-minority-report-policing/", "http://aeon.co/magazine/health/can-you-have-self-worth-without-self-love/", "http://aeon.co/magazine/technology/i-learnt-to-survive-like-an-11th-century-farmer/", "http://aeon.co/magazine/technology/can-tiny-plankton-help-reverse-climate-change/", "http://aeon.co/magazine/technology/are-halophytes-the-crop-of-the-future/", "http://aeon.co/magazine/technology/how-will-sexbots-change-human-relationships/", "http://aeon.co/video/technology/robotic-cheetah-a-short-film-about-biomimetic-robotics/", "http://aeon.co/video/technology/internet-archive-a-short-film-about-accessing-knowledge/", "http://aeon.co/video/technology/a-tiny-planet-a-short-film-about-wondrous-video-technology/", "http://aeon.co/magazine/culture/there-is-fortuitous-beauty-in-a-brute-force-attack/", "http://aeon.co/magazine/technology/can-we-design-systems-to-automate-ethics/", "http://aeon.co/magazine/technology/before-minecraft-or-snapchat-there-was-micromuse/", "http://aeon.co/magazine/technology/meet-darpas-new-generation-of-humanoid-robots/", "http://aeon.co/magazine/technology/the-problem-with-too-much-information/", "http://aeon.co/magazine/technology/can-nyc-be-completely-self-reliant/", "http://aeon.co/video/technology/theo-a-short-film-about-the-wind-eating-strandbeest/", "http://aeon.co/video/technology/terminal-a-short-film-about-the-mechanical-ballet-of-cargo/", "http://aeon.co/video/technology/metropolis-ii-a-short-film-about-the-city-of-tomorrow/", 
"http://aeon.co/magazine/culture/digital-art-should-be-about-possibilities-not-technicalities/", "http://aeon.co/magazine/society/can-sustainability-really-hope-to-beat-consumerism/", "http://aeon.co/magazine/technology/is-technology-making-the-world-too-complex/", "http://aeon.co/magazine/culture/creepypasta-is-how-the-internet-learns-our-fears/", "http://aeon.co/magazine/technology/virtual-afterlives-will-transform-humanity/", "http://aeon.co/magazine/technology/what-will-happen-to-my-online-identity-when-i-die/", "http://aeon.co/magazine/society/what-does-silicon-valley-tell-us-about-innovation/", "http://aeon.co/magazine/technology/the-rise-of-biotechnology-and-the-loss-of-scientific-neutrality/", "http://aeon.co/magazine/culture/why-i-gave-up-living-in-an-off-grid-commune/"]} {"comments": ["31 March 2014"], "primary_url": ["http://aeon.co/magazine/science/the-search-for-quantum-gravity/", "http://aeon.co/magazine/philosophy/should-generation-ted-take-a-more-sceptical-view/", "http://aeon.co/magazine/technology/the-elon-musk-interview-on-mars/", "http://aeon.co/video/technology/analogue-people-in-a-digital-age-a-short-film-about-technology/", "http://aeon.co/video/technology/boxa-short-film-about-projection-mapping/", "http://aeon.co/video/technology/how-to-sharpen-pencils-a-short-film-about-a-master-artisan/", "http://aeon.co/magazine/technology/do-we-want-minority-report-policing/", "http://aeon.co/magazine/health/can-you-have-self-worth-without-self-love/", "http://aeon.co/magazine/technology/i-learnt-to-survive-like-an-11th-century-farmer/", "http://aeon.co/magazine/technology/can-tiny-plankton-help-reverse-climate-change/", "http://aeon.co/magazine/technology/are-halophytes-the-crop-of-the-future/", "http://aeon.co/magazine/technology/how-will-sexbots-change-human-relationships/", "http://aeon.co/video/technology/robotic-cheetah-a-short-film-about-biomimetic-robotics/", 
"http://aeon.co/video/technology/internet-archive-a-short-film-about-accessing-knowledge/", "http://aeon.co/video/technology/a-tiny-planet-a-short-film-about-wondrous-video-technology/", "http://aeon.co/magazine/culture/there-is-fortuitous-beauty-in-a-brute-force-attack/", "http://aeon.co/magazine/technology/can-we-design-systems-to-automate-ethics/", "http://aeon.co/magazine/technology/before-minecraft-or-snapchat-there-was-micromuse/", "http://aeon.co/magazine/technology/meet-darpas-new-generation-of-humanoid-robots/", "http://aeon.co/magazine/technology/the-problem-with-too-much-information/", "http://aeon.co/magazine/technology/can-nyc-be-completely-self-reliant/", "http://aeon.co/video/technology/theo-a-short-film-about-the-wind-eating-strandbeest/", "http://aeon.co/video/technology/terminal-a-short-film-about-the-mechanical-ballet-of-cargo/", "http://aeon.co/video/technology/metropolis-ii-a-short-film-about-the-city-of-tomorrow/", "http://aeon.co/magazine/culture/digital-art-should-be-about-possibilities-not-technicalities/", "http://aeon.co/magazine/society/can-sustainability-really-hope-to-beat-consumerism/", "http://aeon.co/magazine/technology/is-technology-making-the-world-too-complex/", "http://aeon.co/magazine/culture/creepypasta-is-how-the-internet-learns-our-fears/", "http://aeon.co/magazine/technology/virtual-afterlives-will-transform-humanity/", "http://aeon.co/magazine/technology/what-will-happen-to-my-online-identity-when-i-die/", "http://aeon.co/magazine/society/what-does-silicon-valley-tell-us-about-innovation/", "http://aeon.co/magazine/technology/the-rise-of-biotechnology-and-the-loss-of-scientific-neutrality/", "http://aeon.co/magazine/culture/why-i-gave-up-living-in-an-off-grid-commune/"]} {"comments": ["30 May 2014"], "primary_url": ["http://aeon.co/magazine/science/the-search-for-quantum-gravity/", "http://aeon.co/magazine/philosophy/should-generation-ted-take-a-more-sceptical-view/", 
"http://aeon.co/magazine/technology/the-elon-musk-interview-on-mars/", "http://aeon.co/video/technology/analogue-people-in-a-digital-age-a-short-film-about-technology/", "http://aeon.co/video/technology/boxa-short-film-about-projection-mapping/", "http://aeon.co/video/technology/how-to-sharpen-pencils-a-short-film-about-a-master-artisan/", "http://aeon.co/magazine/technology/do-we-want-minority-report-policing/", "http://aeon.co/magazine/health/can-you-have-self-worth-without-self-love/", "http://aeon.co/magazine/technology/i-learnt-to-survive-like-an-11th-century-farmer/", "http://aeon.co/magazine/technology/can-tiny-plankton-help-reverse-climate-change/", "http://aeon.co/magazine/technology/are-halophytes-the-crop-of-the-future/", "http://aeon.co/magazine/technology/how-will-sexbots-change-human-relationships/", "http://aeon.co/video/technology/robotic-cheetah-a-short-film-about-biomimetic-robotics/", "http://aeon.co/video/technology/internet-archive-a-short-film-about-accessing-knowledge/", "http://aeon.co/video/technology/a-tiny-planet-a-short-film-about-wondrous-video-technology/", "http://aeon.co/magazine/culture/there-is-fortuitous-beauty-in-a-brute-force-attack/", "http://aeon.co/magazine/technology/can-we-design-systems-to-automate-ethics/", "http://aeon.co/magazine/technology/before-minecraft-or-snapchat-there-was-micromuse/", "http://aeon.co/magazine/technology/meet-darpas-new-generation-of-humanoid-robots/", "http://aeon.co/magazine/technology/the-problem-with-too-much-information/", "http://aeon.co/magazine/technology/can-nyc-be-completely-self-reliant/", "http://aeon.co/video/technology/theo-a-short-film-about-the-wind-eating-strandbeest/", "http://aeon.co/video/technology/terminal-a-short-film-about-the-mechanical-ballet-of-cargo/", "http://aeon.co/video/technology/metropolis-ii-a-short-film-about-the-city-of-tomorrow/", "http://aeon.co/magazine/culture/digital-art-should-be-about-possibilities-not-technicalities/", 
"http://aeon.co/magazine/society/can-sustainability-really-hope-to-beat-consumerism/", "http://aeon.co/magazine/technology/is-technology-making-the-world-too-complex/", "http://aeon.co/magazine/culture/creepypasta-is-how-the-internet-learns-our-fears/", "http://aeon.co/magazine/technology/virtual-afterlives-will-transform-humanity/", "http://aeon.co/magazine/technology/what-will-happen-to-my-online-identity-when-i-die/", "http://aeon.co/magazine/society/what-does-silicon-valley-tell-us-about-innovation/", "http://aeon.co/magazine/technology/the-rise-of-biotechnology-and-the-loss-of-scientific-neutrality/", "http://aeon.co/magazine/culture/why-i-gave-up-living-in-an-off-grid-commune/"]}
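To reiterate: I can't get the output to be a single list of URLs from the parent page together with the corresponding date from each nested URL. I'm new to both Scrapy and Python, so I hope someone can point me in the right direction.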
Answer 0 (score: 0)
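Your code is iterating over the wrong thing. The
response.xpath('//*[@id="latestPosts"]')
bit returns a list with just one selector, and that single selector contains all of the article links.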
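The difference can be checked without running a crawl. The sketch below is only an illustration: it uses the standard library's ElementTree as a stand-in for Scrapy's selectors, and the markup and example.com URLs are made up to mimic the nesting that the XPaths in the question imply.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup; the real page structure is only inferred from the XPaths.
PAGE = """
<html><body>
  <div id="latestPosts">
    <div><div>
      <div><a href="http://example.com/article-1/">One</a></div>
      <div><a href="http://example.com/article-2/">Two</a></div>
      <div><a href="http://example.com/article-3/">Three</a></div>
    </div></div>
  </div>
</body></html>
"""

root = ET.fromstring(PAGE.strip())

# Loop target from the question: the container itself, so the loop body runs once
# and every href lands in the same list (what item['primary_url'] ended up holding).
containers = root.findall(".//*[@id='latestPosts']")
all_links = [a.get("href") for a in containers[0].findall("div/div/div/a")]

# Narrower loop target: one node per article, so the body runs once per link.
per_article = root.findall(".//*[@id='latestPosts']/div/div/div")
links_per_node = [[a.get("href") for a in node.findall("a")] for node in per_article]

print(len(containers), len(all_links))
print(len(per_article), links_per_node)
```

With the container as the loop target there is one iteration carrying three links; with the per-article nodes there are three iterations carrying one link each.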
Try changing the loop to:
for sel in response.xpath('//*[@id="latestPosts"]/div/div/div'):
    item = AeonItem()
    item['primary_url'] = sel.xpath('./a/@href').extract()
    ...
You may want to apply the same change in the other callback; I'll leave the rest to you as an exercise. =)
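One possible reading of the whole flow, sketched with plain functions instead of a real spider: a fresh item is created for every article node, carried along to the second step (the role meta={'item': item} plays in the question), and the date is filled in there. ElementTree again stands in for Scrapy's selectors, and the pages, URLs, and dates are invented placeholders. Selecting the date span by its class rather than by a long positional path is an assumption of this sketch, not something the answer prescribes.

```python
import xml.etree.ElementTree as ET

# Made-up listing page and article pages (placeholders, not real site content).
LISTING = (
    '<html><body><div id="latestPosts"><div><div>'
    '<div><a href="http://example.com/article-1/">One</a></div>'
    '<div><a href="http://example.com/article-2/">Two</a></div>'
    '</div></div></div></body></html>'
)
ARTICLE_TEMPLATE = (
    '<html><body><div id="top"><p><em>'
    '<span class="instapaper_datepublished">{date}</span>'
    '</em></p></div></body></html>'
)
ARTICLES = {
    "http://example.com/article-1/": ARTICLE_TEMPLATE.format(date="1 January 2000"),
    "http://example.com/article-2/": ARTICLE_TEMPLATE.format(date="2 January 2000"),
}


def parse(listing_html):
    """Yield (url, item) pairs, creating a new item for every article link."""
    root = ET.fromstring(listing_html)
    for node in root.findall(".//*[@id='latestPosts']/div/div/div"):
        for link in node.findall("a"):
            url = link.get("href")
            yield url, {"primary_url": url}


def parse_next_page(article_html, item):
    """Fill in the date on the item that travelled with the request."""
    root = ET.fromstring(article_html)
    span = root.find(".//span[@class='instapaper_datepublished']")
    item["comments"] = span.text if span is not None else None
    return item


# Stand-in for the crawl: "fetch" each URL from the dict and run the second callback.
results = [parse_next_page(ARTICLES[url], item) for url, item in parse(LISTING)]
for record in results:
    print(record)
```

Each record now pairs exactly one URL with its own date, instead of repeating the full URL list in every record.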