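I need to crawl a website and follow every URL found under a specific XPath. For example: I need to crawl "http://someurl.com/world/", which has 10 links inside a container (xpath("//div[@class='pane-content']")). I need to follow all 10 of those links and extract the images from each page. The links on "http://someurl.com/world/" look like "http://someurl.com/node/xxxx".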
Here is what I have so far:
Answer 0 (score: 2)
You can rewrite your 'rules' so that they cover all of your requirements:
rules = [Rule(LinkExtractor(allow=('/node/.*',), restrict_xpaths=('//div[@class="pane-content"]',)), callback='parse_imgur', follow=True)]
To download the images from the extracted image links, you can use the ImagesPipeline that ships with Scrapy.
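As a rough sketch, enabling that pipeline usually comes down to two entries in the project's `settings.py`. The storage directory name below is a placeholder, and the pipeline needs the Pillow library installed.

```python
# settings.py (fragment)

# Enable Scrapy's built-in image pipeline.
ITEM_PIPELINES = {
    "scrapy.pipelines.images.ImagesPipeline": 1,
}

# Directory where downloaded images are stored (hypothetical path).
IMAGES_STORE = "downloaded_images"
```

By default the pipeline reads the URLs to download from an `image_urls` field on each item and writes the download results to an `images` field, so the callback should yield items that carry an `image_urls` list of absolute URLs.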