Scrapy: page with multiple links

Time: 2014-06-11 05:12:28

Tags: xml parsing hyperlink scrapy

How do I use a basic spider to scrape multiple profile links on the same page? Can anyone help? I'm new to Scrapy.

Please help me improve my code:

from scrapy import Request


def parse(self, response):
    # Landing page (start URL): the profile links sit under <h3><a href=...>
    rest_page = response.xpath("//h3/a/@href").getall()
    if rest_page:
        # Only the FIRST profile link is followed here
        yield Request(response.urljoin(rest_page[0]), callback=self.parse)

    # Profile page: there are no "next page" links here. The spider needs to
    # return to the next link on the landing page once all items are scraped.
    for title in response.xpath("//p"):
        item = {"name": title.xpath("normalize-space(.)").get()}
        yield item
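The usual Scrapy pattern for this kind of page is to have `parse` yield one `Request` per profile link found on the landing page (for example `Request(response.urljoin(href), callback=self.parse_profile)`, or `response.follow(href, callback=self.parse_profile)`), instead of only `rest_page[0]`, and to give the profile pages their own callback. Scrapy's engine keeps every pending request, so when one profile page is finished it simply moves on to the next one; the spider never has to "return" to the landing page. The sketch below is a toy, standard-library-only model of that flow, not Scrapy itself: `PAGES`, the `Request` class, `parse_profile` and `crawl` are hypothetical stand-ins chosen for illustration.

```python
from collections import deque
from urllib.parse import urljoin

# Hypothetical in-memory "site" standing in for real HTTP responses:
# the landing page lists profile links, profile pages hold the data.
PAGES = {
    "http://example.com/": {"links": ["/p/1", "/p/2", "/p/3"]},
    "http://example.com/p/1": {"names": ["Alice"]},
    "http://example.com/p/2": {"names": ["Bob"]},
    "http://example.com/p/3": {"names": ["Carol", "Dave"]},
}


class Request:
    """Toy stand-in for scrapy.Request: just a URL plus a callback."""

    def __init__(self, url, callback):
        self.url = url
        self.callback = callback


def parse(url, page):
    # Landing page: schedule one request per profile link, not just links[0]
    for href in page.get("links", []):
        yield Request(urljoin(url, href), callback=parse_profile)


def parse_profile(url, page):
    # Profile page: no further links to follow, only items to emit
    for name in page.get("names", []):
        yield {"name": name, "url": url}


def crawl(start_url):
    """Toy scheduler: pending requests wait in a queue, so after one profile
    page is done the next queued profile is processed automatically.
    A FIFO queue is used here for simplicity; Scrapy's real scheduler has its
    own ordering (LIFO by default) and downloads pages concurrently."""
    queue = deque([Request(start_url, parse)])
    items = []
    while queue:
        request = queue.popleft()
        page = PAGES[request.url]
        for result in request.callback(request.url, page):
            if isinstance(result, Request):
                queue.append(result)  # new work for the scheduler
            else:
                items.append(result)  # a scraped item
    return items
```

Calling `crawl("http://example.com/")` visits the landing page once and then each of the three profile pages, collecting the names Alice, Bob, Carol and Dave. The design point carried over to a real spider is the split into two callbacks: the landing-page callback only fans out requests, and the profile callback only yields items.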

0 answers:

No answers yet.