Scrapy: scraping multiple XPathSelectors on the same page

Date: 2012-11-30 08:23:04

Tags: python parsing xpath scrapy

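I am trying to extract data from several tables nested inside one "main table" on the same page (same URL). The item fields have the same XPath and the same structure in every sub-table, so my only problem is how to add multiple XPaths, one for each table section on this page.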

Here is my code:

from scrapy.spider import BaseSpider
from scrapy.selector import HtmlXPathSelector
from tutorial.items import TutorialItem

class MySpider(BaseSpider):
    name = "test"
    allowed_domains = ["blabla.com"]
    start_urls = ["http://www.blablabl..com"]  # The start URL doesn't change: same page

    def parse(self, response):
        hxs = HtmlXPathSelector(response)
        titles = [hxs.select('//tr[@class="index class_tr group-6487"]')]

        # Here I would like to have multiple XPathSelectors, e.g.:
        #
        # titles = [hxs.select('//tr[@class="index class_tr group-6488"]')]
        # titles = [hxs.select('//tr[@class="index class_tr group-6489"]')]
        #
        # Each one is for a table section within the same "main table".

        items = []
        for title in titles:
            item = TutorialItem()
            item['name'] = title.select('td[3]/span/a/text()').extract()
            item['encryption'] = title.select('td[5]/text()').extract()
            item['compression'] = title.select('td[8]/text()').extract()
            item['resolution'] = title.select('td[7]/span/text()').extract()
            items.append(item)
        return items

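If this is achievable, I would greatly appreciate any help. If I wrote a separate spider for each table section, I would end up with 10 spiders for the same URL/table, and I am not sure whether I could then retrieve the data, in order, in the same CSV file.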

1 answer:

Answer 0 (score: 2)

Try this:

titles = [hxs.select('//tr[@class="index class_tr group-6487"] | //tr[@class="index class_tr group-6488"] | //tr[@class="index class_tr group-6489"]')]
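With ten table sections, writing the union expression by hand gets long. The following is only a minimal sketch of how the same string could be generated; the helper name `build_rows_xpath` and the constant `ROW_XPATH_TEMPLATE` are hypothetical (they are not part of Scrapy), and it assumes every section's rows differ only in the `group-NNNN` part of the class attribute, as in the question.

```python
# Hypothetical helper (not part of Scrapy): builds one XPath union expression
# that matches the <tr> rows of several table sections.
# Assumption: the class attribute differs only in the trailing group id.
ROW_XPATH_TEMPLATE = '//tr[@class="index class_tr group-{group_id}"]'


def build_rows_xpath(group_ids):
    """Join one row XPath per group id with the XPath union operator '|'."""
    return " | ".join(
        ROW_XPATH_TEMPLATE.format(group_id=group_id) for group_id in group_ids
    )


# The three group ids mentioned in the question.
rows_xpath = build_rows_xpath([6487, 6488, 6489])
print(rows_xpath)
```

The resulting string can be passed to `hxs.select(...)` in place of the hand-written expression. If one item per table row is wanted, dropping the surrounding list brackets, so that the loop iterates over the selected rows themselves, is probably closer to the intent.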