Question

Suppose I am scraping data from the following structure:

<div id="main">
    <span class="name">$somename</span>
    <span class="email">$someemail</span>
    <span class="phone">$phone</span>
</div>

The Scrapy code I am using looks like this:

d.add_xpath('name', '//div[@id="main"]/span[@class="name"]')
d.add_xpath('email', '//div[@id="main"]/span[@class="email"]')
d.add_xpath('phone', '//div[@id="main"]/span[@class="phone"]')

The results I get are grouped like this:

name1
name2
name3 and so on...

then:
email1
email2
email3 and so on...

and finally:
phone1
phone2
phone3 and so on...

But what I want is for the data to be grouped like this:

name1
email1
phone1

name2
email2
phone2

name3
email3
phone3

and so on ...

How can I do this with Scrapy?

Thanks in advance.

Answer 1

I suggest zipping the variables together. Something like this:

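def parse(self, response):
    # Extract each field as a list of strings; the lists are assumed to line up by position.
    names = response.xpath('//div[@id="main"]/span[@class="name"]/text()').getall()
    emails = response.xpath('//div[@id="main"]/span[@class="email"]/text()').getall()
    phones = response.xpath('//div[@id="main"]/span[@class="phone"]/text()').getall()
    for name, email, phone in zip(names, emails, phones):
        # Yield a new dict per record so the items do not share state.
        yield {'name': name, 'email': email, 'phone': phone}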
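
One caveat with zipping: it pairs values purely by position, so if any block is missing a span, every later record shifts by one. A sketch of the alternative is to loop over each block and query relative to it. The example below uses the standard library's xml.etree.ElementTree as a stand-in so it runs without Scrapy; the sample markup, FIELDS, and group_by_block are made up for illustration. In Scrapy the same idea would be to iterate over response.xpath('//div[@id="main"]') and call something like block.xpath('./span[@class="name"]/text()').get() on each block.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample page: the second block has no email span.
PAGE = """
<body>
  <div id="main">
    <span class="name">name1</span>
    <span class="email">email1</span>
    <span class="phone">phone1</span>
  </div>
  <div id="main">
    <span class="name">name2</span>
    <span class="phone">phone2</span>
  </div>
</body>
"""

FIELDS = ("name", "email", "phone")


def group_by_block(markup):
    """Return one dict per <div id="main"> block."""
    root = ET.fromstring(markup.strip())
    records = []
    for block in root.iterfind(".//div[@id='main']"):
        record = {}
        for field in FIELDS:
            # The path is relative to the current block, not the whole document.
            span = block.find(f"span[@class='{field}']")
            record[field] = span.text if span is not None else None
        records.append(record)
    return records


records = group_by_block(PAGE)
for record in records:
    print(record)
```

Because each lookup is scoped to one block, a missing field shows up as None in that record instead of misaligning the rest.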

Answer 2

This is more of a Python question. For this kind of data structure, the best way to do it is with a dictionary:

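# sel is assumed to be a Scrapy Selector (or the response) for the page.
dictExample = {}
# /text() plus .get() returns the first matching string instead of a SelectorList.
dictExample['name'] = sel.xpath('//div[@id="main"]/span[@class="name"]/text()').get()
dictExample['email'] = sel.xpath('//div[@id="main"]/span[@class="email"]/text()').get()
dictExample['phone'] = sel.xpath('//div[@id="main"]/span[@class="phone"]/text()').get()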

Running print(dictExample) returns something like the following:

{'phone': '872934987', 'name': 'Rafael Alonso', 'email': 'example@example.com'}

Now, if you want multiple dictionaries, just append them to a list:

listExample = []
for i in range(0, 5):
    # Append a copy so each list entry is an independent dict, not the same object.
    listExample.append(dict(dictExample))
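
A note on building such a list in a loop: a list stores references, so each record needs its own dict object, otherwise later updates overwrite earlier entries. Here is a minimal sketch; the rows tuple is hypothetical data standing in for values scraped per block.

```python
# Hypothetical scraped rows, standing in for values extracted per block.
rows = [
    ("name1", "email1", "phone1"),
    ("name2", "email2", "phone2"),
]

# Pitfall: one dict object is updated and appended repeatedly,
# so every list entry refers to the same object.
shared = {}
aliased = []
for name, email, phone in rows:
    shared["name"], shared["email"], shared["phone"] = name, email, phone
    aliased.append(shared)

# Fix: create a new dict on each iteration.
independent = []
for name, email, phone in rows:
    independent.append({"name": name, "email": email, "phone": phone})

print(aliased[0]["name"])      # name2 (overwritten by the last row)
print(independent[0]["name"])  # name1
```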

How do I group the data from each block together instead of grouping it by XPath?
