I have a problem that I can't figure out.
Because of how the site is structured, the data I scrape into my JSON file looks like this:
[{"location": ["(\u5357\u6295)", "(\u53f0\u5357)", "(\u53f0\u5357)"],
"leisuretitle": ["2014", "20140721", "20140726"]}]
But the format I want is:
[{"leisurelocation": ["(\u5357\u6295)"], "leisuretitle": ["2014"]},
{"leisurelocation": ["(\u53f0\u5357)"], "leisuretitle": ["20140721"]},
{"leisurelocation": ["(\u53f0\u5357)"], "leisuretitle": ["20140726"]}]
I don't know how to do this. Can someone point me in the right direction?
Here is my code:
def parse(self, response):
    sel = Selector(response)
    sites = sel.css("div#listabc table ")
    for site in sites:
        item = LeisureItem()
        leisurelocation = site.css(" tr > td.subject > span.city::text ").extract()
        leisuretitle = site.css(" tr > td.subject a::text ").extract()
        item['leisurelocation'] = leisurelocation
        item['leisuretitle'] = leisuretitle
        yield item
Answer 0 (score: 1)
What you want is to generate multiple items from leisurelocation and leisuretitle:
leisurelocation = ...
leisuretitle = ...
for i, j in zip(leisurelocation, leisuretitle):
    yield LeisureItem(leisurelocation=[i], leisuretitle=[j])
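To see the pairing on its own, outside of Scrapy, here is a minimal sketch that uses plain dicts in place of LeisureItem. The helper name split_into_items is hypothetical; the sample values are the ones from the question. Keep in mind that zip stops at the shorter list, so if one row has no city the pairs would silently shift or be dropped.

```python
def split_into_items(locations, titles):
    """Pair two parallel lists into one dict per position (hypothetical helper)."""
    return [
        {"leisurelocation": [loc], "leisuretitle": [title]}
        for loc, title in zip(locations, titles)
    ]


# Sample data copied from the question's JSON output.
locations = ["(\u5357\u6295)", "(\u53f0\u5357)", "(\u53f0\u5357)"]
titles = ["2014", "20140721", "20140726"]
items = split_into_items(locations, titles)

# zip truncates to the shorter input: one location, two titles -> one item.
short = split_into_items(["(\u5357\u6295)"], ["2014", "20140721"])
```

Each element of items then has the one-location, one-title shape asked for in the question.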
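Answer 1 (score: 0)
kev's answer is correct for the problem as you stated it, but I don't think it is the right approach. You should scrape the items one at a time.
For example, loop over the table row by row and yield each scraped row as an item: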
def parse(self, response):
    for city in response.css("div#listabc table>tr"):
        item = LeisureItem()
        item['leisurelocation'] = city.css("td.subject>span.city::text").extract()
        item['leisuretitle'] = city.css("td.subject a::text").extract()
        yield item
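As a rough illustration of why scoping the queries to a single row keeps the fields aligned, the sketch below swaps Scrapy selectors for the standard library's xml.etree.ElementTree and runs on a made-up, well-formed table. The SAMPLE markup and the rows_to_items helper are assumptions for the demo, not the real page or part of Scrapy.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: the second row deliberately has no city span.
SAMPLE = """
<div id="listabc"><table>
<tr><td class="subject"><span class="city">(A)</span><a>2014</a></td></tr>
<tr><td class="subject"><a>20140721</a></td></tr>
<tr><td class="subject"><span class="city">(B)</span><a>20140726</a></td></tr>
</table></div>
"""


def rows_to_items(markup):
    """Build one dict per table row, querying only inside that row."""
    root = ET.fromstring(markup.strip())
    items = []
    for row in root.iter("tr"):
        cities = [e.text for e in row.findall(".//span[@class='city']")]
        titles = [e.text for e in row.findall(".//a")]
        items.append({"leisurelocation": cities, "leisuretitle": titles})
    return items


items = rows_to_items(SAMPLE)
```

Here the row without a city yields an empty leisurelocation list instead of shifting the later cities onto the wrong titles, which is what could happen when two page-wide lists are zipped together.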