I'm writing a web scraper and my Python is rusty, so I was wondering whether there is a shorter syntax for doing the following:
def parse(self, response):
    prc_path = '//span[@class="result-meta"]/span[@class="result-price"]/text()'
    sqf_path = '//span[@class="result-meta"]/span[@class="housing"]/text()'
    loc_path = '//span[@class="result-meta"]/span[@class="result-hood"]/text()'
    prc_resp = response.xpath(prc_path).extract_first()
    sqf_resp = response.xpath(sqf_path).extract_first()
    loc_resp = response.xpath(loc_path).extract_first()
    if sqf_resp and loc_resp:
        yield {
            'prc': response.xpath(prc_path).extract_first(),
            'sqf': response.xpath(sqf_path).extract_first(),
            'loc': response.xpath(loc_path).extract_first()
        }
    elif sqf_resp:
        yield {
            'prc': response.xpath(prc_path).extract_first(),
            'sqf': response.xpath(sqf_path).extract_first()
        }
    else:
        yield {
            'prc': response.xpath(prc_path).extract_first(),
            'loc': response.xpath(loc_path).extract_first()
        }
As you can see, there is a lot of repetition, and I'd like to keep this as DRY as possible.
Answer 0 (score: 1)
You can create the dictionary first and then add the appropriate entries to it.
# Test the extracted values (prc_resp, sqf_resp, loc_resp from the question),
# not the XPath strings, which are always truthy.
result = {'prc': prc_resp}
if sqf_resp:
    result['sqf'] = sqf_resp
if loc_resp:
    result['loc'] = loc_resp
yield result
You can also use a dict comprehension to factor out the repeated extract_first() call.
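paths = {'prc': prc_path, 'sqf': sqf_path, 'loc': loc_path}
# Extract once per key, then keep only the keys whose XPath matched something.
extracted = {key: response.xpath(path).extract_first() for key, path in paths.items()}
yield {key: value for key, value in extracted.items() if value}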
In earlier versions of Python (before 2.7, which lack dict comprehensions), this would be:
paths = {'prc': prc_path, 'sqf': sqf_path, 'loc': loc_path}
extracted = dict((key, response.xpath(path).extract_first())
                 for (key, path) in paths.items())
yield dict((key, value) for (key, value) in extracted.items() if value)
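One thing to watch with the comprehension approach: the XPath strings are always truthy, so the filter has to test the extracted value rather than the path. Below is a minimal sketch of that idea that runs without Scrapy. StubResponse, StubSelectorList, extract_fields, and the shortened paths are hypothetical stand-ins invented for illustration; only xpath() and extract_first() mirror the calls used in the question.

```python
class StubSelectorList:
    """Hypothetical stand-in for the object returned by response.xpath()."""

    def __init__(self, values):
        self._values = values

    def extract_first(self):
        # Mirrors the behaviour relied on in the question:
        # first match, or None when nothing matched.
        return self._values[0] if self._values else None


class StubResponse:
    """Hypothetical stand-in for a Scrapy response, keyed by XPath string."""

    def __init__(self, matches):
        self._matches = matches

    def xpath(self, path):
        return StubSelectorList(self._matches.get(path, []))


def extract_fields(response, paths):
    """Return {key: first match}, skipping keys whose XPath matched nothing."""
    extracted = {key: response.xpath(path).extract_first()
                 for key, path in paths.items()}
    # Filter on the extracted value; this drops None (and empty strings).
    return {key: value for key, value in extracted.items() if value}


# Shortened, made-up paths keep the example readable.
paths = {'prc': '//price', 'sqf': '//housing', 'loc': '//hood'}
response = StubResponse({'//price': ['$1200'], '//hood': [' (downtown)']})
item = extract_fields(response, paths)
```

Inside parse, the same helper would reduce the body to a single yield of extract_fields(response, paths). Note that, unlike the original if/elif/else, this version also omits 'prc' when no price is found.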
Answer 1 (score: 1)
I would go with a lookup map:
select date_part(epoch, date) as date
from t2
where t2.date >= '2014-01-01'
order by date desc
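For the scraper in the question, one way to read "lookup map" is a dictionary from each output key to the span class it is scraped from, so that the shared XPath prefix is written only once. The sketch below is an assumption about what such a map could look like, not the answerer's code; FIELD_CLASSES, XPATH_TEMPLATE, and build_paths are hypothetical names.

```python
# Hypothetical lookup map: output key -> class of the <span> inside result-meta.
FIELD_CLASSES = {
    'prc': 'result-price',
    'sqf': 'housing',
    'loc': 'result-hood',
}

# The part of the XPath that all three fields in the question share.
XPATH_TEMPLATE = '//span[@class="result-meta"]/span[@class="{cls}"]/text()'


def build_paths(field_classes):
    """Expand each class name into the full XPath used in the question."""
    return {key: XPATH_TEMPLATE.format(cls=cls)
            for key, cls in field_classes.items()}


paths = build_paths(FIELD_CLASSES)
```

With the paths generated this way, parse only needs to loop over paths.items(), call response.xpath(path).extract_first() for each entry, and keep the values that are not None. Adding a new field then means adding one line to the map.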