I'm working with a JSON request that has a div as a value. Now I want to get only the value of data-content-value from
<li id="term_100800962" data-content-value='{"nl_term_id":100800962,"c_price_from":33415,"nd_price_discount":0,"nl_tour_id":1017864,"nl_hotel_id":[49316],"d_start":"2017-04-12","d_end":"2017-04-17"}' >
and store it as 'date', 'ID', and 'price', but I can't find a way to do that.
Is there a simple way?
Answer 0: (score: 3)
First, get the attribute string, then parse it with json.loads():
In [2]: from scrapy.selector import Selector
In [3]: text = """<li id="term_100800962" data-content-value='{"nl_term_id":100800962,"c_price_from":33415,"nd_price_discount":0,"nl_tour_id":1017864,"nl_hotel_id":[49316],"d_start":"2017-04-12","d_end":"2017-04-17"}' >"""
In [4]: sel = Selector(text=text)
In [5]: data_string = sel.xpath('//li/@data-content-value').extract_first()
In [6]: import json
In [7]: json.loads(data_string)
Out[7]:
{'c_price_from': 33415,
 'd_end': '2017-04-17',
 'd_start': '2017-04-12',
 'nd_price_discount': 0,
 'nl_hotel_id': [49316],
 'nl_term_id': 100800962,
 'nl_tour_id': 1017864}
The result of json.loads() is a regular Python dict.
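To store the values under the 'date', 'ID' and 'price' names asked for in the question, the parsed dict can be mapped onto a smaller one. This is only a minimal sketch: the helper name `to_record` is hypothetical, and it assumes that 'date' means `d_start`, 'ID' means `nl_term_id`, and 'price' means `c_price_from`, which the question does not confirm.

```python
import json

# Attribute string as it appears on the <li> element in the question.
data_string = (
    '{"nl_term_id":100800962,"c_price_from":33415,"nd_price_discount":0,'
    '"nl_tour_id":1017864,"nl_hotel_id":[49316],'
    '"d_start":"2017-04-12","d_end":"2017-04-17"}'
)


def to_record(raw):
    """Parse one data-content-value string and keep only the wanted fields.

    Hypothetical helper: the key mapping below is an assumption about
    what 'date', 'ID' and 'price' refer to.
    """
    data = json.loads(raw)
    return {
        "date": data["d_start"],       # assumed: start date of the term
        "ID": data["nl_term_id"],      # assumed: term id
        "price": data["c_price_from"], # assumed: starting price
    }


record = to_record(data_string)
print(record)
```

If the end date or the tour id is the value actually needed, only the keys inside `to_record` have to change.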
This URL returns a JSON response, so load the whole response with json.loads() and then select the information we need:
In [11]: fetch('https://dovolena.invia.cz/direct/tour_search/ajax-next-boxes/?nl
...: _country_id%5B0%5D=28&nl_locality_id%5B0%5D=19&d_start_from=23.01.2017&
...: d_end_to=19.04.2017&nl_transportation_id%5B0%5D=3&sort=nl_sell&page=1&g
...: etOptionsCount=true&base_url=https%3A%2F%2Fdovolena.invia.cz%2F')
In [12]: j = json.loads(response.text)
In [15]: j['boxes_html'] # this returns the HTML stored in the JSON response
In [15]: from scrapy.selector import Selector
In [16]: sel = Selector(text=j['boxes_html']) # load the HTML into a selector
In [17]: datas = sel.xpath('//li/@data-content-value').extract() # returns all attribute values as a list
In [21]: [json.loads(d) for d in datas] # parse each attribute string into a dict
# This returns a list of dicts, one per json.loads(d) call; use json.loads(d)['d_end'] to access an element of a single dict.
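The same two-step idea (parse the outer JSON, then parse each attribute value) can also be tried outside the Scrapy shell. The sketch below is not the answer's method: it swaps the Scrapy `Selector` for the standard library's `html.parser`, and the `ContentValueParser` class, the `extract_records` function, and the small sample payload are all made up for illustration. Only the `boxes_html` key and the attribute name come from the transcript above.

```python
import json
from html.parser import HTMLParser


class ContentValueParser(HTMLParser):
    """Collect the data-content-value attribute of every <li> start tag."""

    def __init__(self):
        super().__init__()
        self.values = []

    def handle_starttag(self, tag, attrs):
        if tag != "li":
            return
        for name, value in attrs:
            if name == "data-content-value" and value:
                self.values.append(value)


def extract_records(response_text):
    """Parse the outer JSON, then the JSON stored in each <li> attribute."""
    payload = json.loads(response_text)
    parser = ContentValueParser()
    parser.feed(payload["boxes_html"])
    parser.close()
    return [json.loads(value) for value in parser.values]


# Illustrative stand-in for the real response body (not fetched from the site).
inner = {"nl_term_id": 100800962, "c_price_from": 33415,
         "d_start": "2017-04-12", "d_end": "2017-04-17"}
html = ("<ul><li id=\"term_100800962\" data-content-value='"
        + json.dumps(inner) + "' ></li><li>no attribute here</li></ul>")
sample_response = json.dumps({"boxes_html": html})

records = extract_records(sample_response)
print([r["d_end"] for r in records])
```

Parsing each attribute once and keeping the resulting list avoids calling `json.loads(d)` again every time a single field such as `d_end` is needed.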