In JSON

Time: 2016-03-05 02:07:35

Tags: python json scrapy

When I do this:

s = response.xpath('//meta[@id="_bootstrap-neighborhood_card"]').extract()

I get:

<meta content='{"hosting":{"id":2256573,"offset_lat":39.04258923718809,"offset_lng":-95.69083697887662},"map_url":"https://maps.googleapis.com/maps/api/staticmap?markers=%2C&amp;size&amp;zoom=14","place_recommendations":[],"neighborhood_breadcrumb_details":[{"link_text":"Southwest Fillmore Street,","search_text":"Southwest Fillmore Street Topeka, KS","link":"&lt;span&gt;Southwest Fillmore Street,&lt;/span&gt;","link_route":"/s/Southwest-Fillmore-Street-Topeka--KS"},{"link_text":"Topeka,","search_text":"Topeka, KS","link":"&lt;span&gt;Topeka,&lt;/span&gt;","link_route":"/s/Topeka--KS"},{"link_text":"Kansas,","search_text":"Kansas, United States","link":"&lt;span&gt;Kansas,&lt;/span&gt;","link_route":"/s/Kansas--United-States"},{"link_text":"United States","search_text":"United States","link":"&lt;span&gt;United States&lt;/span&gt;","link_route":"/s/United-States"}],"neighborhood_basic_info":null,"neighborhood_localized_name":null,"user_info":{"user_image":"&lt;img alt=\"Elizabeth\" data-pin-nopin=\"true\" height=\"90\" src=\"https://a0.muscache.com/im/users/9199018/profile_pic/1380782460/original.jpg?aki_policy=profile_x_medium\" title=\"Elizabeth\" width=\"90\" /&gt;"}}' id="_bootstrap-neighborhood_card">

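This is clearly JSON, but it is encoded (as you can see). I tried urllib.unquote, but it raised an error: AttributeError: 'list' object has no attribute 'split'

I would rather not use regular expressions to do the URL decoding. What can I do (other than using regular expressions) to turn this into valid JSON?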

2 answers:

Answer 0 (score: 2)

Get the value of the content attribute and load it with json.loads():

>>> import json
>>> content = response.xpath('//meta[@id="_bootstrap-neighborhood_card"]/@content').extract_first()
>>> json.loads(content)

Note that you also need to use extract_first() instead of extract() so that you get a string value rather than a list.
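To see why json.loads() works on the attribute value without any manual decoding, here is a minimal sketch that uses only the Python 3 standard library instead of Scrapy. The &amp;amp; and &amp;lt; sequences are HTML entities, not URL encoding, and an HTML parser turns them back into plain characters when it reads an attribute. SAMPLE_HTML is a shortened stand-in for the real tag, and MetaContentFinder is a hypothetical helper name, not part of Scrapy.

```python
import json
from html.parser import HTMLParser

# Shortened stand-in for the page's <meta> tag (not the full original).
SAMPLE_HTML = (
    "<meta content='{\"hosting\":{\"id\":2256573},"
    "\"map_url\":\"https://maps.googleapis.com/maps/api/staticmap"
    "?markers=%2C&amp;size&amp;zoom=14\","
    "\"link\":\"&lt;span&gt;Topeka,&lt;/span&gt;\"}' "
    "id=\"_bootstrap-neighborhood_card\">"
)


class MetaContentFinder(HTMLParser):
    """Hypothetical helper: remembers the content attribute of one <meta> tag."""

    def __init__(self, meta_id):
        super().__init__()
        self.meta_id = meta_id
        self.content = None

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "meta" and attributes.get("id") == self.meta_id:
            # The parser has already replaced &amp; and &lt; with & and <.
            self.content = attributes.get("content")


finder = MetaContentFinder("_bootstrap-neighborhood_card")
finder.feed(SAMPLE_HTML)
data = json.loads(finder.content)
print(data["map_url"])
print(data["link"])
```

The %2C inside map_url stays as it is, because it is part of the URL itself and not something the HTML layer added.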

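Answer 1 (score: 1)

You can decode it with json.loads(), but first you need to get the JSON string contained in the content attribute of the <meta> tag.

You can call xpath() more than once to drill down to an attribute of the selected tag: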

import json

# Select the <meta> element first, then drill down to its content attribute.
meta = response.xpath('//meta[@id="_bootstrap-neighborhood_card"]')
content = meta.xpath('@content').extract_first()
data = json.loads(content)

Or you can do it in one go:

import json
from pprint import pprint

content = response.xpath('//meta[@id="_bootstrap-neighborhood_card"]').xpath('@content').extract_first()
data = json.loads(content)
pprint(data)

Output:

{u'hosting': {u'id': 2256573,
              u'offset_lat': 39.04258923718809,
              u'offset_lng': -95.69083697887662},
 u'map_url': u'https://maps.googleapis.com/maps/api/staticmap?markers=%2C&size&zoom=14',
 u'neighborhood_basic_info': None,
 u'neighborhood_breadcrumb_details': [{u'link': u'<span>Southwest Fillmore Street,</span>',
                                       u'link_route': u'/s/Southwest-Fillmore-Street-Topeka--KS',
                                       u'link_text': u'Southwest Fillmore Street,',
                                       u'search_text': u'Southwest Fillmore Street Topeka, KS'},
                                      {u'link': u'<span>Topeka,</span>',
                                       u'link_route': u'/s/Topeka--KS',
                                       u'link_text': u'Topeka,',
                                       u'search_text': u'Topeka, KS'},
                                      {u'link': u'<span>Kansas,</span>',
                                       u'link_route': u'/s/Kansas--United-States',
                                       u'link_text': u'Kansas,',
                                       u'search_text': u'Kansas, United States'},
                                      {u'link': u'<span>United States</span>',
                                       u'link_route': u'/s/United-States',
                                       u'link_text': u'United States',
                                       u'search_text': u'United States'}],
 u'neighborhood_localized_name': None,
 u'place_recommendations': [],
 u'user_info': {u'user_image': u'<img alt="Elizabeth" data-pin-nopin="true" height="90" src="https://a0.muscache.com/im/users/9199018/profile_pic/1380782460/original.jpg?aki_policy=profile_x_medium" title="Elizabeth" width="90" />'}}
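
Once the string is parsed, the result is an ordinary dictionary. One thing to watch for is that extract_first() returns None when the XPath matches nothing, and json.loads(None) raises a TypeError. The sketch below (Python 3, standard library only) guards against that and then pulls the routes out of the breadcrumb list. load_bootstrap_json is a hypothetical helper name, and sample is a shortened stand-in for the real attribute value.

```python
import json


def load_bootstrap_json(content):
    """Hypothetical helper: parse the attribute value, or return None."""
    if content is None:  # extract_first() gives None when nothing matched
        return None
    try:
        return json.loads(content)
    except ValueError:  # json.JSONDecodeError is a subclass of ValueError
        return None


# Shortened stand-in for the decoded content attribute value.
sample = (
    '{"neighborhood_breadcrumb_details": ['
    '{"link_text": "Topeka,", "link_route": "/s/Topeka--KS"},'
    '{"link_text": "United States", "link_route": "/s/United-States"}],'
    '"neighborhood_basic_info": null}'
)

data = load_bootstrap_json(sample)
routes = [crumb["link_route"] for crumb in data["neighborhood_breadcrumb_details"]]
print(routes)
print(load_bootstrap_json(None))
```

Returning None on failure is only one possible choice; in a spider you might prefer to log the problem and skip the item instead.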