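I'm trying to use js2xml to iterate over a JSON response embedded in a page. My question is: how do I access the "stores" node and return only that as my result? The JSON looks like this: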
<script>
window.appData = {
"ressSize": "large",
"cssPath": "http://css.bbystatic.com/",
"imgPath": "http://images.bbystatic.com/",
"jsPath": "http://js.bbystatic.com/",
"bbyDomain": "http://www.bestbuy.com/",
"bbySslDomain": "https://www-ssl.bestbuy.com/",
"isUserLoggedIn": false,
"zipCode": "46801",
"stores": [{
"id": "2727",
"name": "GLENBROOK SQUARE",
"addr1": "4201 coldwater rd",
"addr2": "spc g10",
"city": "fort wayne",
"state": "IN",
"country": "US",
"zipCode": "46805",
"phone": "260-482-5230"...
</script>
My spider is straightforward, but I can't figure out what I need in order to parse the 9th node, "stores". Here is what I have so far:
def parse(self, response):
    js = response.xpath('//script[contains(.,"window.appData")]/text()').extract_first()
    jstree = js2xml.parse(js)
    jstree.xpath('//assign[left//identifier[@name="appData"]]/right/*')
    js2xml.make_dict(jstree.xpath('//assign[left//identifier[@name="appData"]]/right/*')[0])
This gives me the following output:
<program>
  <assign operator="=">
    <left>
      <dotaccessor>
        <object>
          <identifier name="window"/>
        </object>
        <property>
          <identifier name="appData"/>
        </property>
      </dotaccessor>
    </left>
    <right>
      <object>
        <property name="ressSize">
          <string>large</string>
        </property>
        <property name="cssPath">
          <string>http://css.bbystatic.com/</string>
        </property>
        <property name="imgPath">
          <string>http://images.bbystatic.com/</string>
        </property>
        <property name="jsPath">
          <string>http://js.bbystatic.com/</string>
        </property>
        <property name="bbyDomain">
          <string>http://www.bestbuy.com/</string>
        </property>
        <property name="bbySslDomain">
          <string>https://www-ssl.bestbuy.com/</string>
        </property>
        <property name="isUserLoggedIn">
          <boolean>false</boolean>
        </property>
        <property name="zipCode">
          <string></string>
        </property>
        <property name="stores">
          <array/>
        </property>
        <property name="preferredStores">
          <array/>
        </property>
      </object>
    </right>
  </assign>
</program>
{'bbyDomain': 'http://www.bestbuy.com/',
'bbySslDomain': 'https://www-ssl.bestbuy.com/',
'cssPath': 'http://css.bbystatic.com/',
'imgPath': 'http://images.bbystatic.com/',
'isUserLoggedIn': False,
'jsPath': 'http://js.bbystatic.com/',
'preferredStores': [],
'ressSize': 'large',
'stores': [],
'zipCode': ''}
Any ideas would be helpful!
Answer 0 (score: 2)
Let's use New York as the location: http://www.bestbuy.com/site/store-locator/11356
$ scrapy shell http://www.bestbuy.com/site/store-locator/11356
2016-10-10 16:19:07 [scrapy] INFO: Scrapy 1.2.0 started (bot: scrapybot)
(...)
2016-10-10 16:19:08 [scrapy] DEBUG: Crawled (200) <GET http://www.bestbuy.com/site/store-locator/11356> (referer: None)
>>> js = response.xpath('//script[contains(.,"window.appData")]/text()').extract_first()
>>> js[:100]
u'window.appData = {"ressSize":"large","cssPath":"http://css.bbystatic.com/","imgPath":"http://images.'
>>> import js2xml
>>> jstree = js2xml.parse(js)
>>> app_data_node = jstree.xpath('//assign[left//identifier[@name="appData"]]/right/*')[0]
>>> app_data = js2xml.make_dict(app_data_node)
>>> app_data.keys()
['ressSize', 'isUserLoggedIn', 'preferredStores', 'jsPath', 'bbyDomain', 'bbySslDomain', 'zipCode', 'imgPath', 'cssPath', 'stores']
>>> len(app_data['stores'])
25
So there are 25 stores for that New York location. You just need to loop over app_data["stores"]:
>>> from pprint import pprint
>>> for store in app_data['stores']:
...     pprint(store)
...
{'addPreferredStoreLink': '/site/store-locator/preferred/1115',
'addr1': '13107 40th rd',
'addr2': 'ste c300',
'city': 'flushing',
'country': 'US',
'hours': [{'close': '20:00', 'date': '2016-10-09', 'open': '11:00'},
{'close': '21:00',
'closeTime': '9:00 PM',
'date': '2016-10-10',
'open': '10:00',
'openTime': '10:00 AM'},
{'close': '21:00', 'date': '2016-10-11', 'open': '10:00'},
{'close': '21:00', 'date': '2016-10-12', 'open': '10:00'},
{'close': '21:00', 'date': '2016-10-13', 'open': '10:00'},
{'close': '22:00', 'date': '2016-10-14', 'open': '10:00'},
{'close': '22:00', 'date': '2016-10-15', 'open': '10:00'},
{'close': '20:00', 'date': '2016-10-16', 'open': '11:00'},
{'close': '21:00', 'date': '2016-10-17', 'open': '10:00'},
{'close': '21:00', 'date': '2016-10-18', 'open': '10:00'},
{'close': '21:00', 'date': '2016-10-19', 'open': '10:00'},
{'close': '21:00', 'date': '2016-10-20', 'open': '10:00'},
{'close': '22:00', 'date': '2016-10-21', 'open': '10:00'},
{'close': '22:00', 'date': '2016-10-22', 'open': '10:00'}],
'hoursDisplay': {'close': '21:00',
'closeTime': '9:00 PM',
'date': '2016-10-10',
'open': '10:00',
'openTime': '10:00 AM'},
'id': '1115',
'isPreferredStore': False,
'latitude': '40.75662',
'locationSubType': 'Big Box Store',
'locationType': 'Store',
'longitude': '-73.83698',
'name': 'FLUSHING NY',
'phone': '718-888-3629',
'removePreferredStoreLink': '/site/store-locator/preferred/1115',
'services': ['Geek Squad Services',
'Best Buy Mobile',
'Best Buy For Business',
'Apple Shop',
'Electronics Recycling',
u'Hablamos Espa\xf1ol',
'Car & GPS Installation Services',
'Samsung Experience Shop',
'Windows Store'],
'state': 'NY',
'zipCode': '11354'}
(...)
{'addPreferredStoreLink': '/site/store-locator/preferred/374',
'addr1': '2478 central park ave',
'city': 'yonkers',
'country': 'US',
'hours': [{'close': '20:00', 'date': '2016-10-09', 'open': '11:00'},
{'close': '21:00',
'closeTime': '9:00 PM',
'date': '2016-10-10',
'open': '10:00',
'openTime': '10:00 AM'},
{'close': '21:00', 'date': '2016-10-11', 'open': '10:00'},
{'close': '21:00', 'date': '2016-10-12', 'open': '10:00'},
{'close': '21:00', 'date': '2016-10-13', 'open': '10:00'},
{'close': '21:00', 'date': '2016-10-14', 'open': '10:00'},
{'close': '21:00', 'date': '2016-10-15', 'open': '10:00'},
{'close': '20:00', 'date': '2016-10-16', 'open': '11:00'},
{'close': '21:00', 'date': '2016-10-17', 'open': '10:00'},
{'close': '21:00', 'date': '2016-10-18', 'open': '10:00'},
{'close': '21:00', 'date': '2016-10-19', 'open': '10:00'},
{'close': '21:00', 'date': '2016-10-20', 'open': '10:00'},
{'close': '21:00', 'date': '2016-10-21', 'open': '10:00'},
{'close': '21:00', 'date': '2016-10-22', 'open': '10:00'}],
'hoursDisplay': {'close': '21:00',
'closeTime': '9:00 PM',
'date': '2016-10-10',
'open': '10:00',
'openTime': '10:00 AM'},
'id': '374',
'isPreferredStore': False,
'latitude': '40.9814',
'locationSubType': 'Big Box Store',
'locationType': 'Store',
'longitude': '-73.8277',
'name': 'YONKERS NY',
'phone': '914-337-4077',
'removePreferredStoreLink': '/site/store-locator/preferred/374',
'services': ['Windows Store',
'Geek Squad Services',
'Best Buy Mobile',
'Best Buy For Business',
'Apple Shop',
'Electronics Recycling',
u'Hablamos Espa\xf1ol',
'Samsung Experience',
'LG Experience ',
'Sony Experience ',
'Car & GPS Installation Services'],
'state': 'NY',
'zipCode': '10710'}
>>>
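If you only need a few fields per store rather than the whole record, the loop above can be narrowed down. The following is a minimal sketch, assuming the store dicts look like the output shown above; `summarize_store` and the default field list are hypothetical choices for illustration, not part of js2xml or Scrapy. It uses `dict.get` because not every store carries every key (the Yonkers record above has no `addr2`, for example).

```python
def summarize_store(store, fields=("id", "name", "city", "state", "zipCode", "phone")):
    """Return a smaller dict holding only the requested fields.

    dict.get is used so that a store lacking a key yields None
    instead of raising KeyError.
    """
    return {field: store.get(field) for field in fields}


# Two trimmed-down records shaped like the pprint output above
stores = [
    {"id": "1115", "name": "FLUSHING NY", "addr2": "ste c300", "city": "flushing",
     "state": "NY", "zipCode": "11354", "phone": "718-888-3629"},
    {"id": "374", "name": "YONKERS NY", "city": "yonkers",
     "state": "NY", "zipCode": "10710", "phone": "914-337-4077"},
]

summaries = [summarize_store(s) for s in stores]
# The second record has no "addr2" key, so .get() returns None for it
addr2_values = [summarize_store(s, fields=("addr2",))["addr2"] for s in stores]
```

In the real session you would pass `app_data['stores']` instead of the hand-written `stores` list.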
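In your Scrapy callback, this translates to:
import js2xml  # at the top of your spider module

def parse(self, response):
    # Grab the inline <script> whose text contains the window.appData assignment
    js = response.xpath('//script[contains(.,"window.appData")]/text()').extract_first()
    jstree = js2xml.parse(js)
    # Select the object literal on the right-hand side of the assignment
    app_data_node = jstree.xpath('//assign[left//identifier[@name="appData"]]/right/*')[0]
    app_data = js2xml.make_dict(app_data_node)
    # Emit each store dict as a separate item
    for store in app_data['stores']:
        yield store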
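As a side note, when the right-hand side of the assignment happens to be strict JSON (double-quoted keys, no trailing commas, no JavaScript expressions), the same data could in principle be pulled out with the standard library alone. This is a swapped-in alternative to js2xml, not what the answer above does, and it is only a sketch under that assumption; `extract_app_data` and the `sample` string are hypothetical names made up for illustration.

```python
import json


def extract_app_data(js_text, marker="window.appData"):
    """Decode the object literal assigned to `marker`.

    Assumes the literal is strict JSON; anything looser
    (single quotes, unquoted keys) will raise a ValueError.
    """
    start = js_text.index(marker)      # ValueError if the marker is absent
    brace = js_text.index("{", start)  # first "{" after the marker
    # raw_decode parses a single JSON value starting at `brace` and
    # ignores whatever follows it (such as a trailing ";")
    data, _end = json.JSONDecoder().raw_decode(js_text, brace)
    return data


# Hand-written stand-in for the text of the <script> element
sample = (
    'window.appData = {"zipCode": "46801", '
    '"stores": [{"id": "2727", "name": "GLENBROOK SQUARE"}]};'
)
stores = extract_app_data(sample)["stores"]
```

js2xml remains the more robust choice when the script contains real JavaScript rather than a pure JSON literal, since it parses the JavaScript syntax itself.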