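I am referring to the following question on Stack Overflow: Scrapy, scrapping data inside a javascript
I am trying to reproduce the answer @Rho gave to that question, to learn how to scrape data from a JavaScript-generated form. The form's payload appears to have changed since the question was posted, so I adjusted it accordingly.
My code and its output are below: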
>>>scrapy shell https://www.mcdonalds.com.sg/locate-us/
2015-07-07 12:09:28+0800 [scrapy] INFO: Scrapy 0.24.6 started (bot: scrapybot)
.....
2015-07-07 12:09:28+0800 [default] INFO: Spider opened
2015-07-07 12:09:32+0800 [default] DEBUG: Crawled (200) <GET https://www.mcdonalds.com.sg/locate-us/> (referer: None)
....
>>> url = 'https://www.mcdonalds.com.sg/wp-admin/admin-ajax.php'
>>> payload = {'action':'store_locator_locations'}
>>> head = {'X-Requested-With':'XMLHttpRequest'}
>>> from scrapy.http import FormRequest
>>> req=FormRequest(url,formdata=payload,headers=head)
>>> fetch(req)
2015-07-07 12:12:24+0800 [default] DEBUG: Crawled (404) <POST https://www.mcdonalds.com.sg/wp-admin/admin-ajax.php> (referer: None)
The expected response is 200, but as you can see above, I am getting a 404 error code.
Answer 0: (score: 0)
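This is not a problem with the code itself. The original question and answer you mention are from 2013; a lifetime ago on the internet.
Things have changed at McDonald's Singapore, and apparently in its WordPress setup as well. But not by much.
What used to be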
url = 'https://www.mcdonalds.com.sg/wp-admin/admin-ajax.php'
is now
url = 'https://www.mcdonalds.com.sg/wp/wp-admin/admin-ajax.php'
(I found this by opening the Chrome developer tools with F12 and looking at the Network tab.)
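For reference, here is a minimal sketch of the same form-encoded POST built against the corrected path, using only the Python standard library instead of Scrapy's FormRequest. The helper name build_post_request is made up for illustration, and the request is only constructed, not sent. I have only confirmed that a GET works against the new URL, so treat the POST variant as an assumption to check in the Network tab.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Corrected endpoint from this answer (note the extra /wp/ path segment).
URL = 'https://www.mcdonalds.com.sg/wp/wp-admin/admin-ajax.php'


def build_post_request(url=URL):
    """Build (but do not send) the form-encoded POST from the question."""
    payload = {'action': 'store_locator_locations'}
    head = {'X-Requested-With': 'XMLHttpRequest'}
    # Passing data= makes urllib issue a POST, much like FormRequest
    # does when it is given formdata.
    body = urlencode(payload).encode('ascii')
    return Request(url, data=body, headers=head)


req = build_post_request()
print(req.get_method(), req.full_url)
print(req.data)
```

In the Scrapy shell, the equivalent change is simply to assign the new value to url before building the FormRequest.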
In fact, you can issue a GET request to this URL and get the JSON back:
GET https://www.mcdonalds.com.sg/wp/wp-admin/admin-ajax.php?action=store_locator_locations
[{
"id": "417",
"name": "McDonald\u2019s JCube",
"address": "2 Jurong East Central 1<br\/>#01-09<br\/>JCube\r\n",
"city": "Singapore",
"lat": "1.33352",
"long": "103.740277",
"op_hours": "Mon-Fri: Opens at 0630<br>\r\nSat-Sun: Opens at 0700<br>\r\nSun-Thur: Closes at 2300 <br>\r\nFri\/Sat & PH Eve: Closes at 0000\r\n<br><br>\r\nDessert Kiosk: Daily 1100 - 2300",
"phone": "66844228",
"region": "west",
"types": ["3"],
"zip": "609731"
},
...
]
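Once you have that response body, turning it into usable records is straightforward. The following is a rough sketch, not a definitive implementation: the helper names (build_url, clean, parse_locations) are my own, the field selection is just an example, and SAMPLE is a shortened copy of the first record above so the snippet runs without touching the network.

```python
import json
import re
from urllib.parse import urlencode

BASE_URL = 'https://www.mcdonalds.com.sg/wp/wp-admin/admin-ajax.php'


def build_url(action='store_locator_locations'):
    """Return the GET URL with the action passed as a query parameter."""
    return BASE_URL + '?' + urlencode({'action': action})


def clean(text):
    """Replace <br> / <br/> tags with ', ' and collapse CR/LF runs."""
    text = re.sub(r'<br\s*/?>', ', ', text)
    text = re.sub(r'\s*[\r\n]+\s*', ' ', text)
    return text.strip(' ,')


def parse_locations(body):
    """Parse the JSON array of stores into a list of plain dicts."""
    stores = []
    for item in json.loads(body):
        stores.append({
            'name': item['name'],
            'address': clean(item['address']),
            # Coordinates arrive as strings, so convert them to floats.
            'lat': float(item['lat']),
            'long': float(item['long']),
        })
    return stores


# Shortened copy of the first record shown above (raw string keeps the
# JSON escapes intact).
SAMPLE = r'''[{"id": "417", "name": "McDonald\u2019s JCube",
"address": "2 Jurong East Central 1<br\/>#01-09<br\/>JCube\r\n",
"lat": "1.33352", "long": "103.740277"}]'''

for store in parse_locations(SAMPLE):
    print(store['name'], '|', store['address'])
```

In a spider, you would pass response.body (or the decoded text) to parse_locations instead of SAMPLE.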