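I am trying to fetch data from this API link:
https://www.longandfoster.com/include/ajax/api.aspx?op=SearchAgents&firstname=&lastname=&page=1&pagesize=200
If you open the link above, you will see an odd-looking JSON response: the keys and values are not displayed properly.
I put the parsed response into a list and iterated over it. The request succeeds, but the value for the key is not printed; I get None instead:
{'Name': None}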
import scrapy
import json


class MainSpider(scrapy.Spider):
    name = 'main'
    # allowed_domains = ['longandfoster.com']
    start_urls = ['https://www.longandfoster.com/include/ajax/api.aspx?op=SearchAgents&firstname=&lastname=&page=1&pagesize=200']

    def parse(self, response):
        # resp = json.loads(response.body)
        resp_list = []
        resp = json.loads(response.body)
        resp_list.append(resp)
        for each in resp_list:
            name = each.get('DisplayName')
            yield {
                "Name": name,
            }
Answer 0 (score: 1)
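You have to use json.loads() twice, because the value stored under 'Entity' is itself a JSON-encoded string:
resp = json.loads( json.loads(response.body)['Entity'] )
With that change your code works.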
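To illustrate why two calls are needed, here is a minimal sketch that runs offline. The payload is made up for the example (the names and the single DisplayName field are assumptions; the real response carries more data), but it has the shape described above: an outer JSON object whose 'Entity' value is a JSON document stored as a string.

```python
import json

# Hypothetical payload shaped like the API response: the outer object is JSON,
# and its 'Entity' value is itself a JSON document stored as a string.
inner = [{"DisplayName": "Jane Doe"}, {"DisplayName": "John Roe"}]
body = json.dumps({"Entity": json.dumps(inner)})

# First decode: a dict whose 'Entity' value is still a string.
outer = json.loads(body)

# This is what the question's code does: 'DisplayName' is not a top-level key,
# so .get() returns None.
first_pass_name = outer.get("DisplayName")

# Second decode: the string under 'Entity' becomes a list of agent dicts.
agents = json.loads(outer["Entity"])
names = [agent.get("DisplayName") for agent in agents]

print(type(outer["Entity"]).__name__, first_pass_name, names)
```

After the first decode there is only one dict, and it has no 'DisplayName' key, which is why the original spider yields {'Name': None}.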
Minimal working code that you can put in a single file and run with python script.py, without creating a project:
import scrapy
import json


class MainSpider(scrapy.Spider):
    name = 'main'
    # allowed_domains = ['longandfoster.com']
    start_urls = ['https://www.longandfoster.com/include/ajax/api.aspx?op=SearchAgents&firstname=&lastname=&page=1&pagesize=200']

    def parse(self, response):
        # Decode twice: the outer JSON object, then the JSON string under 'Entity'
        resp = json.loads(json.loads(response.body)['Entity'])
        for each in resp:
            name = each.get('DisplayName')
            yield {
                "Name": name,
            }


# --- run without project and save in `output.csv` ---

from scrapy.crawler import CrawlerProcess

c = CrawlerProcess({
    'USER_AGENT': 'Mozilla/5.0',
    # save in file CSV, JSON or XML
    # (Scrapy 2.1+ prefers the FEEDS setting over FEED_FORMAT / FEED_URI)
    'FEED_FORMAT': 'csv',  # csv, json, xml
    'FEED_URI': 'output.csv',
})
c.crawl(MainSpider)
c.start()
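If you are not sure the server will always wrap 'Entity' as a string, a slightly more defensive variant of the decoding step might look like the sketch below. The helper name decode_entity is hypothetical (it is not part of Scrapy or of the API), and the sample bodies at the bottom are invented for the demonstration.

```python
import json


def decode_entity(body, key="Entity"):
    """Return the decoded value under `key`.

    Decodes a second time only when the value is still a JSON string;
    returns an empty list when the key is missing or empty.
    """
    outer = json.loads(body)          # accepts str or bytes (e.g. response.body)
    value = outer.get(key)
    if isinstance(value, (str, bytes)):
        value = json.loads(value)     # second decode for the double-encoded case
    return value or []


# Invented sample bodies: one double-encoded, one already nested, one without the key.
double_encoded = json.dumps({"Entity": json.dumps([{"DisplayName": "A"}])}).encode()
already_nested = json.dumps({"Entity": [{"DisplayName": "A"}]})
missing_key = "{}"

for sample in (double_encoded, already_nested, missing_key):
    print(decode_entity(sample))
```

Inside the spider, parse() could then iterate over decode_entity(response.body) instead of calling json.loads() twice inline.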