How to get values from a WEIRD JSON response

Time: 2020-07-15 18:00:36

Tags: python json scrapy

I am trying to get data from this API link: https://www.longandfoster.com/include/ajax/api.aspx?op=SearchAgents&firstname=&lastname=&page=1&pagesize=200

If you open the link above, you will see a weird JSON response. The keys and values are not displayed properly.

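I converted the JSON response into a list and iterated over it. I do get a response, but the value for the key is not printed; None is returned instead: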

{'Name': None}

import scrapy
import json

class MainSpider(scrapy.Spider):
    name = 'main'
    # allowed_domains = ['longandfoster.com']
    start_urls = ['https://www.longandfoster.com/include/ajax/api.aspx?op=SearchAgents&firstname=&lastname=&page=1&pagesize=200']

    def parse(self, response):
        # resp = json.loads(response.body)
        resp_list = []
        resp = json.loads(response.body)
        resp_list.append(resp)

        for each in resp_list:
            name = each.get('DisplayName')

            yield {
                "Name": name,
            }

1 answer:

Answer 0: (score: 1)

You have to use json.loads() twice, because the value under the 'Entity' key is itself a JSON-encoded string:

 resp = json.loads( json.loads(response.body)['Entity'] )

Then your code works, provided you iterate over resp directly instead of wrapping it in another list.
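To see why the second call is needed, here is a small sketch that does not touch the network. The payload is made up for illustration (the names and the single 'DisplayName' field are assumptions, not the real API output); it only mimics the shape implied by the line above, where 'Entity' holds a string that contains more JSON.

```python
import json

# Hypothetical payload: the inner list is serialized to a string first,
# and that string is then stored under 'Entity' in the outer object.
inner = [{"DisplayName": "Jane Doe"}, {"DisplayName": "John Roe"}]
body = json.dumps({"Entity": json.dumps(inner)})

# First decode: the outer object. 'Entity' is still a plain str here,
# and the outer dict has no 'DisplayName' key, so .get() returns None.
outer = json.loads(body)
print(type(outer["Entity"]).__name__)
print(outer.get("DisplayName"))

# Second decode: turn the embedded string into a list of dicts.
agents = json.loads(outer["Entity"])
names = [agent.get("DisplayName") for agent in agents]
print(names)
```

This also explains the None in the question: the lookup was done on the outer object, which probably has only top-level keys such as 'Entity'.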


Minimal working code that you can put in a single file and run with python script.py, without creating a project.

import scrapy
import json


class MainSpider(scrapy.Spider):
    
    name = 'main'
    # allowed_domains = ['longandfoster.com']
    start_urls = ['https://www.longandfoster.com/include/ajax/api.aspx?op=SearchAgents&firstname=&lastname=&page=1&pagesize=200']

    def parse(self, response):
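        # 'Entity' holds a JSON-encoded string, so decode the outer
        # object first and then decode that string a second time.
        resp = json.loads(json.loads(response.body)['Entity'])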
        for each in resp:
            name = each.get('DisplayName')

            yield {
                "Name": name,
            }

# --- run without project and save in `output.csv` ---

from scrapy.crawler import CrawlerProcess

c = CrawlerProcess({
    'USER_AGENT': 'Mozilla/5.0',
    # save in file CSV, JSON or XML
    'FEED_FORMAT': 'csv',     # csv, json, xml
    'FEED_URI': 'output.csv', #
})
c.crawl(MainSpider)
c.start()
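A note on newer Scrapy versions: since Scrapy 2.1 the FEED_FORMAT and FEED_URI settings are deprecated in favor of a single FEEDS setting. A rough equivalent of the settings above might look like the sketch below; the variable name `settings` is only for illustration, and the dictionary would be passed to CrawlerProcess in place of the one shown earlier.

```python
# Sketch of the same export configuration using the FEEDS setting
# (assumes Scrapy 2.1 or later). Keys of FEEDS are output paths or URIs,
# values are per-feed options.
settings = {
    "USER_AGENT": "Mozilla/5.0",
    "FEEDS": {
        "output.csv": {"format": "csv"},  # format may also be json or xml
    },
}

# Usage, with the spider defined above:
#   c = CrawlerProcess(settings)
#   c.crawl(MainSpider)
#   c.start()
```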