Question

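I get an error when working with a JSON response:

Error: AttributeError: 'str' object has no attribute 'get'

What could be the problem?

For other values I also get the following errors:

*** TypeError: 'builtin_function_or_method' object is not subscriptable

"Phone": value['_source']['primaryPhone'], KeyError: 'primaryPhone' ***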

# -*- coding: utf-8 -*-
import scrapy
import json


class MainSpider(scrapy.Spider):
    name = 'main'
    start_urls = ['https://experts.expcloud.com/api4/std?searchterms=AB&size=216&from=0']

def parse(self, response):

    resp = json.loads(response.body)
    values = resp['hits']['hits']

    for value in values:

        yield {
            'Full Name': value['_source']['fullName'],
            'Phone': value['_source']['primaryPhone'],
            "Email": value['_source']['primaryEmail'],
            "City": value.get['_source']['city'],
            "Zip Code": value.get['_source']['zipcode'],
            "Website": value['_source']['websiteURL'],
            "Facebook": value['_source']['facebookURL'],
            "LinkedIn": value['_source']['LinkedIn_URL'],
            "Twitter": value['_source']['Twitter'],
            "BIO": value['_source']['Bio']
        }

Answer 1

The data is nested more deeply than you think. That is why you are getting errors.

Code example

import scrapy
import json


class MainSpider(scrapy.Spider):
    name = 'test'
    start_urls = ['https://experts.expcloud.com/api4/std?searchterms=AB&size=216&from=0']

    def parse(self, response):
        resp = json.loads(response.body)
        values = resp['hits']['hits']

        for value in values:
            yield {
                'Full Name': value['_source']['fullName'],
                'Primary Phone': value['_source']['primaryPhone']
            }

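Explanation

The resp variable holds a Python dictionary, but there is no resp['hits']['hits']['fullName'] in this JSON data. The data you want for fullName is actually at resp['hits']['hits'][i]['_source']['fullName'], where i is a number, because resp['hits']['hits'] is a list.

resp['hits'] is a dictionary, so the values variable is fine. But resp['hits']['hits'] is a list, so you cannot use get on it; a list accepts only integers inside [], not strings. Note also that get is a method and has to be called with parentheses: value.get['_source'] subscripts the method itself, which is what raises the "'builtin_function_or_method' object is not subscriptable" error.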
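To make the nesting concrete, here is a minimal sketch that reproduces the three kinds of errors on a made-up payload. The contents of `sample` are invented for illustration and only mimic the shape described above; the loop over `sample["hits"]` is an assumption about how the AttributeError from the question could arise, since the failing line is not shown there.

```python
# Hypothetical payload mimicking the shape described above; values are invented.
sample = {
    "hits": {
        "hits": [
            {"_source": {"fullName": "Jane Doe", "primaryPhone": "555-0100"}},
            {"_source": {"fullName": "John Roe"}},  # no primaryPhone key
        ]
    }
}

values = sample["hits"]["hits"]  # a list of dicts

# A list only accepts integer indices (or slices), so a string index fails.
try:
    values["fullName"]
except TypeError as exc:
    list_error = str(exc)

# .get is a method: subscripting it instead of calling it raises TypeError.
try:
    values[0].get["_source"]
except TypeError as exc:
    get_error = str(exc)

# Iterating over a dict yields its keys (strings), and strings have no .get().
try:
    for value in sample["hits"]:
        value.get("_source")
except AttributeError as exc:
    attr_error = str(exc)

# Correct access: pick an element by number, then drill into the dicts.
first_name = values[0]["_source"]["fullName"]

# Calling .get() with a default avoids a KeyError when a key is missing.
second_phone = values[1]["_source"].get("primaryPhone", "No Phone number")

print(list_error)
print(get_error)
print(attr_error)
print(first_name, "/", second_phone)
```

Printing the three messages side by side makes it easier to match each error in the question to the kind of access that caused it.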

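Tips

Use response.json() instead of json.loads(response.body); since Scrapy v2.2, Scrapy supports JSON natively and imports json behind the scenes for you.
Also inspect the JSON data. For convenience I used requests and just kept drilling down through the nesting until I reached the data I needed.
Yielding a dictionary is fine for well-structured data like this, but for any data that needs modifying or cleaning, or that is wrong in places, use an Items dictionary or an ItemLoader. Both give you more flexibility than yielding a dictionary. I almost never yield a plain dictionary; I do so only when the data is highly structured.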
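The "drill down until you reach the data" step can be partly automated. The sketch below is one possible way to do it with the standard library only: it lists every leaf value together with the exact subscript path that reaches it. The helper name `iter_paths` and the `raw` sample string are hypothetical and not part of the original answer.

```python
import json


def iter_paths(node, prefix=""):
    """Yield (path, value) for every leaf inside nested dicts and lists.

    Empty dicts and empty lists produce no entries.
    """
    if isinstance(node, dict):
        for key, child in node.items():
            yield from iter_paths(child, f"{prefix}[{key!r}]")
    elif isinstance(node, list):
        for index, child in enumerate(node):
            yield from iter_paths(child, f"{prefix}[{index}]")
    else:
        yield prefix, node


# Invented sample text standing in for the body of a JSON response.
raw = (
    '{"hits": {"hits": [{"_source": {"fullName": "Jane Doe", '
    '"activeLocations": [{"city": "Austin"}]}}]}}'
)
data = json.loads(raw)

paths = list(iter_paths(data, "resp"))
for path, value in paths:
    print(path, "=", value)
```

Each printed path can be pasted straight into the spider, which makes it easy to see that city sits under activeLocations rather than directly under _source.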

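Updated code

Looking at the JSON data, a lot of values are missing. That is part of web scraping, and you will run into errors like this. Here we use try and except blocks, because a KeyError means Python cannot find the key we asked for. We have to handle that exception, which we do here by falling back to a string of the form 'No XXX'.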
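The same try/except idea can be wrapped in a small helper so the fallback logic is written once. This is only a sketch: `dig` is a hypothetical name, the `record` data is invented, and catching IndexError and TypeError in addition to KeyError is an assumption that lists may be empty or a value may be None.

```python
def dig(data, path, default):
    """Follow path (a sequence of keys and indices) through nested data.

    Return default as soon as any step is missing.
    """
    current = data
    for step in path:
        try:
            current = current[step]
        except (KeyError, IndexError, TypeError):
            # KeyError: missing dict key; IndexError: list too short;
            # TypeError: the value cannot be subscripted this way (e.g. None).
            return default
    return current


# Invented record shaped like one element of resp['hits']['hits'].
record = {"_source": {"fullName": "Jane Doe", "activeLocations": []}}

name = dig(record, ["_source", "fullName"], "No Name")
phone = dig(record, ["_source", "primaryPhone"], "No Phone number")
city = dig(record, ["_source", "activeLocations", 0, "city"], "No City")

print(name, "/", phone, "/", city)
```

Whether a helper like this is clearer than explicit try/except blocks is a matter of taste; the explicit blocks keep each fallback visible next to its field.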

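Once you start running into gaps like these, it is better to consider using an Items dictionary or ItemLoaders.

It is worth looking at the Scrapy documentation on Items at this point. Essentially Scrapy does two things: it extracts data from websites, and it provides a mechanism for storing that data. It does the latter by storing the data in a dictionary-like object called an Item. The code is not much different from yielding a dictionary, but an Items dictionary lets you manipulate the extracted data more easily with the other things Scrapy can do. You first need to edit items.py with the fields you want. We create a class called TestItem and define each field with scrapy.Field(). We can then import this class in our spider script.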

items.py

import scrapy


class TestItem(scrapy.Item):
    # define the fields for your item here like:
    # name = scrapy.Field()
    full_name = scrapy.Field()
    Phone = scrapy.Field()
    Email = scrapy.Field()
    City = scrapy.Field()
    Zip_code = scrapy.Field()
    Website = scrapy.Field()
    Facebook = scrapy.Field()
    Linkedin = scrapy.Field()
    Twitter = scrapy.Field()
    Bio = scrapy.Field()

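Here we have specified the fields we want. Field names are Python attribute names, so unfortunately they cannot contain spaces, which is why the full name is full_name. Field() creates each field of the item dictionary for us.

We import this item dictionary into the spider script with from ..items import TestItem. from ..items means we take items.py from the parent folder of the spider script, and we import the TestItem class from it. That way our spider can populate the item dictionary with our JSON data.

Note that just before the for loop we instantiate the TestItem class with item = TestItem(). Instantiating means calling the class, which in this case creates a dictionary-like item. In other words, we create the item dictionary and then populate it with keys and values. You have to do this before you can add keys and values inside the for loop.

Spider script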

import scrapy
from ..items import TestItem


class MainSpider(scrapy.Spider):
    name = 'test'
    start_urls = ['https://experts.expcloud.com/api4/std?searchterms=AB&size=216&from=0']

    def parse(self, response):
        # response.json() (Scrapy 2.2+) replaces json.loads(response.body)
        values = response.json()['hits']['hits']
        item = TestItem()
        for value in values:
            try:
                item['full_name'] = value['_source']['fullName']
            except KeyError:
                item['full_name'] = 'No Name'
            try:
                item['Phone'] = value['_source']['primaryPhone']
            except KeyError:
                item['Phone'] = 'No Phone number'
            try:
                item['Email'] = value['_source']['primaryEmail']
            except KeyError:
                item['Email'] = 'No Email'

            # The lookups below index into a list with [0], which raises
            # IndexError on an empty list, so that is caught as well.
            try:
                item['City'] = value['_source']['activeLocations'][0]['city']
            except (KeyError, IndexError):
                item['City'] = 'No City'
            try:
                item['Zip_code'] = value['_source']['activeLocations'][0]['zipcode']
            except (KeyError, IndexError):
                item['Zip_code'] = 'No Zip code'
            try:
                # Assumed to sit under '_source' like the other marketing fields
                item['Website'] = value['_source']['AgentMarketingCenter'][0]['Website']
            except (KeyError, IndexError):
                item['Website'] = 'No Website'
            try:
                item['Facebook'] = value['_source']['AgentMarketingCenter'][0]['Facebook_URL']
            except (KeyError, IndexError):
                item['Facebook'] = 'No Facebook'
            try:
                item['Linkedin'] = value['_source']['AgentMarketingCenter'][0]['LinkedIn_URL']
            except (KeyError, IndexError):
                item['Linkedin'] = 'No Linkedin'
            try:
                item['Twitter'] = value['_source']['AgentMarketingCenter'][0]['Twitter']
            except (KeyError, IndexError):
                item['Twitter'] = 'No Twitter'
            try:
                # Must be an assignment (=); a colon here would store nothing
                item['Bio'] = value['_source']['AgentMarketingCenter'][0]['Bio']
            except (KeyError, IndexError):
                item['Bio'] = 'No Bio'

            yield item
