Python scrapy script - AttributeError: 'dict' object has no attribute 'urljoin'

Time: 2019-06-24 03:32:42

Tags: python dictionary scrapy

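The following is a scrapy spider that crawls a site and uses the results to populate DynamoDB with URLs. I am getting this error message:

AttributeError: 'dict' object has no attribute 'urljoin'

However, it is not clear to me why.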

##############################################
#  Script:  Prep storage for chemtrail       #
#  Author: James                             #
#  Purpose:                                  #
#  Version:                                  #
#                                            #
##############################################
import boto3
import json
import scrapy

class ChemPrepSpider(scrapy.Spider):
    name = "xxxxxx"

    def start_requests(self):
        urls = [
            'https://www.xxxxxxx.com.au'
        ]

        for url in urls:
            yield scrapy.Request(url=url, callback=self.parse)

    def parse(self,response):
        dynamodb = boto3.resource('dynamodb', region_name='ap-southeast-2')
        table = dynamodb.Table('chemTrailStorage')
        category_links = response.css('li').xpath('a/@href').getall()
        category_links_filtered = [x for x in category_links if 'shop-online' in x] # remove non category links
        category_links_filtered = list(dict.fromkeys(category_links_filtered)) # remove duplicates 

        for category_link in category_links_filtered:
            print('raw category -> ' + category_link)
            next_category = response.urljoin(category_link) + '?size=99999'
            print('DynamoDb insert for category: ' + next_category)
            response = table.put_item(
                Item={
                    'CategoryPath': next_category,
                    'ItemCount':"99999",
                    'JobStat':"NOT_STARTED",
                    'PickupDateTime':"NA",
                    'CompletionDateTime':"NA"
                }
            )
            print('Response from put....')
            print(response)

1 Answer:

Answer 0: (score: 2)

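It looks like boto3 returns a dict from the table.put_item call; see the AWS boto3 documentation.

This means you are overwriting Scrapy's "response" object with a dict, which has no urljoin attribute. The first pass through the loop works, but on the second pass response.urljoin(category_link) is called on the dict and raises the AttributeError.

You should replace "response = table.put_item" with "dynamo_response = table.put_item"

or any other name you choose, and print that variable instead of "response".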
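
To make the name clash concrete, here is a minimal sketch that reproduces it without scrapy or boto3. FakeResponse and fake_put_item are hypothetical stand-ins invented for this illustration (they are not part of either library), and the example.com URL is a placeholder; only the rebinding of the name "response" mirrors the question's code.

```python
from urllib.parse import urljoin


class FakeResponse:
    """Hypothetical stand-in for the scrapy response; only urljoin is modelled."""

    def __init__(self, url):
        self.url = url

    def urljoin(self, link):
        return urljoin(self.url, link)


def fake_put_item(Item):
    """Hypothetical stand-in for table.put_item; like boto3, it returns a dict."""
    return {"ResponseMetadata": {"HTTPStatusCode": 200}}


def build_categories_buggy(response, links):
    categories = []
    for link in links:
        # Second iteration: `response` is now a dict, so this line fails
        categories.append(response.urljoin(link) + "?size=99999")
        response = fake_put_item(Item={"CategoryPath": categories[-1]})
    return categories


def build_categories_fixed(response, links):
    categories = []
    for link in links:
        categories.append(response.urljoin(link) + "?size=99999")
        # A distinct name leaves `response` pointing at the page response
        dynamo_response = fake_put_item(Item={"CategoryPath": categories[-1]})
    return categories


links = ["/shop-online/a", "/shop-online/b"]
print(build_categories_fixed(FakeResponse("https://www.example.com/"), links))

try:
    build_categories_buggy(FakeResponse("https://www.example.com/"), links)
except AttributeError as exc:
    print("buggy version failed:", exc)
```

With a single link the buggy version would appear to work, which is why the error only shows up once the loop runs a second time.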