The following is a Scrapy run that uses the scraped results to populate URLs into DynamoDB. I am getting this error:
AttributeError: 'dict' object has no attribute 'urljoin'
However, the cause is not clear.
##############################################
# Script: Prep storage for chemtrail #
# Author: James #
# Purpose: #
# Version: #
# #
##############################################
import boto3
import json
import scrapy
class ChemPrepSpider(scrapy.Spider):
    name = "xxxxxx"

    def start_requests(self):
        urls = [
            'https://www.xxxxxxx.com.au'
        ]
        for url in urls:
            yield scrapy.Request(url=url, callback=self.parse)

    def parse(self, response):
        dynamodb = boto3.resource('dynamodb', region_name='ap-southeast-2')
        table = dynamodb.Table('chemTrailStorage')
        category_links = response.css('li').xpath('a/@href').getall()
        category_links_filtered = [x for x in category_links if 'shop-online' in x]  # remove non category links
        category_links_filtered = list(dict.fromkeys(category_links_filtered))  # remove duplicates
        for category_link in category_links_filtered:
            print('raw category -> ' + category_link)
            next_category = response.urljoin(category_link) + '?size=99999'
            print('DynamoDb insert for category: ' + next_category)
            response = table.put_item(
                Item={
                    'CategoryPath': next_category,
                    'ItemCount': "99999",
                    'JobStat': "NOT_STARTED",
                    'PickupDateTime': "NA",
                    'CompletionDateTime': "NA"
                }
            )
            print('Response from put....')
            print(response)
Answer 0 (score: 2)
It looks like boto3 returns a dict from the table.put_item call (see the AWS boto3 documentation).
That means you are overwriting Scrapy's response object with a dict, which has no urljoin attribute.
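For illustration, the value returned by put_item is roughly of the shape below (the exact metadata will vary), and a plain dict like this has no urljoin method:

    {'ResponseMetadata': {'RequestId': '...', 'HTTPStatusCode': 200}}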
You should replace response = table.put_item with something like dynamo_response = table.put_item,
or another name of your choice.
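A minimal sketch of the corrected loop inside parse, keeping the rest of the spider unchanged (dynamo_response is just an example name):

    for category_link in category_links_filtered:
        print('raw category -> ' + category_link)
        next_category = response.urljoin(category_link) + '?size=99999'
        print('DynamoDb insert for category: ' + next_category)
        dynamo_response = table.put_item(
            Item={
                'CategoryPath': next_category,
                'ItemCount': "99999",
                'JobStat': "NOT_STARTED",
                'PickupDateTime': "NA",
                'CompletionDateTime': "NA"
            }
        )
        # Scrapy's response object is no longer overwritten, so urljoin keeps working
        print('Response from put....')
        print(dynamo_response)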