Question

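I'm running a spider that saves data to DynamoDB. I've searched StackOverflow for an answer but couldn't find one. The spider saves stamp and title to DynamoDB with extra characters, such as the u prefix and square brackets. The url is saved correctly, without any extra characters. How can I save stamp and title without them?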

My spider:

def parse(self, response):

    for item in response.xpath("//li[contains(@class, 'river-block')]"):
        url = item.xpath(".//h2[@class='post-title']/a/@href").extract()[0]
        stamp = item.xpath(".//time/@datetime").extract()
        yield scrapy.Request(url, callback=self.get_details, meta={'stamp': stamp})

def get_details(self, response):
    article = ArticleItem()
    article['title'] = response.xpath("//h1/text()").extract()
    article['url'] = format(shortener.short(response.url))
    article['stamp'] = response.meta['stamp']
    yield article

My pipeline file:

import boto3


class DynamoDBStorePipeline(object):

    def process_item(self, item, spider):
        dynamodb = boto3.resource('dynamodb', region_name="us-west-2")

        table = dynamodb.Table('TechCrunch')

        table.put_item(
            Item={
                'url': str(item['url']),
                'title': str(item['title']),
                'stamp': str(item['stamp']),
            }
        )
        return item

Sample output:
url: a link (fine)
stamp: [u'2017-05-17 08:06:47']
title: [u'title']

Answer 1

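In Scrapy you can use extract() to get textual data, but if you only want the first matched element, call extract_first() on the selector instead. extract() always returns a list of strings, while extract_first() returns just the first matched string (or None if nothing matched).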
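Why the brackets show up: the pipeline calls str() on each field, and str() of a list is its repr, brackets and quotes included (a list of unicode strings under Python 2 also shows the u prefix, as in the sample output). The url field is unaffected because shortener.short(...) presumably already returns a single string. A minimal pure-Python sketch of the difference, using a literal list as a stand-in for a real selector result (as_stored is a hypothetical helper that mimics the pipeline's str() call):

```python
def as_stored(value):
    """Mimic the pipeline's str(item[...]) call on a field value."""
    return str(value)


# Stand-in for .extract(): a list of every matched string.
extracted = ["2017-05-17 08:06:47"]

# Stand-in for .extract_first(): the first match, or None when nothing matched.
first = extracted[0] if extracted else None

print(as_stored(extracted))  # ['2017-05-17 08:06:47']
print(as_stored(first))      # 2017-05-17 08:06:47
```

Newer Scrapy versions also offer .get() and .getall() as equivalent names for extract_first() and extract().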

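In your case, change the stamp and title selectors to call extract_first() instead of extract(), so that each field holds a single string rather than a one-element list.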
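If some fields may still arrive as lists (for example from another spider that keeps using extract()), a defensive option is to normalize values in the pipeline as well. The sketch below assumes the same region and table name as the question; to_text is a hypothetical helper, and moving the boto3 setup into Scrapy's open_spider hook is a design choice so the resource is created once per crawl rather than once per item.

```python
class DynamoDBStorePipeline(object):
    """Sketch: open the table once, then write plain strings instead of list reprs."""

    def open_spider(self, spider):
        # Imported here so to_text() can be exercised without boto3 installed.
        import boto3

        dynamodb = boto3.resource("dynamodb", region_name="us-west-2")
        self.table = dynamodb.Table("TechCrunch")

    @staticmethod
    def to_text(value):
        # Hypothetical helper: unwrap a list from .extract() to its first element.
        if isinstance(value, (list, tuple)):
            value = value[0] if value else None
        # Strip surrounding whitespace; keep None so it can be skipped below.
        return None if value is None else str(value).strip()

    def process_item(self, item, spider):
        record = {key: self.to_text(item.get(key)) for key in ("url", "title", "stamp")}
        # Skip missing or empty fields instead of storing "" or the string "None".
        record = {key: value for key, value in record.items() if value}
        self.table.put_item(Item=record)
        return item
```

Fixing the selectors with extract_first() is still the direct fix; this only guards the write path.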

