I am trying to run the following spider:
import scrapy
from tutorial.items import QuoteItem


class QuotesSpider(scrapy.Spider):
    name = "quotes"
    custom_settings = {
        'FEED_URI': 's3://apkmirror/quotes.json',
        'AWS_ACCESS_KEY_ID': 'foo',
        'AWS_SECRET_ACCESS_KEY': 'bar',
    }

    def start_requests(self):
        urls = [
            'http://quotes.toscrape.com/page/1/',
            'http://quotes.toscrape.com/page/2/',
        ]
        for url in urls:
            yield scrapy.Request(url=url, callback=self.parse)

    def parse(self, response):
        for quote in response.css('div.quote'):
            item = QuoteItem()
            item['text'] = quote.css('span.text::text').extract_first()
            item['author'] = quote.css('small.author::text').extract_first()
            item['tags'] = quote.css('div.tags a.tag::text').extract()
            yield item
where 'foo' and 'bar' are, respectively, the AWS access key ID and secret access key for an Amazon S3 bucket in Frankfurt, and items.py is simply
import scrapy


class QuoteItem(scrapy.Item):
    text = scrapy.Field()
    author = scrapy.Field()
    tags = scrapy.Field()
However, when I run scrapy crawl quotes, the log contains the following error message:
2017-05-15 18:33:56 [scrapy.core.engine] INFO: Closing spider (finished)
2017-05-15 18:33:56 [botocore.hooks] DEBUG: Event before-parameter-build.s3.PutObject: calling handler <function validate_ascii_metadata at 0x7fd56fd3b488>
2017-05-15 18:33:56 [botocore.hooks] DEBUG: Event before-parameter-build.s3.PutObject: calling handler <function sse_md5 at 0x7fd56fd38b18>
2017-05-15 18:33:56 [botocore.hooks] DEBUG: Event before-parameter-build.s3.PutObject: calling handler <function convert_body_to_file_like_object at 0x7fd56fd3ba28>
2017-05-15 18:33:56 [botocore.hooks] DEBUG: Event before-parameter-build.s3.PutObject: calling handler <function validate_bucket_name at 0x7fd56fd38aa0>
2017-05-15 18:33:56 [botocore.hooks] DEBUG: Event before-parameter-build.s3.PutObject: calling handler <bound method S3RegionRedirector.redirect_from_cache of <botocore.utils.S3RegionRedirector object at 0x7fd56ec7bad0>>
2017-05-15 18:33:56 [botocore.hooks] DEBUG: Event before-call.s3.PutObject: calling handler <function conditionally_calculate_md5 at 0x7fd56fd38a28>
2017-05-15 18:33:56 [botocore.hooks] DEBUG: Event before-call.s3.PutObject: calling handler <function add_expect_header at 0x7fd56fd38ed8>
2017-05-15 18:33:56 [botocore.handlers] DEBUG: Adding expect 100 continue header to request.
2017-05-15 18:33:56 [botocore.hooks] DEBUG: Event before-call.s3.PutObject: calling handler <bound method S3RegionRedirector.set_request_url of <botocore.utils.S3RegionRedirector object at 0x7fd56ec7bad0>>
2017-05-15 18:33:56 [botocore.endpoint] DEBUG: Making request for OperationModel(name=PutObject) (verify_ssl=True) with params: {'body': <open file '<fdopen>', mode 'w+b' at 0x7fd56ef29810>, 'url': u'https://s3.amazonaws.com/apkmirror/quotes.json', 'headers': {'Content-MD5': u'U+PeT0soEYWoCF4DMQXEzA==', 'Expect': '100-continue', 'User-Agent': 'Botocore/1.4.67 Python/2.7.12 Linux/4.4.0-75-generic'}, 'context': {'client_region': u'us-east-1', 'signing': {'bucket': 'apkmirror'}, 'has_streaming_input': True, 'client_config': <botocore.config.Config object at 0x7fd56ec7b610>}, 'query_string': {}, 'url_path': u'/apkmirror/quotes.json', 'method': u'PUT'}
2017-05-15 18:33:56 [botocore.hooks] DEBUG: Event request-created.s3.PutObject: calling handler <bound method RequestSigner.handler of <botocore.signers.RequestSigner object at 0x7fd56ec7b510>>
2017-05-15 18:33:56 [botocore.hooks] DEBUG: Event before-sign.s3.PutObject: calling handler <function fix_s3_host at 0x7fd56fe285f0>
2017-05-15 18:33:56 [botocore.utils] DEBUG: Checking for DNS compatible bucket for: https://s3.amazonaws.com/apkmirror/quotes.json
2017-05-15 18:33:56 [botocore.utils] DEBUG: URI updated to: https://apkmirror.s3.amazonaws.com/quotes.json
2017-05-15 18:33:56 [botocore.auth] DEBUG: Calculating signature using hmacv1 auth.
2017-05-15 18:33:56 [botocore.auth] DEBUG: HTTP request method: PUT
2017-05-15 18:33:56 [botocore.auth] DEBUG: StringToSign:
PUT
U+PeT0soEYWoCF4DMQXEzA==
Mon, 15 May 2017 16:33:56 GMT
/apkmirror/quotes.json
2017-05-15 18:33:56 [botocore.endpoint] DEBUG: Sending http request: <PreparedRequest [PUT]>
2017-05-15 18:33:56 [botocore.vendored.requests.packages.urllib3.connectionpool] INFO: Starting new HTTPS connection (1): apkmirror.s3.amazonaws.com
2017-05-15 18:33:56 [botocore.awsrequest] DEBUG: Waiting for 100 Continue response.
2017-05-15 18:33:56 [botocore.awsrequest] DEBUG: Received a non 100 Continue response from the server, NOT sending request body.
2017-05-15 18:33:56 [botocore.vendored.requests.packages.urllib3.connectionpool] DEBUG: "PUT /quotes.json HTTP/1.1" 400 None
2017-05-15 18:33:56 [botocore.parsers] DEBUG: Response headers: {'x-amz-region': 'eu-central-1', 'x-amz-id-2': 'ti0jteHsbwyFinnUnoVAz5xywBgGBnRnIq+HlEZyZ4YDZ83yagh8tEttuelsB+UFmA+ssOO3iFk=', 'server': 'AmazonS3', 'transfer-encoding': 'chunked', 'connection': 'close', 'x-amz-request-id': '276FC0F60406C7C5', 'date': 'Mon, 15 May 2017 16:33:55 GMT', 'content-type': 'application/xml'}
2017-05-15 18:33:56 [botocore.parsers] DEBUG: Response body:
<?xml version="1.0" encoding="UTF-8"?>
<Error><Code>InvalidRequest</Code><Message>The authorization mechanism you have provided is not supported. Please use AWS4-HMAC-SHA256.</Message><RequestId>276FC0F60406C7C5</RequestId><HostId>ti0jteHsbwyFinnUnoVAz5xywBgGBnRnIq+HlEZyZ4YDZ83yagh8tEttuelsB+UFmA+ssOO3iFk=</HostId></Error>
2017-05-15 18:33:56 [botocore.hooks] DEBUG: Event needs-retry.s3.PutObject: calling handler <botocore.retryhandler.RetryHandler object at 0x7fd56ece8290>
2017-05-15 18:33:56 [botocore.retryhandler] DEBUG: No retry needed.
2017-05-15 18:33:56 [botocore.hooks] DEBUG: Event needs-retry.s3.PutObject: calling handler <bound method S3RegionRedirector.redirect_from_error of <botocore.utils.S3RegionRedirector object at 0x7fd56ec7bad0>>
2017-05-15 18:33:56 [scrapy.extensions.feedexport] ERROR: Error storing jsonlines feed (20 items) in: s3://apkmirror/quotes.json
Traceback (most recent call last):
  File "/usr/local/lib/python2.7/dist-packages/twisted/python/threadpool.py", line 250, in inContext
    result = inContext.theWork()
  File "/usr/local/lib/python2.7/dist-packages/twisted/python/threadpool.py", line 266, in <lambda>
    inContext.theWork = lambda: context.call(ctx, func, *args, **kw)
  File "/usr/local/lib/python2.7/dist-packages/twisted/python/context.py", line 122, in callWithContext
    return self.currentContext().callWithContext(ctx, func, *args, **kw)
  File "/usr/local/lib/python2.7/dist-packages/twisted/python/context.py", line 85, in callWithContext
    return func(*args,**kw)
  File "/usr/local/lib/python2.7/dist-packages/scrapy/extensions/feedexport.py", line 118, in _store_in_thread
    Bucket=self.bucketname, Key=self.keyname, Body=file)
  File "/usr/local/lib/python2.7/dist-packages/botocore/client.py", line 251, in _api_call
    return self._make_api_call(operation_name, kwargs)
  File "/usr/local/lib/python2.7/dist-packages/botocore/client.py", line 537, in _make_api_call
    raise ClientError(parsed_response, operation_name)
ClientError: An error occurred (InvalidRequest) when calling the PutObject operation: The authorization mechanism you have provided is not supported. Please use AWS4-HMAC-SHA256.
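Judging from "The authorization mechanism you have provided is not supported. Please use AWS4-HMAC-SHA256" and "Using boto for AWS S3 Buckets for Signature V4", the problem is germane to the S3 bucket being in Frankfurt (no pun intended). One solution involves changing the host parameter in boto's connect_to_region.
In my case, however, the use of boto is handled by the Scrapy source code, which I would rather not touch. How can I solve this problem?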
Answer 0 (score: 2)
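"One solution involves changing the host parameter in boto's connect_to_region."
The storage backend for exporting to S3 is implemented by scrapy.extensions.feedexport.S3FeedStorage. You can subclass S3FeedStorage and implement your own class that resolves the mismatch with the authentication mechanism your S3 bucket requires.
You would also need to add
{
    "s3": "myproject.extensions.MyS3FeedStorage",
}
to the FEED_STORAGES setting so that Scrapy uses it.
See also the documentation.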
Answer 1 (score: 1)
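So this is an open issue in Scrapy (here). You can work around it by using the AWS shared config file to set the signature version to s3v4. You can see all of the S3 configuration documentation here.
To set sigv4, you can create the file ~/.aws/config with the following contents: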
[default]
s3 =
    signature_version = s3v4
Alternatively, if you already have the AWS CLI installed, you can run:
aws configure set default.s3.signature_version s3v4
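If you would rather not touch ~/.aws/config, a variation is to keep the same contents in a project-local file and point the AWS_CONFIG_FILE environment variable at it, which botocore and the AWS CLI consult before falling back to ~/.aws/config. The following is only a minimal sketch using the standard library: the file name aws_config and both helper functions are made up for illustration, and the region line assumes the Frankfurt bucket from the question.

```python
import configparser
import os
import tempfile
from pathlib import Path

# Same contents as the ~/.aws/config shown above, plus the bucket's region.
# The nested "s3" block must be indented under the "s3 =" line.
AWS_CONFIG_TEXT = (
    "[default]\n"
    "region = eu-central-1\n"
    "s3 =\n"
    "    signature_version = s3v4\n"
)


def write_project_aws_config(directory):
    """Write the config into `directory` and return its path (hypothetical helper)."""
    path = Path(directory) / "aws_config"  # illustrative file name
    path.write_text(AWS_CONFIG_TEXT)
    return path


def read_signature_version(path):
    """Read back the nested s3.signature_version value as a sanity check."""
    parser = configparser.RawConfigParser()
    parser.read(str(path))
    nested = parser.get("default", "s3")  # multi-line value holding the nested block
    for line in nested.splitlines():
        key, sep, value = line.partition("=")
        if sep and key.strip() == "signature_version":
            return value.strip()
    return None


config_path = write_project_aws_config(tempfile.mkdtemp())
# Must be set before Scrapy (and therefore botocore) creates its S3 client.
os.environ["AWS_CONFIG_FILE"] = str(config_path)
print(read_signature_version(config_path))
```

In a real project you would write the file once into the project directory and export AWS_CONFIG_FILE in the shell or container that runs scrapy crawl, rather than using a temporary directory.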
Answer 2 (score: 1)
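For completeness, here is how I implemented the answers. In the end, I found it easier to modify my AWS configuration (as suggested by Jordan Phillips) than to subclass S3FeedStorage (as suggested by starrify). I used the following Dockerfile to run the scraper: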
# Adapted from trcook/docker-scrapy
FROM python:alpine
RUN apk --update add libxml2-dev libxslt-dev libffi-dev gcc musl-dev libgcc openssl-dev
RUN pip install scrapy botocore awscli
RUN aws configure set aws_access_key_id foo
RUN aws configure set aws_secret_access_key bar
RUN aws configure set default.region eu-central-1
RUN aws configure set default.s3.signature_version s3v4
COPY . /scraper
WORKDIR /scraper
CMD ["scrapy", "crawl", "quotes"]
where foo and bar are the actual AWS access key ID and AWS secret access key, respectively. If I run docker build --tag quotes . followed by docker run quotes, the scraper runs without errors:
2017-05-16 13:03:37 [scrapy.utils.log] INFO: Scrapy 1.3.3 started (bot: tutorial)
2017-05-16 13:03:37 [scrapy.utils.log] INFO: Overridden settings: {'BOT_NAME': 'tutorial', 'NEWSPIDER_MODULE': 'tutorial.spiders', 'ROBOTSTXT_OBEY': True, 'SPIDER_MODULES': ['tutorial.spiders']}
2017-05-16 13:03:37 [botocore.credentials] DEBUG: Looking for credentials via: env
2017-05-16 13:03:37 [botocore.credentials] DEBUG: Looking for credentials via: assume-role
2017-05-16 13:03:37 [botocore.credentials] DEBUG: Looking for credentials via: shared-credentials-file
2017-05-16 13:03:37 [botocore.credentials] INFO: Found credentials in shared credentials file: ~/.aws/credentials
2017-05-16 13:03:37 [botocore.loaders] DEBUG: Loading JSON file: /usr/local/lib/python3.6/site-packages/botocore/data/endpoints.json
2017-05-16 13:03:37 [botocore.loaders] DEBUG: Loading JSON file: /usr/local/lib/python3.6/site-packages/botocore/data/s3/2006-03-01/service-2.json
2017-05-16 13:03:37 [botocore.loaders] DEBUG: Loading JSON file: /usr/local/lib/python3.6/site-packages/botocore/data/_retry.json
2017-05-16 13:03:37 [botocore.client] DEBUG: Registering retry handlers for service: s3
2017-05-16 13:03:37 [botocore.hooks] DEBUG: Event creating-client-class.s3: calling handler <function add_generate_presigned_post at 0x7f8c2f2f6a60>
2017-05-16 13:03:37 [botocore.hooks] DEBUG: Event creating-client-class.s3: calling handler <function add_generate_presigned_url at 0x7f8c2f2f6840>
2017-05-16 13:03:37 [botocore.client] DEBUG: Switching signature version for service s3 to version s3v4 based on config file override.
2017-05-16 13:03:37 [botocore.endpoint] DEBUG: Setting s3 timeout as (60, 60)
2017-05-16 13:03:37 [botocore.client] DEBUG: Defaulting to S3 virtual host style addressing with path style addressing fallback.
2017-05-16 13:03:37 [scrapy.middleware] INFO: Enabled extensions:
['scrapy.extensions.corestats.CoreStats',
'scrapy.extensions.telnet.TelnetConsole',
'scrapy.extensions.feedexport.FeedExporter',
'scrapy.extensions.logstats.LogStats']
2017-05-16 13:03:37 [scrapy.middleware] INFO: Enabled downloader middlewares:
['scrapy.downloadermiddlewares.robotstxt.RobotsTxtMiddleware',
'scrapy.downloadermiddlewares.httpauth.HttpAuthMiddleware',
'scrapy.downloadermiddlewares.downloadtimeout.DownloadTimeoutMiddleware',
'scrapy.downloadermiddlewares.defaultheaders.DefaultHeadersMiddleware',
'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware',
'scrapy.downloadermiddlewares.retry.RetryMiddleware',
'scrapy.downloadermiddlewares.redirect.MetaRefreshMiddleware',
'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware',
'scrapy.downloadermiddlewares.redirect.RedirectMiddleware',
'scrapy.downloadermiddlewares.cookies.CookiesMiddleware',
'scrapy.downloadermiddlewares.stats.DownloaderStats']
2017-05-16 13:03:37 [scrapy.middleware] INFO: Enabled spider middlewares:
['scrapy.spidermiddlewares.httperror.HttpErrorMiddleware',
'scrapy.spidermiddlewares.offsite.OffsiteMiddleware',
'scrapy.spidermiddlewares.referer.RefererMiddleware',
'scrapy.spidermiddlewares.urllength.UrlLengthMiddleware',
'scrapy.spidermiddlewares.depth.DepthMiddleware']
2017-05-16 13:03:37 [scrapy.middleware] INFO: Enabled item pipelines:
[]
2017-05-16 13:03:37 [scrapy.core.engine] INFO: Spider opened
2017-05-16 13:03:37 [botocore.credentials] DEBUG: Looking for credentials via: env
2017-05-16 13:03:37 [botocore.credentials] DEBUG: Looking for credentials via: assume-role
2017-05-16 13:03:37 [botocore.credentials] DEBUG: Looking for credentials via: shared-credentials-file
2017-05-16 13:03:37 [botocore.credentials] INFO: Found credentials in shared credentials file: ~/.aws/credentials
2017-05-16 13:03:37 [botocore.loaders] DEBUG: Loading JSON file: /usr/local/lib/python3.6/site-packages/botocore/data/endpoints.json
2017-05-16 13:03:37 [botocore.loaders] DEBUG: Loading JSON file: /usr/local/lib/python3.6/site-packages/botocore/data/s3/2006-03-01/service-2.json
2017-05-16 13:03:37 [botocore.loaders] DEBUG: Loading JSON file: /usr/local/lib/python3.6/site-packages/botocore/data/_retry.json
2017-05-16 13:03:37 [botocore.client] DEBUG: Registering retry handlers for service: s3
2017-05-16 13:03:37 [botocore.hooks] DEBUG: Event creating-client-class.s3: calling handler <function add_generate_presigned_post at 0x7f8c2f2f6a60>
2017-05-16 13:03:37 [botocore.hooks] DEBUG: Event creating-client-class.s3: calling handler <function add_generate_presigned_url at 0x7f8c2f2f6840>
2017-05-16 13:03:37 [botocore.client] DEBUG: Switching signature version for service s3 to version s3v4 based on config file override.
2017-05-16 13:03:37 [botocore.endpoint] DEBUG: Setting s3 timeout as (60, 60)
2017-05-16 13:03:37 [botocore.client] DEBUG: Defaulting to S3 virtual host style addressing with path style addressing fallback.
2017-05-16 13:03:37 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2017-05-16 13:03:37 [scrapy.extensions.telnet] DEBUG: Telnet console listening on 127.0.0.1:6023
2017-05-16 13:03:38 [scrapy.core.engine] DEBUG: Crawled (404) <GET http://quotes.toscrape.com/robots.txt> (referer: None)
2017-05-16 13:03:38 [scrapy.core.engine] DEBUG: Crawled (200) <GET http://quotes.toscrape.com/page/1/> (referer: None)
2017-05-16 13:03:38 [scrapy.core.engine] DEBUG: Crawled (200) <GET http://quotes.toscrape.com/page/2/> (referer: None)
2017-05-16 13:03:38 [scrapy.core.scraper] DEBUG: Scraped from <200 http://quotes.toscrape.com/page/1/>
{'author': 'Albert Einstein',
'tags': ['change', 'deep-thoughts', 'thinking', 'world'],
'text': '“The world as we have created it is a process of our thinking. It '
'cannot be changed without changing our thinking.”'}
2017-05-16 13:03:38 [scrapy.core.scraper] DEBUG: Scraped from <200 http://quotes.toscrape.com/page/1/>
{'author': 'J.K. Rowling',
'tags': ['abilities', 'choices'],
'text': '“It is our choices, Harry, that show what we truly are, far more '
'than our abilities.”'}
2017-05-16 13:03:38 [scrapy.core.scraper] DEBUG: Scraped from <200 http://quotes.toscrape.com/page/1/>
{'author': 'Albert Einstein',
'tags': ['inspirational', 'life', 'live', 'miracle', 'miracles'],
'text': '“There are only two ways to live your life. One is as though nothing '
'is a miracle. The other is as though everything is a miracle.”'}
2017-05-16 13:03:38 [scrapy.core.scraper] DEBUG: Scraped from <200 http://quotes.toscrape.com/page/1/>
{'author': 'Jane Austen',
'tags': ['aliteracy', 'books', 'classic', 'humor'],
'text': '“The person, be it gentleman or lady, who has not pleasure in a good '
'novel, must be intolerably stupid.”'}
2017-05-16 13:03:38 [scrapy.core.scraper] DEBUG: Scraped from <200 http://quotes.toscrape.com/page/1/>
{'author': 'Marilyn Monroe',
'tags': ['be-yourself', 'inspirational'],
'text': "“Imperfection is beauty, madness is genius and it's better to be "
'absolutely ridiculous than absolutely boring.”'}
2017-05-16 13:03:38 [scrapy.core.scraper] DEBUG: Scraped from <200 http://quotes.toscrape.com/page/1/>
{'author': 'Albert Einstein',
'tags': ['adulthood', 'success', 'value'],
'text': '“Try not to become a man of success. Rather become a man of value.”'}
2017-05-16 13:03:38 [scrapy.core.scraper] DEBUG: Scraped from <200 http://quotes.toscrape.com/page/1/>
{'author': 'André Gide',
'tags': ['life', 'love'],
'text': '“It is better to be hated for what you are than to be loved for what '
'you are not.”'}
2017-05-16 13:03:38 [scrapy.core.scraper] DEBUG: Scraped from <200 http://quotes.toscrape.com/page/1/>
{'author': 'Thomas A. Edison',
'tags': ['edison', 'failure', 'inspirational', 'paraphrased'],
'text': "“I have not failed. I've just found 10,000 ways that won't work.”"}
2017-05-16 13:03:38 [scrapy.core.scraper] DEBUG: Scraped from <200 http://quotes.toscrape.com/page/1/>
{'author': 'Eleanor Roosevelt',
'tags': ['misattributed-eleanor-roosevelt'],
'text': '“A woman is like a tea bag; you never know how strong it is until '
"it's in hot water.”"}
2017-05-16 13:03:38 [scrapy.core.scraper] DEBUG: Scraped from <200 http://quotes.toscrape.com/page/1/>
{'author': 'Steve Martin',
'tags': ['humor', 'obvious', 'simile'],
'text': '“A day without sunshine is like, you know, night.”'}
2017-05-16 13:03:38 [scrapy.core.scraper] DEBUG: Scraped from <200 http://quotes.toscrape.com/page/2/>
{'author': 'Marilyn Monroe',
'tags': ['friends', 'heartbreak', 'inspirational', 'life', 'love', 'sisters'],
'text': "“This life is what you make it. No matter what, you're going to mess "
"up sometimes, it's a universal truth. But the good part is you get "
"to decide how you're going to mess it up. Girls will be your friends "
"- they'll act like it anyway. But just remember, some come, some go. "
"The ones that stay with you through everything - they're your true "
"best friends. Don't let go of them. Also remember, sisters make the "
"best friends in the world. As for lovers, well, they'll come and go "
'too. And baby, I hate to say it, most of them - actually pretty much '
"all of them are going to break your heart, but you can't give up "
"because if you give up, you'll never find your soulmate. You'll "
'never find that half who makes you whole and that goes for '
"everything. Just because you fail once, doesn't mean you're gonna "
'fail at everything. Keep trying, hold on, and always, always, always '
"believe in yourself, because if you don't, then who will, sweetie? "
'So keep your head high, keep your chin up, and most importantly, '
"keep smiling, because life's a beautiful thing and there's so much "
'to smile about.”'}
2017-05-16 13:03:38 [scrapy.core.scraper] DEBUG: Scraped from <200 http://quotes.toscrape.com/page/2/>
{'author': 'J.K. Rowling',
'tags': ['courage', 'friends'],
'text': '“It takes a great deal of bravery to stand up to our enemies, but '
'just as much to stand up to our friends.”'}
2017-05-16 13:03:38 [scrapy.core.scraper] DEBUG: Scraped from <200 http://quotes.toscrape.com/page/2/>
{'author': 'Albert Einstein',
'tags': ['simplicity', 'understand'],
'text': "“If you can't explain it to a six year old, you don't understand it "
'yourself.”'}
2017-05-16 13:03:38 [scrapy.core.scraper] DEBUG: Scraped from <200 http://quotes.toscrape.com/page/2/>
{'author': 'Bob Marley',
'tags': ['love'],
'text': '“You may not be her first, her last, or her only. She loved before '
'she may love again. But if she loves you now, what else matters? '
"She's not perfect—you aren't either, and the two of you may never be "
'perfect together but if she can make you laugh, cause you to think '
'twice, and admit to being human and making mistakes, hold onto her '
'and give her the most you can. She may not be thinking about you '
'every second of the day, but she will give you a part of her that '
"she knows you can break—her heart. So don't hurt her, don't change "
"her, don't analyze and don't expect more than she can give. Smile "
'when she makes you happy, let her know when she makes you mad, and '
"miss her when she's not there.”"}
2017-05-16 13:03:38 [scrapy.core.scraper] DEBUG: Scraped from <200 http://quotes.toscrape.com/page/2/>
{'author': 'Dr. Seuss',
'tags': ['fantasy'],
'text': '“I like nonsense, it wakes up the brain cells. Fantasy is a '
'necessary ingredient in living.”'}
2017-05-16 13:03:38 [scrapy.core.scraper] DEBUG: Scraped from <200 http://quotes.toscrape.com/page/2/>
{'author': 'Douglas Adams',
'tags': ['life', 'navigation'],
'text': '“I may not have gone where I intended to go, but I think I have '
'ended up where I needed to be.”'}
2017-05-16 13:03:38 [scrapy.core.scraper] DEBUG: Scraped from <200 http://quotes.toscrape.com/page/2/>
{'author': 'Elie Wiesel',
'tags': ['activism',
'apathy',
'hate',
'indifference',
'inspirational',
'love',
'opposite',
'philosophy'],
'text': "“The opposite of love is not hate, it's indifference. The opposite "
"of art is not ugliness, it's indifference. The opposite of faith is "
"not heresy, it's indifference. And the opposite of life is not "
"death, it's indifference.”"}
2017-05-16 13:03:38 [scrapy.core.scraper] DEBUG: Scraped from <200 http://quotes.toscrape.com/page/2/>
{'author': 'Friedrich Nietzsche',
'tags': ['friendship',
'lack-of-friendship',
'lack-of-love',
'love',
'marriage',
'unhappy-marriage'],
'text': '“It is not a lack of love, but a lack of friendship that makes '
'unhappy marriages.”'}
2017-05-16 13:03:38 [scrapy.core.scraper] DEBUG: Scraped from <200 http://quotes.toscrape.com/page/2/>
{'author': 'Mark Twain',
'tags': ['books', 'contentment', 'friends', 'friendship', 'life'],
'text': '“Good friends, good books, and a sleepy conscience: this is the '
'ideal life.”'}
2017-05-16 13:03:38 [scrapy.core.scraper] DEBUG: Scraped from <200 http://quotes.toscrape.com/page/2/>
{'author': 'Allen Saunders',
'tags': ['fate', 'life', 'misattributed-john-lennon', 'planning', 'plans'],
'text': '“Life is what happens to us while we are making other plans.”'}
2017-05-16 13:03:38 [scrapy.core.engine] INFO: Closing spider (finished)
2017-05-16 13:03:38 [botocore.hooks] DEBUG: Event before-parameter-build.s3.PutObject: calling handler <function validate_ascii_metadata at 0x7f8c2f2b0ae8>
2017-05-16 13:03:38 [botocore.hooks] DEBUG: Event before-parameter-build.s3.PutObject: calling handler <function sse_md5 at 0x7f8c2f2acea0>
2017-05-16 13:03:38 [botocore.hooks] DEBUG: Event before-parameter-build.s3.PutObject: calling handler <function convert_body_to_file_like_object at 0x7f8c2f2b1268>
2017-05-16 13:03:38 [botocore.hooks] DEBUG: Event before-parameter-build.s3.PutObject: calling handler <function validate_bucket_name at 0x7f8c2f2ace18>
2017-05-16 13:03:38 [botocore.hooks] DEBUG: Event before-parameter-build.s3.PutObject: calling handler <bound method S3RegionRedirector.redirect_from_cache of <botocore.utils.S3RegionRedirector object at 0x7f8c2e7a7780>>
2017-05-16 13:03:38 [botocore.hooks] DEBUG: Event before-parameter-build.s3.PutObject: calling handler <function generate_idempotent_uuid at 0x7f8c2f2aca60>
2017-05-16 13:03:38 [botocore.hooks] DEBUG: Event before-call.s3.PutObject: calling handler <function conditionally_calculate_md5 at 0x7f8c2f2acd90>
2017-05-16 13:03:38 [botocore.hooks] DEBUG: Event before-call.s3.PutObject: calling handler <function add_expect_header at 0x7f8c2f2b0378>
2017-05-16 13:03:38 [botocore.handlers] DEBUG: Adding expect 100 continue header to request.
2017-05-16 13:03:38 [botocore.hooks] DEBUG: Event before-call.s3.PutObject: calling handler <bound method S3RegionRedirector.set_request_url of <botocore.utils.S3RegionRedirector object at 0x7f8c2e7a7780>>
2017-05-16 13:03:38 [botocore.endpoint] DEBUG: Making request for OperationModel(name=PutObject) (verify_ssl=True) with params: {'url_path': '/apkmirror/quotes3.json', 'query_string': {}, 'method': 'PUT', 'headers': {'User-Agent': 'Botocore/1.5.49 Python/3.6.1 Linux/4.4.0-75-generic', 'Content-MD5': 'U+PeT0soEYWoCF4DMQXEzA==', 'Expect': '100-continue'}, 'body': <tempfile._TemporaryFileWrapper object at 0x7f8c2f22e2b0>, 'url': 'https://s3.eu-central-1.amazonaws.com/apkmirror/quotes3.json', 'context': {'client_region': 'eu-central-1', 'client_config': <botocore.config.Config object at 0x7f8c2e7a7438>, 'has_streaming_input': True, 'auth_type': None, 'signing': {'bucket': 'apkmirror'}}}
2017-05-16 13:03:38 [botocore.hooks] DEBUG: Event request-created.s3.PutObject: calling handler <bound method RequestSigner.handler of <botocore.signers.RequestSigner object at 0x7f8c2e7a73c8>>
2017-05-16 13:03:38 [botocore.hooks] DEBUG: Event choose-signer.s3.PutObject: calling handler <function set_operation_specific_signer at 0x7f8c2f2ac950>
2017-05-16 13:03:38 [botocore.hooks] DEBUG: Event before-sign.s3.PutObject: calling handler <function fix_s3_host at 0x7f8c2f42dd08>
2017-05-16 13:03:38 [botocore.auth] DEBUG: Calculating signature using v4 auth.
2017-05-16 13:03:38 [botocore.auth] DEBUG: CanonicalRequest:
PUT
/apkmirror/quotes3.json
content-md5:U+PeT0soEYWoCF4DMQXEzA==
host:s3.eu-central-1.amazonaws.com
x-amz-content-sha256:UNSIGNED-PAYLOAD
x-amz-date:20170516T130338Z
content-md5;host;x-amz-content-sha256;x-amz-date
UNSIGNED-PAYLOAD
2017-05-16 13:03:38 [botocore.auth] DEBUG: StringToSign:
AWS4-HMAC-SHA256
20170516T130338Z
20170516/eu-central-1/s3/aws4_request
929e3a39776d42c15c4c7c197c718f67b6105341ed4a269365c6e6ed88378a69
2017-05-16 13:03:38 [botocore.auth] DEBUG: Signature:
81a1c8014fa22d52d371a8aea10d47e0f32e8913dcc18b2f1210c7ce458311e4
2017-05-16 13:03:38 [botocore.endpoint] DEBUG: Sending http request: <PreparedRequest [PUT]>
2017-05-16 13:03:38 [botocore.vendored.requests.packages.urllib3.connectionpool] INFO: Starting new HTTPS connection (1): s3.eu-central-1.amazonaws.com
2017-05-16 13:03:38 [botocore.awsrequest] DEBUG: Waiting for 100 Continue response.
2017-05-16 13:03:38 [botocore.awsrequest] DEBUG: 100 Continue response seen, now sending request body.
2017-05-16 13:03:38 [botocore.vendored.requests.packages.urllib3.connectionpool] DEBUG: "PUT /apkmirror/quotes3.json HTTP/1.1" 200 0
2017-05-16 13:03:38 [botocore.parsers] DEBUG: Response headers: {'x-amz-id-2': 'WB/HgvEGKd7ysqcRa1vodr2znuevKA+fTTX/2elIAcID05t7Ex2G7UTM+rl/AhvIPeB+0gL4YaY=', 'x-amz-request-id': '9C449953B48DA63F', 'date': 'Tue, 16 May 2017 13:03:39 GMT', 'etag': '"53e3de4f4b281185a8085e033105c4cc"', 'content-length': '0', 'server': 'AmazonS3'}
2017-05-16 13:03:38 [botocore.parsers] DEBUG: Response body:
b''
2017-05-16 13:03:38 [botocore.hooks] DEBUG: Event needs-retry.s3.PutObject: calling handler <botocore.retryhandler.RetryHandler object at 0x7f8c2e774e10>
2017-05-16 13:03:38 [botocore.retryhandler] DEBUG: No retry needed.
2017-05-16 13:03:38 [botocore.hooks] DEBUG: Event needs-retry.s3.PutObject: calling handler <bound method S3RegionRedirector.redirect_from_error of <botocore.utils.S3RegionRedirector object at 0x7f8c2e7a7780>>
2017-05-16 13:03:38 [scrapy.extensions.feedexport] INFO: Stored jsonlines feed (20 items) in: s3://apkmirror/quotes3.json
2017-05-16 13:03:38 [scrapy.statscollectors] INFO: Dumping Scrapy stats:
{'downloader/request_bytes': 675,
'downloader/request_count': 3,
'downloader/request_method_count/GET': 3,
'downloader/response_bytes': 5976,
'downloader/response_count': 3,
'downloader/response_status_count/200': 2,
'downloader/response_status_count/404': 1,
'finish_reason': 'finished',
'finish_time': datetime.datetime(2017, 5, 16, 13, 3, 38, 317079),
'item_scraped_count': 20,
'log_count/DEBUG': 75,
'log_count/INFO': 11,
'response_received_count': 3,
'scheduler/dequeued': 2,
'scheduler/dequeued/memory': 2,
'scheduler/enqueued': 2,
'scheduler/enqueued/memory': 2,
'start_time': datetime.datetime(2017, 5, 16, 13, 3, 37, 897491)}
2017-05-16 13:03:38 [scrapy.core.engine] INFO: Spider closed (finished)
Furthermore, in my spider I no longer need to set the AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY settings, since these are "picked up" from the configuration files.
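For anyone curious what the "Calculating signature using v4 auth" lines in the log correspond to, here is a rough standard-library sketch of the last step of AWS4-HMAC-SHA256, the scheme the original error message asked for. It assumes the standard Signature Version 4 key-derivation chain (date, region, service, then the literal "aws4_request"); the function names are my own, and because the secret key below is the placeholder bar, the output will not match the Signature shown in the log.

```python
import hashlib
import hmac


def _hmac_sha256(key, msg):
    """Return the raw HMAC-SHA256 digest of msg (str) under key (bytes)."""
    return hmac.new(key, msg.encode("utf-8"), hashlib.sha256).digest()


def sigv4_signature(secret_key, date, region, service, string_to_sign):
    """Derive the SigV4 signing key and sign string_to_sign with it."""
    k_date = _hmac_sha256(("AWS4" + secret_key).encode("utf-8"), date)
    k_region = _hmac_sha256(k_date, region)
    k_service = _hmac_sha256(k_region, service)
    k_signing = _hmac_sha256(k_service, "aws4_request")
    return hmac.new(
        k_signing, string_to_sign.encode("utf-8"), hashlib.sha256
    ).hexdigest()


# The StringToSign lines copied from the successful log above.
string_to_sign = "\n".join([
    "AWS4-HMAC-SHA256",
    "20170516T130338Z",
    "20170516/eu-central-1/s3/aws4_request",
    "929e3a39776d42c15c4c7c197c718f67b6105341ed4a269365c6e6ed88378a69",
])

# "bar" is the placeholder secret from the question, not a real key.
signature = sigv4_signature("bar", "20170516", "eu-central-1", "s3", string_to_sign)
print(signature)
```

The region is part of the signing key, which is presumably why the working run also needed default.region set to eu-central-1 and not only the signature version.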