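I am using Scrapy to scrape mobile phone names, prices, and ratings from amazon.com, but I only get the ratings; the name and price fields come back as empty lists. What could be wrong?
Here is the code: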
import scrapy


class AmazonItem(scrapy.Item):
    name = scrapy.Field()
    price = scrapy.Field()
    rating = scrapy.Field()
    pass


class myspider(scrapy.Spider):
    name = "amazon_spider"

    def start_requests(self):
        urls = [
            "https://www.amazon.in/s?k=samsung"
        ]
        for url in urls:
            yield scrapy.Request(url=url, callback=self.parse)

    def parse(self, response):
        items = AmazonItem()
        name = response.css('span.a-size-medium a-color-base a-text-normal::text').extract()
        price = response.css('span.a-price-whole::text').extract()
        rating = response.css('span.a-icon-alt::text').extract()
        items['name'] = name
        items['price'] = price
        items['rating'] = rating
        yield items
This is the output I get:
2019-05-28 14:50:32 [scrapy.utils.log] INFO: Scrapy 1.5.2 started (bot: amazon)
2019-05-28 14:50:33 [scrapy.utils.log] INFO: Versions: lxml 4.2.5.0, libxml2 2.9.8, cssselect 1.0.3, parsel 1.5.1, w3lib 1.20.0, Twisted 18.7.0, Python 3.7.0 (default, Jun 28 2018, 08:04:48) [MSC v.1912 64 bit (AMD64)], pyOpenSSL 18.0.0 (OpenSSL 1.1.1b 26 Feb 2019), cryptography 2.6.1, Platform Windows-10-10.0.17134-SP0
2019-05-28 14:50:33 [scrapy.crawler] INFO: Overridden settings: {'BOT_NAME': 'amazon', 'FEED_FORMAT': 'json', 'FEED_URI': 'amazon.json', 'NEWSPIDER_MODULE': 'amazon.spiders', 'ROBOTSTXT_OBEY': True, 'SPIDER_MODULES': ['amazon.spiders'], 'USER_AGENT': 'Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; Googlebot/2.1; +http://www.google.com/bot.html) Safari/537.36'}
2019-05-28 14:50:33 [scrapy.extensions.telnet] INFO: Telnet Password: 2423d32d709a9f10
2019-05-28 14:50:33 [scrapy.middleware] INFO: Enabled extensions:
['scrapy.extensions.corestats.CoreStats',
 'scrapy.extensions.telnet.TelnetConsole',
 'scrapy.extensions.feedexport.FeedExporter',
 'scrapy.extensions.logstats.LogStats']
2019-05-28 14:50:33 [scrapy.middleware] INFO: Enabled downloader middlewares:
['scrapy.downloadermiddlewares.robotstxt.RobotsTxtMiddleware',
 'scrapy.downloadermiddlewares.httpauth.HttpAuthMiddleware',
 'scrapy.downloadermiddlewares.downloadtimeout.DownloadTimeoutMiddleware',
 'scrapy.downloadermiddlewares.defaultheaders.DefaultHeadersMiddleware',
 'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware',
 'scrapy.downloadermiddlewares.retry.RetryMiddleware',
 'scrapy.downloadermiddlewares.redirect.MetaRefreshMiddleware',
 'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware',
 'scrapy.downloadermiddlewares.redirect.RedirectMiddleware',
 'scrapy.downloadermiddlewares.cookies.CookiesMiddleware',
 'scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware',
 'scrapy.downloadermiddlewares.stats.DownloaderStats']
2019-05-28 14:50:33 [scrapy.middleware] INFO: Enabled spider middlewares:
['scrapy.spidermiddlewares.httperror.HttpErrorMiddleware',
 'scrapy.spidermiddlewares.offsite.OffsiteMiddleware',
 'scrapy.spidermiddlewares.referer.RefererMiddleware',
 'scrapy.spidermiddlewares.urllength.UrlLengthMiddleware',
 'scrapy.spidermiddlewares.depth.DepthMiddleware']
2019-05-28 14:50:33 [scrapy.middleware] INFO: Enabled item pipelines:
[]
2019-05-28 14:50:33 [scrapy.core.engine] INFO: Spider opened
2019-05-28 14:50:33 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2019-05-28 14:50:33 [scrapy.extensions.telnet] DEBUG: Telnet console listening on 127.0.0.1:6023
2019-05-28 14:50:33 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://www.amazon.in/robots.txt> (referer: None)
2019-05-28 14:50:34 [scrapy.downloadermiddlewares.redirect] DEBUG: Redirecting (301) to <GET https://www.amazon.in/samsung/s?ie=UTF8&page=1&rh=i%3Aaps%2Ck%3Asamsung> from <GET https://www.amazon.in/s?k=samsung>
2019-05-28 14:50:35 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://www.amazon.in/samsung/s?ie=UTF8&page=1&rh=i%3Aaps%2Ck%3Asamsung> (referer: None)
2019-05-28 14:50:35 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.amazon.in/samsung/s?ie=UTF8&page=1&rh=i%3Aaps%2Ck%3Asamsung>
{'name': [],
 'price': [],
 'rating': ['3.1 out of 5 stars',
            '3.1 out of 5 stars',
            '3.1 out of 5 stars',
            '3.9 out of 5 stars',
            '4 out of 5 stars',
            '3.9 out of 5 stars',
            '3.6 out of 5 stars',
            '4.1 out of 5 stars',
            '4 out of 5 stars',
            '3.8 out of 5 stars',
            '4 out of 5 stars',
            '4 out of 5 stars',
            '3.9 out of 5 stars',
            '3.5 out of 5 stars',
            '3.6 out of 5 stars',
            '3.9 out of 5 stars',
            '4 Stars & Up',
            '3 Stars & Up',
            '2 Stars & Up',
            '1 Star & Up']}
2019-05-28 14:50:35 [scrapy.core.engine] INFO: Closing spider (finished)
2019-05-28 14:50:35 [scrapy.extensions.feedexport] INFO: Stored json feed (1 items) in: amazon.json
2019-05-28 14:50:35 [scrapy.statscollectors] INFO: Dumping Scrapy stats:
{'downloader/request_bytes': 976,
 'downloader/request_count': 3,
 'downloader/request_method_count/GET': 3,
 'downloader/response_bytes': 72313,
 'downloader/response_count': 3,
 'downloader/response_status_count/200': 2,
 'downloader/response_status_count/301': 1,
 'finish_reason': 'finished',
 'finish_time': datetime.datetime(2019, 5, 28, 9, 20, 35, 442431),
 'item_scraped_count': 1,
 'log_count/DEBUG': 5,
 'log_count/INFO': 9,
 'response_received_count': 2,
 'scheduler/dequeued': 2,
 'scheduler/dequeued/memory': 2,
 'scheduler/enqueued': 2,
 'scheduler/enqueued/memory': 2,
 'start_time': datetime.datetime(2019, 5, 28, 9, 20, 33, 476571)}
2019-05-28 14:50:35 [scrapy.core.engine] INFO: Spider closed (finished)
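
One likely cause of the empty name list, offered as a sketch and not a confirmed diagnosis: in a CSS selector a space is a descendant combinator, so 'span.a-size-medium a-color-base a-text-normal' asks for an <a-text-normal> element nested inside an <a-color-base> element inside the span, and that matches nothing. When one element carries several classes, the classes are chained with dots. The helper below builds such a selector from a class attribute as copied from the browser inspector; compound_class_selector is a hypothetical name used for illustration and is not part of Scrapy.

```python
def compound_class_selector(tag, class_attr, pseudo="::text"):
    """Build a CSS selector for `tag` elements carrying every class in `class_attr`.

    `class_attr` is the space-separated value of the HTML class attribute.
    """
    classes = class_attr.split()
    # Chain the classes with dots; spaces would mean "descendant of" instead.
    return tag + "".join("." + c for c in classes) + pseudo


name_selector = compound_class_selector(
    "span", "a-size-medium a-color-base a-text-normal"
)
print(name_selector)  # span.a-size-medium.a-color-base.a-text-normal::text

# Inside parse() this would be used as:
#     name = response.css(name_selector).extract()
```

This does not explain the empty price list. The selector 'span.a-price-whole::text' is well-formed, so the page served to the spider may not contain that element at all; running the same selectors in scrapy shell against the URL is one way to check.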