我正在努力让我的代码从当前请求调用的每个json对象中提取数据,然后一旦它通过每个json对象,继续下一批请求下一批json对象。看来我的脚本只是一遍又一遍地刮擦第一个请求调用。有人可以帮我解决我在for循环和/或循环中丢失的问题吗?在此先感谢!!
import scrapy
import json
import requests
import re
from time import sleep
import sys
class LetgoSpider(scrapy.Spider):
name = 'letgo'
allowed_domains = ['letgo.com/en', 'search-products-pwa.letgo.com']
start_urls = ['https://search-products-pwa.letgo.com/api/products?country_code=US&offset=0&quadkey=0320030123201&num_results=50&distance_radius=50&distance_type=mi']
offset = 0
def parse(self, response):
data = json.loads(response.text)
if len(data) == 0:
sys.exit()
else:
for used_item in data:
try:
if used_item['name'] == None:
title = used_item['image_information']
else:
title = used_item['name']
id_number = used_item['id']
price = used_item['price']
description = used_item['description']
date = used_item['updated_at']
images = [img['url'] for img in used_item['images']]
latitude = used_item['geo']['lat']
longitude = used_item['geo']['lng']
link = 'https://us.letgo.com/en/i/' + re.sub(r'\W+', '-', title) + '_' + id_number
location = used_item['geo']['city']
except:
pass
yield {'Title': title,
'Url': link,
'Price': price,
'Description': description,
'Date': date,
'Images': images,
'Latitude': latitude,
'Longitude': longitude,
'Location': location,
}
self.offset += 50
new_request = 'https://search-products-pwa.letgo.com/api/products?country_code=US&offset=' + str(self.offset) + \
'&quadkey=0320030123201&num_results=50&distance_radius=50&distance_type=mi'
print('new request is: ' + new_request)
sleep(1)
yield scrapy.Request(new_request, callback=self.parse)
答案 0 :(得分:1)
尝试运行此代码。我只清理了一下。
import json
import re
import scrapy
class LetgoSpider(scrapy.Spider):
name = 'letgo'
allowed_domains = ['letgo.com/en', 'search-products-pwa.letgo.com']
search_url = 'https://search-products-pwa.letgo.com/api/products' \
'?country_code=US' \
'&offset={offset}' \
'&quadkey=0320030123201' \
'&num_results={num_results}' \
'&distance_radius=50' \
'&distance_type=mi'
offset = 0
num_results = 5
max_pages = 3
start_urls = [
search_url.format(offset=offset, num_results=num_results)
]
custom_settings = {
'USER_AGENT': 'Mozilla/5.0 (Windows NT 6.1; WOW64; rv:40.0) Gecko/20100101 Firefox/40.1',
'LOG_LEVEL': 'INFO',
}
def parse(self, response):
data = json.loads(response.text)
for used_item in data:
try:
title = used_item['name'] or used_item['image_information']
id_number = used_item['id']
price = used_item['price']
description = used_item['description']
date = used_item['updated_at']
images = [img['url'] for img in used_item['images']]
latitude = used_item['geo']['lat']
longitude = used_item['geo']['lng']
link = 'https://us.letgo.com/en/i/' + re.sub(r'\W+', '-', title) + '_' + id_number
location = used_item['geo']['city']
except KeyError:
pass
else:
item = {
'Title': title,
'Url': link,
'Price': price,
'Description': description,
'Date': date,
'Images': images,
'Latitude': latitude,
'Longitude': longitude,
'Location': location,
}
print(item)
yield item
self.offset += self.num_results
if self.offset > self.num_results * self.max_pages:
return
next_page_url = self.search_url.format(offset=self.offset, num_results=self.num_results)
yield scrapy.Request(url=next_page_url, callback=self.parse)
以下是运行时的日志
/Volumes/Dev/miniconda3/envs/scm/bin/python -m scrapy runspider sc.py
2018-02-22 00:46:23 [scrapy.utils.log] INFO: Scrapy 1.5.0 started (bot: scrapybot)
2018-02-22 00:46:23 [scrapy.utils.log] INFO: Versions: lxml 4.1.1.0, libxml2 2.9.7, cssselect 1.0.3, parsel 1.4.0, w3lib 1.19.0, Twisted 17.9.0, Python 3.6.2 |Continuum Analytics, Inc.| (default, Jul 20 2017, 13:14:59) - [GCC 4.2.1 Compatible Apple LLVM 6.0 (clang-600.0.57)], pyOpenSSL 17.5.0 (OpenSSL 1.1.0g 2 Nov 2017), cryptography 2.1.4, Platform Darwin-17.4.0-x86_64-i386-64bit
2018-02-22 00:46:23 [scrapy.crawler] INFO: Overridden settings: {'LOG_LEVEL': 'INFO', 'SPIDER_LOADER_WARN_ONLY': True, 'USER_AGENT': 'Mozilla/5.0 (Windows NT 6.1; WOW64; rv:40.0) Gecko/20100101 Firefox/40.1'}
2018-02-22 00:46:23 [scrapy.middleware] INFO: Enabled extensions:
['scrapy.extensions.corestats.CoreStats',
'scrapy.extensions.telnet.TelnetConsole',
'scrapy.extensions.memusage.MemoryUsage',
'scrapy.extensions.logstats.LogStats']
2018-02-22 00:46:23 [scrapy.middleware] INFO: Enabled downloader middlewares:
['scrapy.downloadermiddlewares.httpauth.HttpAuthMiddleware',
'scrapy.downloadermiddlewares.downloadtimeout.DownloadTimeoutMiddleware',
'scrapy.downloadermiddlewares.defaultheaders.DefaultHeadersMiddleware',
'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware',
'scrapy.downloadermiddlewares.retry.RetryMiddleware',
'scrapy.downloadermiddlewares.redirect.MetaRefreshMiddleware',
'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware',
'scrapy.downloadermiddlewares.redirect.RedirectMiddleware',
'scrapy.downloadermiddlewares.cookies.CookiesMiddleware',
'scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware',
'scrapy.downloadermiddlewares.stats.DownloaderStats']
2018-02-22 00:46:23 [scrapy.middleware] INFO: Enabled spider middlewares:
['scrapy.spidermiddlewares.httperror.HttpErrorMiddleware',
'scrapy.spidermiddlewares.offsite.OffsiteMiddleware',
'scrapy.spidermiddlewares.referer.RefererMiddleware',
'scrapy.spidermiddlewares.urllength.UrlLengthMiddleware',
'scrapy.spidermiddlewares.depth.DepthMiddleware']
2018-02-22 00:46:23 [scrapy.middleware] INFO: Enabled item pipelines:
[]
2018-02-22 00:46:23 [scrapy.core.engine] INFO: Spider opened
2018-02-22 00:46:23 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
{'Title': '54 Inch Light Bar', 'Url': 'https://us.letgo.com/en/i/54-Inch-Light-Bar_fbe7f2b2-29b4-4a39-a1c6-77e8fde56ab5', 'Price': 80, 'Description': '54 Inch Light Bar...New never been installed...Call or Text [TL_HIDDEN] ', 'Date': '2018-02-21T23:38:46+00:00', 'Images': ['https://img.letgo.com/images/72/94/6c/90/72946c90a739a4710ca709af1e87ffca.jpeg'], 'Latitude': 35.5362217, 'Longitude': -82.8092321, 'Location': 'Canton'}
{'Title': 'Jr Tour Golf Clubs', 'Url': 'https://us.letgo.com/en/i/Jr-Tour-Golf-Clubs_40324f63-3b18-401a-bdad-900d58fa9be1', 'Price': 40, 'Description': 'Right handed golf clubs ', 'Date': '2018-02-21T23:38:20+00:00', 'Images': ['https://img.letgo.com/images/33/8a/cf/6f/338acf6fc7959626683fbe857480e9a9.jpeg', 'https://img.letgo.com/images/60/7d/37/b1/607d37b1281fce2b48a045398d49ff4c.jpeg', 'https://img.letgo.com/images/ae/de/60/b1/aede60b1260124bfdbacbc7a9aaf25c8.jpeg', 'https://img.letgo.com/images/f0/3e/2c/03/f03e2c031e1976986e25f9f12b1ddd20.jpeg'], 'Latitude': 35.657392629984, 'Longitude': -82.705151547089, 'Location': 'Leicester'}
{'Title': 'Glass vase', 'Url': 'https://us.letgo.com/en/i/Glass-vase_ebaad5f6-afc0-42cb-99b2-aae9ce0cec31', 'Price': 80, 'Description': '', 'Date': '2018-02-21T23:37:20+00:00', 'Images': ['https://img.letgo.com/images/97/fa/68/82/97fa6882b38be80a6084ffa605a94fae.jpeg', 'https://img.letgo.com/images/68/35/a5/d6/6835a5d65f8443abe12e1afa69eb75cd.jpeg'], 'Latitude': 35.580766432121, 'Longitude': -82.622580964386, 'Location': 'Asheville'}
{'Title': "women's pink and black polka-dot long-sleeved top", 'Url': 'https://us.letgo.com/en/i/women-s-pink-and-black-polka-dot-long-sleeved-top_d33d05a3-a362-487d-af3c-10f70c1edc54', 'Price': 2, 'Description': '18 months ', 'Date': '2018-02-21T23:37:01+00:00', 'Images': ['https://img.letgo.com/images/87/e4/44/21/87e44421d0bae79bce09424b39ad9bd8.jpeg'], 'Latitude': 35.5135800231, 'Longitude': -82.68708409485, 'Location': 'Candler'}
{'Title': 'yellow and black DeWalt power tool kit set', 'Url': 'https://us.letgo.com/en/i/yellow-and-black-DeWalt-power-tool-kit-set_45a070fc-8d45-479d-8453-0d52e899423a', 'Price': 115, 'Description': '110-115. I have a bag to fit it all for a I total of 130', 'Date': '2018-02-21T23:36:12+00:00', 'Images': ['https://img.letgo.com/images/bc/2f/69/71/bc2f6971e2891e9bb80205ba03d6c209.jpeg', 'https://img.letgo.com/images/0d/4c/0c/f2/0d4c0cf2536c29320fdd7fffa05cb242.jpeg', 'https://img.letgo.com/images/53/0e/97/78/530e9778c5e5266eaad92afa6ccb0405.jpeg', 'https://img.letgo.com/images/58/93/62/05/58936205711631e148bd5a17cf5d8d14.jpeg'], 'Latitude': 35.580774319984, 'Longitude': -82.62263189396, 'Location': 'Asheville'}
{'Title': "girl's gray and white Calvin Klein sweater", 'Url': 'https://us.letgo.com/en/i/girl-s-gray-and-white-Calvin-Klein-sweater_2ee6a5dd-bec7-4a0b-a575-38ceacebc193', 'Price': 3, 'Description': '12 months ', 'Date': '2018-02-21T23:36:11+00:00', 'Images': ['https://img.letgo.com/images/19/a4/83/0d/19a4830dc0fcc598218ba2ad49566dcf.jpeg'], 'Latitude': 35.513783889312, 'Longitude': -82.686794813796, 'Location': 'Candler'}
{'Title': "toddler's blue, pink, and white floral embellished denim bib overalls", 'Url': 'https://us.letgo.com/en/i/toddler-s-blue-pink-and-white-floral-embellished-denim-bib-overalls_6551c032-0de2-4b25-b4d6-29e39860d0cc', 'Price': 5, 'Description': '18 months ', 'Date': '2018-02-21T23:35:38+00:00', 'Images': ['https://img.letgo.com/images/2d/d3/84/3a/2dd3843a82031d3c88f96822d5dbff3c.jpeg'], 'Latitude': 35.513783889312, 'Longitude': -82.686794813796, 'Location': 'Candler'}
{'Title': 'red and black dog print pajama set', 'Url': 'https://us.letgo.com/en/i/red-and-black-dog-print-pajama-set_8020d458-b135-4d3e-a057-bb559a85156a', 'Price': 5, 'Description': '18 months ', 'Date': '2018-02-21T23:35:10+00:00', 'Images': ['https://img.letgo.com/images/14/ee/c5/c3/14eec5c3b94337050766c5dd4932b2cb.jpeg'], 'Latitude': 35.513783889312, 'Longitude': -82.686794813796, 'Location': 'Candler'}
{'Title': 'black, pink, and green floral dress', 'Url': 'https://us.letgo.com/en/i/black-pink-and-green-floral-dress_ea495806-20ff-4ee8-accb-d29e437f93af', 'Price': 3, 'Description': '12-18 months ', 'Date': '2018-02-21T23:34:45+00:00', 'Images': ['https://img.letgo.com/images/22/6f/7b/28/226f7b28e93213c9de571da0d58c1483.jpeg'], 'Latitude': 35.513783889312, 'Longitude': -82.686794813796, 'Location': 'Candler'}
{'Title': "girl's black and white Minnie Mouse polka-dot crew-neck dress", 'Url': 'https://us.letgo.com/en/i/girl-s-black-and-white-Minnie-Mouse-polka-dot-crew-neck-dress_c3affc21-ab01-434c-9252-327c77b0f014', 'Price': 4, 'Description': '12 months ', 'Date': '2018-02-21T23:34:10+00:00', 'Images': ['https://img.letgo.com/images/d8/56/92/51/d85692518e3d3e7b7dcb9200688c9ba4.jpeg'], 'Latitude': 35.513783889312, 'Longitude': -82.686794813796, 'Location': 'Candler'}
{'Title': "girl's purple and pink floral spaghetti strap dress", 'Url': 'https://us.letgo.com/en/i/girl-s-purple-and-pink-floral-spaghetti-strap-dress_cada630f-b600-4e6a-be38-9d4f2c9d9407', 'Price': 4, 'Description': '6-12 months ', 'Date': '2018-02-21T23:33:41+00:00', 'Images': ['https://img.letgo.com/images/a9/b2/3c/c1/a9b23cc1dc6de8c5443a163da54b5424.jpeg'], 'Latitude': 35.513783889312, 'Longitude': -82.686794813796, 'Location': 'Candler'}
{'Title': 'copper coil pendant necklace', 'Url': 'https://us.letgo.com/en/i/copper-coil-pendant-necklace_6e56e1f9-986c-4da6-ada0-71bf3a4ea077', 'Price': 65, 'Description': None, 'Date': '2018-02-21T23:33:21+00:00', 'Images': ['https://img.letgo.com/images/56/a5/c6/d0/56a5c6d063879645bdefa40c45a85e4a.jpeg'], 'Latitude': 35.569333, 'Longitude': -82.580862, 'Location': 'Asheville'}
{'Title': 'black and green corded hammer drill', 'Url': 'https://us.letgo.com/en/i/black-and-green-corded-hammer-drill_d6dccdce-99d1-4cbc-be01-31761ecae0e7', 'Price': 499.95, 'Description': None, 'Date': '2018-02-21T23:32:46+00:00', 'Images': ['https://img.letgo.com/images/69/df/c8/9f/69dfc89f00f514ab630646678c5f02fc.jpeg'], 'Latitude': 35.5861382, 'Longitude': -82.5974746, 'Location': 'Asheville'}
{'Title': 'Ihip Bluetooth headphones', 'Url': 'https://us.letgo.com/en/i/Ihip-Bluetooth-headphones_77493587-2400-425b-ab8d-802dec641abf', 'Price': 25, 'Description': 'Their brand new and work great none of that having to plug them into your phone they see completely wireless hust turn on your Bluetooth and listen to music or talk on the phone with the built in speaker and volume control!!\nMeet at Marshall ingles... \nFor more great stuff visit... \n', 'Date': '2018-02-21T23:30:55+00:00', 'Images': ['https://img.letgo.com/images/3d/c1/a8/93/3dc1a8936b2fded2017ef8c93ba31c9a.jpeg'], 'Latitude': 35.820196, 'Longitude': -82.629765, 'Location': 'Marshall'}
{'Title': 'Lot of 2 Pampers size 6', 'Url': 'https://us.letgo.com/en/i/Lot-of-2-Pampers-size-6_a29dcee0-ec88-4a56-8832-b14a2c300ddf', 'Price': 40, 'Description': None, 'Date': '2018-02-21T23:31:32+00:00', 'Images': ['https://img.letgo.com/images/37/31/39/02/37313902874a116c6acdcb1b1ff3a710.jpeg'], 'Latitude': 35.597118, 'Longitude': -82.516648, 'Location': 'Asheville'}
{'Title': 'Vintage candy dish', 'Url': 'https://us.letgo.com/en/i/Vintage-candy-dish_1321bf48-500b-4fcd-9704-e1466e04a51b', 'Price': 20, 'Description': 'Amber tiara pedestal candy dish. Perfect condition.', 'Date': '2018-02-21T23:29:46+00:00', 'Images': ['https://img.letgo.com/images/1c/00/13/03/1c00130383113f1e20cc1d0306b0e452.jpeg'], 'Latitude': 35.4645648, 'Longitude': -83.0014414, 'Location': 'Waynesville'}
{'Title': 'Blue and White Suzuki 400, yr 2005', 'Url': 'https://us.letgo.com/en/i/Blue-and-White-Suzuki-400-yr-2005_62dadb29-ec18-4a5d-baa7-378ce7796822', 'Price': 3700, 'Description': None, 'Date': '2018-02-21T23:29:12+00:00', 'Images': ['https://img.letgo.com/images/aa/71/34/27/aa713427b1e8af67f276febb5f1ae17a.jpeg'], 'Latitude': 35.4671172, 'Longitude': -83.0026703, 'Location': 'Waynesville'}
{'Title': 'Handmade Hemp Bracelets & Key chains', 'Url': 'https://us.letgo.com/en/i/Handmade-Hemp-Bracelets-Key-chains_d374e086-729c-4240-8e99-2699c3275ec3', 'Price': 6, 'Description': None, 'Date': '2018-02-21T23:27:42+00:00', 'Images': ['https://img.letgo.com/images/0d/32/ea/27/0d32ea2715095357e9cda3cda6598415.jpeg'], 'Latitude': 35.4833764, 'Longitude': -82.4578764, 'Location': 'Fletcher'}
{'Title': 'Handmade Hemp Necklaces', 'Url': 'https://us.letgo.com/en/i/Handmade-Hemp-Necklaces_d3c22d76-4d4d-43f7-a613-ef4d5a4e53bd', 'Price': 8, 'Description': None, 'Date': '2018-02-21T23:25:58+00:00', 'Images': ['https://img.letgo.com/images/b6/e0/8d/0a/b6e08d0a79f57215f5fc5417451fbd04.jpeg'], 'Latitude': 35.4833764, 'Longitude': -82.4578764, 'Location': 'Fletcher'}
{'Title': 'Luvs and Huggies disposable diaper packs', 'Url': 'https://us.letgo.com/en/i/Luvs-and-Huggies-disposable-diaper-packs_75204ed1-ed11-484e-81e6-cc923b923292', 'Price': 13, 'Description': None, 'Date': '2018-02-21T23:23:55+00:00', 'Images': ['https://img.letgo.com/images/3a/ce/16/a1/3ace16a18b398de6e0c8d4b56d1fa8c9.jpeg'], 'Latitude': 35.597118, 'Longitude': -82.516648, 'Location': 'Asheville'}
2018-02-22 00:46:24 [scrapy.core.engine] INFO: Closing spider (finished)
2018-02-22 00:46:24 [scrapy.statscollectors] INFO: Dumping Scrapy stats:
{'downloader/request_bytes': 1977,
'downloader/request_count': 4,
'downloader/request_method_count/GET': 4,
'downloader/response_bytes': 7625,
'downloader/response_count': 4,
'downloader/response_status_count/200': 4,
'finish_reason': 'finished',
'finish_time': datetime.datetime(2018, 2, 21, 23, 46, 24, 468717),
'item_scraped_count': 20,
'log_count/INFO': 7,
'memusage/max': 50208768,
'memusage/startup': 50208768,
'request_depth_max': 3,
'response_received_count': 4,
'scheduler/dequeued': 4,
'scheduler/dequeued/memory': 4,
'scheduler/enqueued': 4,
'scheduler/enqueued/memory': 4,
'start_time': datetime.datetime(2018, 2, 21, 23, 46, 23, 770175)}
2018-02-22 00:46:24 [scrapy.core.engine] INFO: Spider closed (finished)
Process finished with exit code 0