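I am hastily writing a program to scrape the following page, https://www.trollandtoad.com/magic-the-gathering/aether-revolt/10066, but it only scrapes the first row of data and none of the rest. I think this has something to do with my for loop, but when I change the loop to a broader selector it outputs too much data, because every row is output multiple times.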
def parse(self, response):
    item = GameItem()
    saved_name = ""
    for game in response.css("div.row.mt-1.list-view"):
        saved_name = game.css("a.card-text::text").get() or saved_name
        item["Card_Name"] = saved_name.strip()
        if item["Card_Name"] != None:
            saved_name = item["Card_Name"].strip()
        else:
            item["Card_Name"] = saved_name
        yield item
Update #1
def parse(self, response):
    for game in response.css('div.card > div.row'):
        item = GameItem()
        item["Card_Name"] = game.css("a.card-text::text").get()
        for buying_option in game.css('div.buying-options-table div.row:not(:first-child)'):
            item["Condition"] = game.css("div.col-3.text-center.p-1::text").get()
            item["Price"] = game.css("div.col-2.text-center.p-1::text").get()
        yield item
Answer 0 (score: 2)
I think you need the CSS selectors below (you can later use this as a base for processing the buying-options container):
def parse(self, response):
    for game in response.css('div.card > div.row'):
        item = GameItem()
        Card_Name = game.css("a.card-text::text").get()
        # .get() returns None when nothing matches, so guard before calling .strip()
        item["Card_Name"] = Card_Name.strip() if Card_Name else None
        for buying_option in game.css('div.buying-options-table div.row:not(:first-child)'):
            # process buying_option here
            # you may need to move the GameItem() initialization inside this loop
            pass
        yield item
As you can see, I moved item = GameItem() inside the loop. The saved_name variable is not needed here either.
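To illustrate why the item should be created inside the loop, here is a minimal sketch in plain Python. It is not Scrapy code: plain dicts stand in for GameItem, and the CARDS data, the function names, and the prices are all hypothetical. It assumes you want one item per buying option.

```python
# Hypothetical stand-in data: each card has one or more buying options.
CARDS = [
    {"name": "Card A", "options": [
        {"condition": "Near Mint", "price": "$1.00"},
        {"condition": "Played", "price": "$0.50"},
    ]},
    {"name": "Card B", "options": [
        {"condition": "Near Mint", "price": "$2.00"},
    ]},
]


def items_shared(cards):
    """Reuse one dict for every row (like creating the item before the loop)."""
    item = {}
    for card in cards:
        for option in card["options"]:
            item["Card_Name"] = card["name"]
            item["Condition"] = option["condition"]
            item["Price"] = option["price"]
            yield item  # always the same object, mutated on each pass


def items_fresh(cards):
    """Create a new dict for every buying option."""
    for card in cards:
        for option in card["options"]:
            yield {
                "Card_Name": card["name"],
                "Condition": option["condition"],
                "Price": option["price"],
            }


shared = list(items_shared(CARDS))
fresh = list(items_fresh(CARDS))

# Every entry in `shared` is the same dict, holding only the last row's values.
print(all(entry is shared[0] for entry in shared))
# `fresh` keeps one independent dict per buying option.
print([entry["Card_Name"] for entry in fresh])
```

Collected into a list, the shared version ends up with several references to a single dict, while the fresh version preserves each row separately.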
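Answer 1 (score: 0)
response.css("div.row.mt-1.list-view") returns only one selector, so the code inside the loop runs only once. Try this instead: for game in response.css(".mt-1.list-view .card-text"): and you will get a list of selectors to loop over.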
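The difference between matching the wrapper and matching the elements inside it can be sketched with the standard library. This is only an illustration: it uses xml.etree.ElementTree rather than Scrapy selectors, and the markup below is a hypothetical, heavily simplified stand-in for the real page.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: one wrapper div that contains several product rows.
PAGE = """<body>
  <div class="row mt-1 list-view">
    <div class="product-col"><a class="card-text">Card A</a></div>
    <div class="product-col"><a class="card-text">Card B</a></div>
    <div class="product-col"><a class="card-text">Card C</a></div>
  </div>
</body>"""

root = ET.fromstring(PAGE)

# ElementTree compares the whole attribute string, unlike CSS class matching,
# so the full class value is spelled out here.
wrappers = root.findall(".//div[@class='row mt-1 list-view']")

# Looping over the wrapper gives a single iteration for the whole page.
print(len(wrappers))

# Looping over the links inside the wrapper gives one iteration per card.
names = [
    link.text
    for wrapper in wrappers
    for link in wrapper.findall(".//a[@class='card-text']")
]
print(names)
```

A loop over `wrappers` runs once, which matches the symptom of only the first row being scraped, while a loop over the inner elements runs once per card.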
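Answer 2 (score: 0)
Your code does not work because you create GameItem() outside of the loop over the list. I must have missed the memo about the .get() and .getall() methods. Maybe someone can comment on how they differ from extract?
Your failing code: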
def parse(self, response):
    item = GameItem()  # this line creates only one game item per page
    saved_name = ""
    # This selector matches the wrapper around all items on the page,
    # so the loop body runs only once. See the corrected selector in the code below.
    for game in response.css("div.row.mt-1.list-view"):
        saved_name = game.css("a.card-text::text").get() or saved_name
        item["Card_Name"] = saved_name.strip()
        if item["Card_Name"] != None:
            saved_name = item["Card_Name"].strip()
        else:
            item["Card_Name"] = saved_name
        yield item
Fixed code that solves the problem:
def parse(self, response):
    for game in response.css("div.product-col"):
        item = GameItem()  # create a new item for every product
        item["Card_Name"] = game.css("a.card-text::text").get()
        if not item["Card_Name"]:
            # No card name: skip this entry and move on to the next one.
            # Do not use "return" here; it would end the generator and
            # drop all remaining entries instead of skipping just this one.
            continue
        yield item
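The difference between continue and return inside a generator can be checked with a small plain-Python sketch. The function names and the ROWS list are hypothetical; None stands in for a row where .get() found no card name.

```python
def names_with_continue(rows):
    """Skip empty rows and keep going."""
    for name in rows:
        if not name:
            continue  # skip only this row
        yield name.strip()


def names_with_return(rows):
    """Stop at the first empty row."""
    for name in rows:
        if not name:
            return  # ends the generator; later rows are never yielded
        yield name.strip()


# Hypothetical scraped values, with one missing name in the middle.
ROWS = [" Card A ", None, "Card B"]

kept = list(names_with_continue(ROWS))
cut_short = list(names_with_return(ROWS))

print(kept)
print(cut_short)
```

With continue, the row after the missing name is still yielded. With return, the generator stops at the missing name, so "Card B" is lost.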