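I'm trying to scrape prices from a website, but some prices are crossed out and a new price is shown next to them, so for those items I get null as the price. I figured I could add an if statement to get the correct price, and that sort of works. However, instead of the new price I get the crossed-out one, because both share the same identifier. Any ideas how to fix this?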
def parse(self, response):  # a method of your scrapy.Spider subclass
    saved_name = None  # last card name seen, reused by rows that have no name
    for game in response.css("tr[class^=deckdbbody]"):
        # Assumed: a plain dict (the question does not show how item is created)
        item = {}
        # A card listed in two conditions only shows its name on the first row,
        # so keep the previous name when this row has none
        name = game.css("a.card_popup::text").get()
        if name is not None:
            saved_name = name.strip()  # strip the leading '\n'
        item["Card_Name"] = saved_name
        # Extract the condition, stock and price from the matching table cells
        item["Condition"] = game.css("td[class^=deckdbbody].search_results_7 a::text").get()
        item["Stock"] = game.css("td[class^=deckdbbody].search_results_8::text").get()
        item["Price"] = game.css("td[class^=deckdbbody].search_results_9::text").get()
        # Discounted rows have no plain-text price, so look inside the <span>s
        if item["Price"] is None:
            item["Price"] = game.css("td[class^=deckdbbody].search_results_9 span::text").get()
        yield item
Answer 0 (score: 1)
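You need to take the style attribute style="text-decoration:line-through" into account: it is present on the prices you don't want to scrape.
One way to do this is to use BeautifulSoup and rely on the fact that the prices that are not crossed out have no style attribute: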
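from bs4 import BeautifulSoup
import requests

response = requests.get(url)  # url: the page you are scraping
soup = BeautifulSoup(response.content, "html.parser")
# "style": None keeps only the cells that have no style attribute
decks = soup.find_all("td", {"class": "deckdbbody", "style": None})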
Now get the text content of each cell, i.e. the price:
prices = [d.getText().strip() for d in decks]
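The two steps above can be checked offline on a small HTML string. This is a minimal sketch with hypothetical markup: like the answer, it assumes the crossed-out price sits in a td that carries an inline style attribute, while the current price's td has none.

```python
from bs4 import BeautifulSoup

# Hypothetical table cells: this assumes, as the answer does, that the
# crossed-out price sits in a <td> carrying an inline style attribute.
HTML = """
<table><tr>
  <td class="deckdbbody" style="text-decoration:line-through">$2.49</td>
  <td class="deckdbbody">$1.99</td>
</tr></table>
"""

soup = BeautifulSoup(HTML, "html.parser")

# "style": None matches only tags that have no style attribute at all,
# so the struck-through cell is skipped.
decks = soup.find_all("td", {"class": "deckdbbody", "style": None})
prices = [d.getText().strip() for d in decks]
print(prices)  # ['$1.99']
```

If the real page puts the line-through style on a nested span rather than on the td itself, this filter will not exclude it, so it is worth inspecting the actual markup first.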
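With your update, I can see that you would get unwanted entries in the prices list, because many td elements use this class without being prices at all. A simple workaround is to check whether the text returned by .getText() contains a dollar sign: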
final = []
for price in prices:
    if '$' in price:  # real prices contain a dollar sign
        final.append(price)
Now final contains only the values you actually want.
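The dollar-sign filter from the loop above can also be written as a list comprehension. The sample strings below are hypothetical cell texts standing in for the non-price cells that share the deckdbbody class.

```python
def keep_prices(texts):
    """Return only the strings that contain a '$', i.e. look like prices."""
    return [t for t in texts if "$" in t]


# Hypothetical cell texts: some deckdbbody cells hold names, conditions or stock.
sample = ["Card name", "$1.99", "Near Mint", "3", "$0.25"]
print(keep_prices(sample))  # ['$1.99', '$0.25']
```

This is the same check as the loop; it only assumes every real price is written with a dollar sign.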
Answer 1 (score: 0)
This is what finally worked:
if item["Price"] is None:
    # Select the span whose inline style contains 'color:red' (the new price)
    item["Price"] = game.css("td[class^=deckdbbody].search_results_9 span[style*='color:red']::text").get()