Question

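I'm trying to scrape prices from a website, but some prices are crossed out with a new price shown next to them, so I get null for those. I figured I could add an if statement to pick up the right price, and that sort of works. However, instead of the new price I get the crossed-out one, because both share the same identifier. Any ideas on how to solve this?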

    saved_name = ""  # last card name seen; reused for rows that omit the name
    for game in response.css("tr[class^=deckdbbody]"):
        # A card with several conditions lists its name only on the first row,
        # so fall back to the name saved from the previous row when it is missing
        saved_name = (game.css("a.card_popup::text").get() or saved_name).strip()
        item["Card_Name"] = saved_name
        # Extract the condition, stock, and price from the corresponding table cells
        item["Condition"] = game.css("td[class^=deckdbbody].search_results_7 a::text").get()
        item["Stock"] = game.css("td[class^=deckdbbody].search_results_8::text").get()
        item["Price"] = game.css("td[class^=deckdbbody].search_results_9::text").get()
        # Discounted rows have no direct text in the price cell, so look inside its <span> tags
        if item["Price"] is None:
            item["Price"] = game.css("td[class^=deckdbbody].search_results_9 span::text").get()

        # Return values
        yield item

Answer 1

When scraping, you need to take into account that the prices you don't want carry the style attribute style="text-decoration:line-through".

To do that, you can use BeautifulSoup and rely on the fact that prices which are not crossed out have no style attribute:

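from bs4 import BeautifulSoup as bs
import requests as r

# url is the address of the page being scraped
response = r.get(url)
soup = bs(response.content, "html.parser")
# Matching on 'style': None keeps only the cells that have no style attribute
decks = soup.find_all('td', {'class': 'deckdbbody', 'style': None})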

Now get the text content of each cell, which is the price:

prices = [d.getText().strip() for d in decks]
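To see the idea on a concrete input, here is a minimal sketch that runs on made-up HTML rather than the real page. It assumes the struck-out price sits in a child element whose inline style contains line-through; the markup in SAMPLE_HTML and the helper name current_prices are hypothetical, so check them against the actual page source.

```python
from bs4 import BeautifulSoup

# Made-up markup for illustration only; the real page's structure may differ.
SAMPLE_HTML = """
<table>
  <tr class="deckdbbody">
    <td class="deckdbbody search_results_9">$1.99</td>
  </tr>
  <tr class="deckdbbody2">
    <td class="deckdbbody2 search_results_9">
      <span style="text-decoration:line-through">$5.00</span>
      <span style="color:red">$3.50</span>
    </td>
  </tr>
</table>
"""


def current_prices(html):
    """Return the price text of each price cell, ignoring struck-out elements."""
    soup = BeautifulSoup(html, "html.parser")
    prices = []
    for cell in soup.find_all("td", class_="search_results_9"):
        # Detach every descendant whose inline style strikes its text out.
        for tag in cell.find_all(style=True):
            if "line-through" in tag["style"]:
                tag.extract()
        # What remains is the price that is actually in effect.
        prices.append(cell.get_text(strip=True))
    return prices


print(current_prices(SAMPLE_HTML))
```

Filtering on the price column's own class (search_results_9, taken from the question) also avoids picking up cells that are not prices at all.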

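With your update, I can see that you will get unwanted entries in the prices list, because many td elements use this class and are not prices at all. A simple fix is to check whether the text returned by .getText() contains a dollar sign: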

final = []
for price in prices:
    if '$' in price:
        final.append(price)

Now final contains only what you actually want.
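The dollar-sign check accepts any string that merely contains a "$". If that turns out to be too loose, a stricter variant is to match the whole string against a price pattern and convert it to a number. The sketch below uses only the standard library; the pattern, the helper name parse_prices, and the sample strings are assumptions for illustration, not something taken from the site.

```python
import re
from decimal import Decimal

# A dollar sign, digits with optional thousands commas, then optional cents.
PRICE_RE = re.compile(r"^\$(\d[\d,]*)(\.\d{1,2})?$")


def parse_prices(texts):
    """Keep only strings that look like dollar prices and convert them to Decimal."""
    parsed = []
    for text in texts:
        match = PRICE_RE.match(text.strip())
        if match:
            whole = match.group(1).replace(",", "")
            cents = match.group(2) or ""
            parsed.append(Decimal(whole + cents))
    return parsed


# Hypothetical cell texts: only the first and fourth are well-formed prices.
sample = ["$3.50", "NM-Mint", "12", " $1,250.00 ", "Only $5 left"]
print(parse_prices(sample))
```

Using Decimal instead of float keeps the cents exact if the prices are later summed or compared.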

Answer 2

This is what finally worked:

if item["Price"] is None:
    item["Price"] = game.css("td[class^=deckdbbody].search_results_9 span[style*='color:red']::text").get()
