Question

我正在为一个学校项目工作，我显示比特币，eth的当前价格，也许是另一个和我的网页抓取https://cryptowat.ch/，但我找不到用于存储实时价格的标签。当我解析div标签时，它返回价格，但我无法隔离它，所以我可以在python中显示它

<div class="rankings-col__header__segment"><h2>BTC</h2><weak>usd </weak>10857.00</div>

Answer 1

根据我的理解 - 您知道BTC字符串，可以使用它来定位您的定位器。

因此，如果它是XPath，您可以使用它和following-sibling::text()：

//h2[. = 'BTC']/following-sibling::text()

使用lxml.html的示例：

from lxml.html import fromstring

data = """<div class="rankings-col__header__segment"><h2>BTC</h2><weak>usd </weak>10857.00</div>"""

root = fromstring(data)
print(root.xpath("//h2[. = 'BTC']/following-sibling::text()"))

打印['10857.00']。

如果您有任何机会使用BeautifulSoup，那将是：

from bs4 import BeautifulSoup


data = """<div class="rankings-col__header__segment"><h2>BTC</h2><weak>usd </weak>10857.00</div>"""

soup = BeautifulSoup(data, "html.parser")
print(soup.find("h2", string="BTC").find_next_sibling(text=True))

在python中进行webscraping时，在HTML中找到正确的标记

1 个答案: