Question

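I am trying to get cryptocurrency price changes from Coinbase by parsing the page with Beautiful Soup. On the Coinbase website (https://www.coinbase.com/price/ethereum) I can find the HTML element that holds the price change: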

<h4 class="TextElement__Spacer-hxkcw5-0 caIgfs Header__StyledHeader-sc-1xiyexz-0 dLILyj">+0.33%</h4>

Then, in Python, I use Beautiful Soup to look for this element by its h4 tag. It finds other h4 tags, but not the one I am looking for:

import requests
from bs4 import BeautifulSoup

# Download the page source and parse it
result = requests.get("https://www.coinbase.com/price/ethereum")
src = result.content
soup = BeautifulSoup(src, "html.parser")

# Collect every <h4> present in the downloaded HTML
tags = soup.find_all("h4")
print(tags)

Answer 1

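The data is embedded in the page inside a <script> tag. The <h4> you see in the browser is presumably built from that data by JavaScript, so it is not in the HTML that requests downloads. You can parse the embedded data with the json module.

For example: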

import json
import requests
from bs4 import BeautifulSoup


url = 'https://www.coinbase.com/price/ethereum'
soup = BeautifulSoup(requests.get(url).content, 'html.parser')

data = json.loads(soup.select_one('script#server-app-state').contents[0])

# uncomment this to print all data:
# print(json.dumps(data, indent=4))

print( data['initialData']['data']['prices']['prices']['latestPrice']['percentChange'] )

Prints:

{'hour': 0.0038781959207123133, 'day': -0.0025064363163135772, 'week': -0.02360650279511788, 'month': 0.13293312491891887, 'year': -0.10963199613423964}
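The numbers above do not look like the "+0.33%" style shown on the page. A minimal sketch of converting them follows. It assumes the values are fractions (so 0.0038 means roughly 0.39%), which the output suggests but the source does not confirm. The helper name format_change is made up for this illustration.

```python
def format_change(fraction):
    """Format a fractional change as a signed percentage string.

    Assumption: the value is a fraction (0.01 == 1%), not already a percentage.
    """
    return f"{fraction * 100:+.2f}%"


# Values copied from the printed output above
percent_change = {
    "hour": 0.0038781959207123133,
    "day": -0.0025064363163135772,
    "week": -0.02360650279511788,
    "month": 0.13293312491891887,
    "year": -0.10963199613423964,
}

# Build a period -> formatted string mapping
formatted = {period: format_change(value) for period, value in percent_change.items()}
print(formatted)
```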

Edit:

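The line data = json.loads(soup.select_one('script#server-app-state').contents[0]) does the following:

1.) It selects the element <script id="server-app-state">...</script> from the soup.

2.) The content of this tag is a JSON string, so I decode it with json.loads().

3.) The result is stored in the variable data (a Python dictionary).

The line print( data['initialData']['data']['prices']['prices']['latestPrice']['percentChange'] ) then prints just that one entry of the dictionary. (You can inspect the full contents of the dictionary by uncommenting the line print(json.dumps(data, indent=4)).)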
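The three steps above can be tried offline on a small stand-in page. This is only a sketch: SAMPLE_HTML is a made-up, trimmed-down imitation of the real page, and extract_percent_change is a hypothetical helper, not part of any library. It returns None instead of raising when the script tag or any key is missing, since the page layout may change.

```python
import json

from bs4 import BeautifulSoup

# Hypothetical stand-in for the HTML that requests downloads: the price <h4>
# is absent, but the data sits inside a <script> tag as JSON.
SAMPLE_HTML = """
<html><body>
<h4>Static heading</h4>
<div id="root"></div>
<script id="server-app-state" type="application/json">
{"initialData": {"data": {"prices": {"prices": {"latestPrice":
{"percentChange": {"hour": 0.0038, "day": -0.0025}}}}}}}
</script>
</body></html>
"""

KEY_PATH = ("initialData", "data", "prices", "prices", "latestPrice", "percentChange")


def extract_percent_change(html):
    """Return the percentChange dict, or None if the expected layout is missing."""
    soup = BeautifulSoup(html, "html.parser")
    script = soup.select_one("script#server-app-state")
    if script is None or not script.string:
        return None
    try:
        node = json.loads(script.string)
    except json.JSONDecodeError:
        return None
    # Walk down the nested dictionaries one key at a time
    for key in KEY_PATH:
        if not isinstance(node, dict) or key not in node:
            return None
        node = node[key]
    return node


soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
static_headings = [tag.get_text() for tag in soup.find_all("h4")]
change = extract_percent_change(SAMPLE_HTML)

print(static_headings)  # only the <h4> tags present in the raw HTML
print(change)
```

Here find_all("h4") sees only the heading that is physically in the markup, while the price data is recovered from the script tag.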
