Why can't Beautiful Soup find the HTML element I'm looking for?

Time: 2020-06-14 06:30:40

Tags: python beautifulsoup

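I'm trying to get the price change of a cryptocurrency from Coinbase by parsing the page with Beautiful Soup. On the Coinbase website (https://www.coinbase.com/price/ethereum) I can find the HTML element that holds the price change: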

<h4 class="TextElement__Spacer-hxkcw5-0 caIgfs Header__StyledHeader-sc-1xiyexz-0 dLILyj">+0.33%</h4>

Then, in Python, I use Beautiful Soup to look for this element by its h4 tag. It finds other h4 tags, but not the one I'm looking for:

import requests
from bs4 import BeautifulSoup 

# Download the page and parse the raw HTML
result = requests.get("https://www.coinbase.com/price/ethereum")
src = result.content
soup = BeautifulSoup(src, "html.parser")

# Collect every <h4> present in the downloaded HTML
tags = soup.find_all("h4")
print(tags)

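1 Answer:

Answer 0: (score: 1)

The data is embedded in the page inside a <script> tag. The h4 you see in the browser is presumably rendered by JavaScript after the page loads, so it is not in the HTML that requests downloads. You can parse the embedded data with the json module.

For example: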

import json
import requests
from bs4 import BeautifulSoup


url = 'https://www.coinbase.com/price/ethereum'
soup = BeautifulSoup(requests.get(url).content, 'html.parser')

data = json.loads(soup.select_one('script#server-app-state').contents[0])

# uncomment this to print all data:
# print(json.dumps(data, indent=4))

print( data['initialData']['data']['prices']['prices']['latestPrice']['percentChange'] )

Prints:

{'hour': 0.0038781959207123133, 'day': -0.0025064363163135772, 'week': -0.02360650279511788, 'month': 0.13293312491891887, 'year': -0.10963199613423964}
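The values in this output look like fractions rather than percentages (0.0039 would correspond to about +0.39%, which is the same order of magnitude as the +0.33% shown on the page). That interpretation is an assumption, not something the page data confirms. Under that assumption, a minimal sketch for turning them into display strings could look like this; the helper name format_change is made up for illustration:

```python
# Sample values copied from the output above.
# Assumption: each value is a fraction, so 0.0039 means roughly +0.39%.
percent_change = {
    'hour': 0.0038781959207123133,
    'day': -0.0025064363163135772,
    'week': -0.02360650279511788,
    'month': 0.13293312491891887,
    'year': -0.10963199613423964,
}


def format_change(fraction):
    """Format a fractional change as a signed percentage with two decimals."""
    return f"{fraction:+.2%}"


# Build a dictionary of display strings keyed by period
formatted = {period: format_change(value) for period, value in percent_change.items()}
print(formatted)
```

The `+` in the format spec forces an explicit sign, which matches the way the page displays the change.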

Edit:

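The line data = json.loads(soup.select_one('script#server-app-state').contents[0]) does the following:

1.) It selects the element <script id="server-app-state">...</script> from the soup

2.) The content of this tag is a JSON string, so I decode it with json.loads()

3.) The result is stored in the variable data (a Python dictionary)

The line print( data['initialData']['data']['prices']['prices']['latestPrice']['percentChange'] ) just prints that one entry from the dictionary (you can print the full contents of the dictionary by uncommenting the line print(json.dumps(data, indent=4))).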
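The three steps above can also be tried offline against a static HTML string. The sketch below is an illustration rather than a drop-in replacement: SAMPLE_HTML is a made-up, heavily trimmed stand-in for the real page, and extract_embedded_json and dig are hypothetical helper names. It returns None instead of raising when the script tag or a key is missing, which may help if the site changes its layout.

```python
import json

from bs4 import BeautifulSoup

# Hypothetical, heavily trimmed stand-in for the downloaded page.
# Note that it contains an <h4>, but not the one holding the price change.
SAMPLE_HTML = """
<html><body>
<h4>Some other heading</h4>
<script id="server-app-state" type="application/json">
{"initialData": {"data": {"prices": {"prices": {"latestPrice":
  {"percentChange": {"hour": 0.0039, "day": -0.0025}}}}}}}
</script>
</body></html>
"""


def extract_embedded_json(html, script_id):
    """Return the JSON stored in <script id=script_id>, or None if absent or invalid."""
    soup = BeautifulSoup(html, "html.parser")
    tag = soup.find("script", id=script_id)  # step 1: select the element
    if tag is None or not tag.string:
        return None
    try:
        return json.loads(tag.string)  # step 2: decode the JSON string
    except json.JSONDecodeError:
        return None


def dig(data, *keys):
    """Walk nested dictionaries, returning None as soon as a key is missing."""
    current = data
    for key in keys:
        if not isinstance(current, dict) or key not in current:
            return None
        current = current[key]
    return current


data = extract_embedded_json(SAMPLE_HTML, "server-app-state")  # step 3: a Python dict
change = dig(data, "initialData", "data", "prices", "prices",
             "latestPrice", "percentChange")
print(change)
```

Against the live page, the same helpers would be called with requests.get(url).content in place of SAMPLE_HTML, assuming the script tag still has the id server-app-state.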