Question

I am using Python 2.7 + BeautifulSoup 4.4.1.

e = BeautifulSoup(data)
s1 = e.find("div", class_="one").get_text() # Successful
s2 = e.find("div", class_="two-three").get_text() # ERROR

Answer 1

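After looking at the screenshot in the comments:

First, you need to read the response; you cannot directly convert what you get back to a str: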

e = e.read()
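
A minimal sketch of why the read step matters. The code that produced `e` is not shown in the question, so `io.BytesIO` is used here as a hypothetical stand-in for a response object, and the markup is made up for illustration:

```python
import io

# Hypothetical stand-in for an HTTP response object (an assumption;
# the question does not show how the response was obtained).
response = io.BytesIO(b'<span class="display-price">Free</span>')

# Converting the object itself to str gives its repr, not the page markup.
as_text = str(response)

# Reading the response gives the actual markup to hand to BeautifulSoup.
markup = response.read()

print(as_text)
print(markup)
```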

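Second, it seems that some of the content is populated using JavaScript, so your HTML does not contain those tags.

For example, there is no element with the class rating-count: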

>>> s.find_all('span', class_='rating-count')
[]

This does not mean that searching for class names containing a hyphen does not work, because if you try display-price it works:

>>> s.find('span', class_='display-price')
 <span class="display-price">Free</span>

This means that the elements you want are not available in the HTML, as I said earlier in the comments.
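
The point can be checked offline with a small sketch. The HTML string below is made up for illustration, and the `html.parser` backend is an assumption (the thread does not name a parser). It also shows what is presumably behind the `# ERROR` in the question: `find()` returns `None` when nothing matches, so calling `.get_text()` on the result raises `AttributeError`.

```python
from bs4 import BeautifulSoup

# Made-up markup: it contains display-price but no rating-count element.
html = '<div><span class="display-price">Free</span></div>'
soup = BeautifulSoup(html, "html.parser")

# A hyphenated class name is matched like any other class name.
present = soup.find("span", class_="display-price")

# find() returns None when nothing matches; find_all() returns an empty list.
missing = soup.find("span", class_="rating-count")
missing_all = soup.find_all("span", class_="rating-count")

print(present.get_text())
print(missing)
print(missing_all)

# Guard before calling get_text() so a missing element does not raise.
rating = missing.get_text() if missing is not None else None
```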

Answer 2

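The problem is not bs4 or the hyphen. The problem is that a different source is returned when no user agent is sent. With the request below we get what you want: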

In [26]: import requests

In [27]: from bs4 import BeautifulSoup

In [28]: r = requests.get("https://play.google.com/store/apps/details?id=com.zing.zalo", 
                         headers={"User-Agent":"Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/50.0.2661.75 Safari/537.36"})

In [29]: soup = BeautifulSoup(r.content)

In [30]: print(soup.select("span.rating-count"))
[<span aria-label="573,575 ratings" class="rating-count">573,575</span>]

If we run it without a user agent:

In [31]: from bs4 import BeautifulSoup

In [32]: r = requests.get("https://play.google.com/store/apps/details?id=com.zing.zalo")

In [33]: soup = BeautifulSoup(r.content)

In [34]: print(soup.select("span.rating-count"))
[]

If you print the source returned by each request, you will see that they are very different.
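
One way to compare the two sources without reading them by eye is sketched below. The helper name `compare_sources` and the two short strings are hypothetical stand-ins; in practice you would pass `r.text` from each of the two requests.

```python
import difflib

def compare_sources(html_with_ua, html_without_ua, marker="rating-count"):
    """Return whether `marker` occurs in each source, plus a line diff."""
    diff = list(difflib.unified_diff(
        html_without_ua.splitlines(),
        html_with_ua.splitlines(),
        fromfile="without_ua",
        tofile="with_ua",
        lineterm="",
    ))
    return marker in html_with_ua, marker in html_without_ua, diff

# Hypothetical stand-ins for the two response bodies.
with_ua = '<span class="rating-count">573,575</span>'
without_ua = '<span class="display-price">Free</span>'

in_with, in_without, diff = compare_sources(with_ua, without_ua)
print(in_with, in_without)
print("\n".join(diff))
```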

Answer 3

Thanks to AKS and Padraic Cunningham. I got it working :)

I checked and found that, before I set the "User-Agent" header, the response data did not contain "rating-count".

(1) Right: before; (2) Left: after

BeautifulSoup error with a hyphen "-" in the class name?
