BeautifulSoup cannot find an element

Time: 2019-05-14 14:20:52

Tags: html web-scraping beautifulsoup

I have started using BeautifulSoup, and unfortunately it does not work as expected.

The page at https://www.globes.co.il/news/article.aspx?did=1001285059 contains the following element:

<div class="sppre_message-data-wrapper">... </div>

I tried to get this element with the following code:

from urllib.request import urlopen
from bs4 import BeautifulSoup

html = urlopen("https://www.globes.co.il/news/article.aspx?did=1001285059")
bsObj = BeautifulSoup(html.read(), features="html.parser")
comments = bsObj.find_all('div', {'class': ["sppre_message-data-wrapper"]})
print(comments)

"comments" comes back as an empty list.

1 Answer:

Answer 0 (score: 2)

The element is inside an iframe. Make the request to the iframe src instead:

https://spoxy-shard2.spot.im/v2/spot/sp_8BE2orzs/post/1001285059/?elementId=6a97624752c75d958352037d2b36df77&spot_im_platform=desktop&host_url=https%3A%2F%2Fwww.globes.co.il%2Fnews%2Farticle.aspx%3Fdid%3D1001285059&host_url_64=aHR0cHM6Ly93d3cuZ2xvYmVzLmNvLmlsL25ld3MvYXJ0aWNsZS5hc3B4P2RpZD0xMDAxMjg1MDU5&pageSize=1&count=1&spot_im_ph__prerender_deferred=true&prerenderDeferred=true&sort_by=newest&conversationSkin=light&isStarsRatingEnabled=false&enableMessageShare=true&enableAnonymize=true&isConversationLiveBlog=false&enableSeeMoreButton=true

from bs4 import BeautifulSoup as bs
import requests

r = requests.get('https://spoxy-shard2.spot.im/v2/spot/sp_8BE2orzs/post/1001285059/?elementId=6a97624752c75d958352037d2b36df77&spot_im_platform=desktop&host_url=https%3A%2F%2Fwww.globes.co.il%2Fnews%2Farticle.aspx%3Fdid%3D1001285059&host_url_64=aHR0cHM6Ly93d3cuZ2xvYmVzLmNvLmlsL25ld3MvYXJ0aWNsZS5hc3B4P2RpZD0xMDAxMjg1MDU5&pageSize=1&count=1&spot_im_ph__prerender_deferred=true&prerenderDeferred=true&sort_by=newest&conversationSkin=light&isStarsRatingEnabled=false&enableMessageShare=true&enableAnonymize=true&isConversationLiveBlog=false&enableSeeMoreButton=true')
soup = bs(r.content, 'html.parser')
comments = [item.text for item in soup.select('.sppre_message-data-wrapper')]
print(comments)
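
The iframe src is long, so it may help to see what its query string carries. The sketch below uses only the standard library to unpack the URL quoted in this answer. It suggests that host_url is the percent-encoded article URL and host_url_64 is the same URL in Base64. That reading is an observation about this one URL, not documented behaviour of the comment service, and the names IFRAME_SRC, params, host_url, decoded_64 and post_id are introduced here for illustration only.

```python
import base64
from urllib.parse import parse_qs, urlsplit

# The iframe src from the answer, split across lines for readability.
IFRAME_SRC = (
    "https://spoxy-shard2.spot.im/v2/spot/sp_8BE2orzs/post/1001285059/"
    "?elementId=6a97624752c75d958352037d2b36df77"
    "&spot_im_platform=desktop"
    "&host_url=https%3A%2F%2Fwww.globes.co.il%2Fnews%2Farticle.aspx%3Fdid%3D1001285059"
    "&host_url_64=aHR0cHM6Ly93d3cuZ2xvYmVzLmNvLmlsL25ld3MvYXJ0aWNsZS5hc3B4P2RpZD0xMDAxMjg1MDU5"
    "&pageSize=1&count=1"
    "&spot_im_ph__prerender_deferred=true&prerenderDeferred=true"
    "&sort_by=newest&conversationSkin=light"
    "&isStarsRatingEnabled=false&enableMessageShare=true"
    "&enableAnonymize=true&isConversationLiveBlog=false"
    "&enableSeeMoreButton=true"
)

parts = urlsplit(IFRAME_SRC)

# parse_qs percent-decodes values and returns a list per key; keep the first.
params = {key: values[0] for key, values in parse_qs(parts.query).items()}

# host_url is the article URL, percent-decoded by parse_qs.
host_url = params["host_url"]

# host_url_64 decodes (Base64) to the same article URL.
decoded_64 = base64.b64decode(params["host_url_64"]).decode("ascii")

# The last path segment matches the article's "did" query parameter.
post_id = parts.path.rstrip("/").split("/")[-1]
article_id = parse_qs(urlsplit(host_url).query)["did"][0]

print(host_url)
print(decoded_64 == host_url, post_id == article_id)
```

Whether the same pattern can be used to build the src for other articles is untested here; values such as elementId may be specific to the page.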

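BeautifulSoup does not support the deep combinator (which I believe is deprecated now anyway), but you can see the element in the browser (Chrome) with the following selector: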

*/deep/.sppre_message-data-wrapper

Ultimately that would not matter, because the content is not present in the response to a request for the original URL.
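
A quick way to confirm this kind of problem is to check the downloaded HTML itself: count the elements with the target class and list any iframe sources. The sketch below runs on a made-up HTML sample rather than the real page. SAMPLE_OUTER_HTML, the example.com address and the helper name diagnose are all hypothetical. If a script injects the iframe, it will not show up in the static HTML either, and the browser's developer tools are the place to look.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for an outer page whose comments live in an iframe.
SAMPLE_OUTER_HTML = """
<html><body>
  <article>Story text</article>
  <div class="sppre_frame-container">
    <iframe src="https://example.com/comments-frame"></iframe>
  </div>
</body></html>
"""


def diagnose(html, class_name):
    """Return (number of elements with class_name, list of iframe src values)."""
    soup = BeautifulSoup(html, "html.parser")
    matches = soup.find_all(class_=class_name)
    iframe_srcs = [frame.get("src") for frame in soup.find_all("iframe")]
    return len(matches), iframe_srcs


count, srcs = diagnose(SAMPLE_OUTER_HTML, "sppre_message-data-wrapper")
print(count, srcs)
```

A count of zero together with a non-empty iframe list suggests that the missing element belongs to a separate document that has to be requested on its own.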

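Alternatively, you could use Selenium and switch to the iframe. The iframe has the id 401bccf8039377de3e9873905037a855-iframe, so you could pass #401bccf8039377de3e9873905037a855-iframe to find_element_by_css_selector and then switch to it, but a more robust selector (in case the id is dynamic) would be .sppre_frame-container iframe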
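
The Selenium route could look roughly like the sketch below. It is untested against the live page and assumes Chrome and Selenium 4 are installed. It uses find_element-style locators with By.CSS_SELECTOR, because the find_element_by_css_selector helper has been removed from current Selenium releases. The function name fetch_comments, the constants and the 15-second timeout are illustrative choices. An id that begins with a digit also has to be escaped in a CSS id selector, which is another reason to prefer the container-based selector.

```python
PAGE_URL = "https://www.globes.co.il/news/article.aspx?did=1001285059"
IFRAME_SELECTOR = ".sppre_frame-container iframe"
COMMENT_SELECTOR = ".sppre_message-data-wrapper"


def fetch_comments(page_url=PAGE_URL, timeout=15):
    """Open the page in Chrome, switch into the comments iframe, return comment texts."""
    # Imported here so that loading this module does not itself require Selenium.
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    driver = webdriver.Chrome()
    try:
        driver.get(page_url)
        wait = WebDriverWait(driver, timeout)
        # Wait for the iframe to exist, then make it the active browsing context.
        wait.until(
            EC.frame_to_be_available_and_switch_to_it(
                (By.CSS_SELECTOR, IFRAME_SELECTOR)
            )
        )
        # Inside the frame, wait until at least one comment wrapper is present.
        elements = wait.until(
            EC.presence_of_all_elements_located((By.CSS_SELECTOR, COMMENT_SELECTOR))
        )
        return [element.text for element in elements]
    finally:
        # Always close the browser, even if a wait times out.
        driver.quit()
```

Calling print(fetch_comments()) starts a real browser session, so it is best run as a standalone script.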