Question

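I'm trying to scrape a JavaScript-enabled page using BeautifulSoup (BS) and Selenium. So far I have the following code. Somehow it still doesn't pick up the JavaScript-generated content (and returns an empty result). In this case I'm trying to scrape the Facebook comments at the bottom of the page. (Inspect Element shows the class as postText.)
Thanks for your help!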

from selenium import webdriver  
from selenium.common.exceptions import NoSuchElementException  
from selenium.webdriver.common.keys import Keys  
import BeautifulSoup

browser = webdriver.Firefox()  
browser.get('http://techcrunch.com/2012/05/15/facebook-lightbox/')  
html_source = browser.page_source  
browser.quit()

soup = BeautifulSoup.BeautifulSoup(html_source)  
comments = soup("div", {"class":"postText"})  
print comments

Answer 1

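Your code has a few errors, which are fixed below. However, the class "postText" must exist somewhere else, because it is not defined in the original page source. My revised version of the code has been tested and works on multiple websites.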

from selenium import webdriver
from selenium.common.exceptions import NoSuchElementException
from selenium.webdriver.common.keys import Keys
from bs4 import BeautifulSoup

# Load the page in a real browser so its JavaScript gets a chance to run
browser = webdriver.Firefox()
browser.get('http://techcrunch.com/2012/05/15/facebook-lightbox/')
html_source = browser.page_source
browser.quit()

# Name the parser explicitly so results are consistent across machines
soup = BeautifulSoup(html_source, 'html.parser')
# The class "postText" is not defined in the page source, so this list may be empty
comments = soup.find_all('div', {'class': 'postText'})
print(comments)
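Since the answer points out that "postText" is not in the page source, it can help to confirm that directly before suspecting the parser. Below is a minimal sketch of such a check using only the standard library's html.parser. The helper count_elements_with_class and the two sample HTML strings are hypothetical and for illustration only; in practice you would pass in the html_source string obtained from browser.page_source.

```python
from html.parser import HTMLParser


class ClassCounter(HTMLParser):
    """Count start tags of one kind whose class attribute contains a given name."""

    def __init__(self, tag, class_name):
        super().__init__()
        self.tag = tag
        self.class_name = class_name
        self.count = 0

    def handle_starttag(self, tag, attrs):
        if tag != self.tag:
            return
        # attrs is a list of (name, value) pairs; the value can be None.
        classes = (dict(attrs).get("class") or "").split()
        if self.class_name in classes:
            self.count += 1


def count_elements_with_class(html_source, tag, class_name):
    """Hypothetical helper: how many <tag> elements carry class_name?"""
    parser = ClassCounter(tag, class_name)
    parser.feed(html_source)
    parser.close()
    return parser.count


# Made-up stand-ins for browser.page_source (assumptions, not real page content).
static_html = '<html><body><div class="article">Post body</div></body></html>'
rendered_html = (
    '<html><body><div class="article">Post body</div>'
    '<div class="postText highlight">First comment</div>'
    '<div class="postText">Second comment</div></body></html>'
)

print(count_elements_with_class(static_html, "div", "postText"))    # 0
print(count_elements_with_class(rendered_html, "div", "postText"))  # 2
```

If the count is 0 for the real html_source, the comments were never part of the HTML handed to BeautifulSoup, so changing the BeautifulSoup call will not make them appear.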

Python: Scraping JavaScript using Selenium and Beautiful Soup
