I am trying to count the number of divs with the class "item milspec" on a website. When I run my code, the number of divs with that class is printed as 0. Why?
import urllib2
import lxml
from bs4 import BeautifulSoup
url = "http://g2case.com/en"
page = urllib2.urlopen(url)
soup = BeautifulSoup(page.read(), "lxml")
milspecs = soup.findAll("div", {"class": "item milspec"})
print(len(milspecs))
Answer 0 (score: 1)
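The page source is generated/modified by JavaScript. urllib2 only gives you the raw server response, before any client-side scripts have run, so the divs you are looking for are not in it yet. You need to wait for the client-side code to finish and then grab the page source. This can be done with selenium, which drives a real browser.
Install selenium: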
pip install selenium
Then try the following:
from time import sleep
from bs4 import BeautifulSoup
from selenium import webdriver

url = 'http://g2case.com/en'
browser = webdriver.Firefox()
browser.get(url)

def check_for_div_class_count(html, class_name):
    # Count the divs whose class attribute matches class_name.
    soup = BeautifulSoup(html, 'html.parser')
    milspecs = soup.findAll('div', {'class': class_name})
    return len(milspecs)

# Count right after the page loads, possibly before the scripts have finished.
print(check_for_div_class_count(browser.page_source, 'item milspec'))
# Give the client-side JavaScript a few seconds to run, then count again.
sleep(3)
print(check_for_div_class_count(browser.page_source, 'item milspec'))
browser.close()
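A fixed sleep(3) can be too short on a slow connection and wasteful on a fast one. As a rough sketch of an alternative, the helper below polls a counting function until it returns a non-zero value or a timeout expires. The name wait_for_count and its parameters are hypothetical and not part of selenium or BeautifulSoup, and the demo uses a stand-in iterator instead of a real browser so that it runs on its own.

```python
import time

def wait_for_count(get_count, timeout=10.0, interval=0.5):
    """Poll get_count() until it returns non-zero or the timeout elapses.

    get_count is any zero-argument callable returning an integer.
    Returns the last count seen (0 if nothing appeared in time).
    """
    deadline = time.time() + timeout
    count = get_count()
    while count == 0 and time.time() < deadline:
        time.sleep(interval)
        count = get_count()
    return count

# Stand-in for a page whose divs appear on the third poll (made-up data).
fake_counts = iter([0, 0, 5])
print(wait_for_count(lambda: next(fake_counts), timeout=5.0, interval=0.01))
```

With the code above, the callable could be lambda: check_for_div_class_count(browser.page_source, 'item milspec'), called before browser.close(). selenium also ships its own explicit-wait support (WebDriverWait), which may be a better fit than a hand-rolled loop.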
Edit
Install chromedriver (needed if you use webdriver.Chrome() instead of webdriver.Firefox()):
cd ~/Downloads
wget http://chromedriver.storage.googleapis.com/2.21/chromedriver_linux64.zip
unzip chromedriver_linux64.zip
rm chromedriver_linux64.zip
chmod +x chromedriver
sudo mkdir -p /opt/google/
sudo mv -f chromedriver /opt/google/
sudo ln -s /opt/google/chromedriver /usr/local/bin/