Counting divs on a live website with Beautiful Soup?

Date: 2016-04-16 18:07:43

Tags: python beautifulsoup

I'm trying to count the number of divs with the class "item milspec" on a website. When I run my code, the count of divs with class "item milspec" prints as 0. Why?

import urllib2
import lxml
from bs4 import BeautifulSoup

url = "http://g2case.com/en"
page = urllib2.urlopen(url)
soup = BeautifulSoup(page.read(), "lxml")

milspecs = soup.findAll("div", {"class": "item milspec"})

print(len(milspecs))
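A side note on how the lookup above matches: when the class value passed to `findAll` contains a space, Beautiful Soup compares it with the attribute's whole text, so `"item milspec"` will not match `class="milspec item"` or a div that carries an extra class. The snippet below uses made-up markup (not the real g2case.com page) to show the difference; a CSS selector via `select()` matches every div that carries both classes. This alone does not explain a count of 0 on the live page.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: the same two classes, in different orders and combinations.
html = """
<div class="item milspec">A</div>
<div class="milspec item">B</div>
<div class="item milspec covert">C</div>
<div class="item">D</div>
"""

soup = BeautifulSoup(html, "html.parser")

# A class string containing a space is compared with the attribute's full text,
# so only the div whose class attribute is exactly "item milspec" matches.
exact = soup.find_all("div", {"class": "item milspec"})

# A CSS selector matches every div that has both classes, in any order.
both = soup.select("div.item.milspec")

print(len(exact))  # 1
print(len(both))   # 3
```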

1 answer:

Answer 0 (score: 1)

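The page source is generated/modified by JavaScript. urllib2 only gives you the server's response; it does not run the page's scripts. You need to wait for the client-side code to finish and then grab the page source. This can be done with Selenium.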
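To see why the count comes back as 0, here is a minimal offline sketch. The markup is hypothetical, standing in for a server response whose `item milspec` divs are only created by a script, and `count_divs` is an illustrative helper. Beautiful Soup parses the text it receives and never executes JavaScript, so the divs the script would create are not there to count.

```python
from bs4 import BeautifulSoup

# Hypothetical server response: the container starts empty, and a script
# would add the "item milspec" divs only after a browser runs it.
server_html = """
<html><body>
  <div id="items"></div>
  <script>
    var box = document.getElementById("items");
    for (var i = 0; i < 3; i++) {
      var d = document.createElement("div");
      d.className = "item milspec";
      box.appendChild(d);
    }
  </script>
</body></html>
"""


def count_divs(html, class_name):
    """Count <div> tags whose class attribute equals class_name."""
    soup = BeautifulSoup(html, "html.parser")
    return len(soup.find_all("div", {"class": class_name}))


# The script body is kept as raw text and never run, so nothing matches.
print(count_divs(server_html, "item milspec"))  # 0
```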

Install Selenium:

pip install selenium

Then try the following:

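from time import sleep

from bs4 import BeautifulSoup
from selenium import webdriver

url = 'http://g2case.com/en'


def check_for_div_class_count(html, class_name):
    # Parse the HTML the browser currently holds and count matching divs
    soup = BeautifulSoup(html, 'html.parser')
    milspecs = soup.find_all('div', {'class': class_name})
    return len(milspecs)


browser = webdriver.Firefox()
try:
    browser.get(url)
    # Count right after the page loads, then again after giving the
    # page's JavaScript 3 seconds to add more divs
    print(check_for_div_class_count(browser.page_source, 'item milspec'))
    sleep(3)
    print(check_for_div_class_count(browser.page_source, 'item milspec'))
finally:
    # quit() ends the WebDriver session; close() only closes the window
    browser.quit()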
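The fixed `sleep(3)` is a guess: it may be too short on a slow connection and wastes time on a fast one. A sketch using Selenium's explicit waits (`WebDriverWait` with `expected_conditions.presence_of_element_located`) instead polls until a matching element appears. The helper names, the 10-second timeout, and the `div.item.milspec` selector are assumptions for illustration. This waits for the first match only; if the page keeps adding items afterwards, the count can still be partial.

```python
from bs4 import BeautifulSoup


def count_divs(html, css_selector="div.item.milspec"):
    """Count the elements in an HTML string that match a CSS selector."""
    return len(BeautifulSoup(html, "html.parser").select(css_selector))


def rendered_source(url, css_selector="div.item.milspec", timeout=10):
    """Load url in Firefox and return page_source once css_selector matches.

    Selenium is imported here so count_divs() works without a browser.
    """
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    browser = webdriver.Firefox()
    try:
        browser.get(url)
        # Poll for up to `timeout` seconds until at least one matching
        # element exists; raises TimeoutException if none appears.
        WebDriverWait(browser, timeout).until(
            EC.presence_of_element_located((By.CSS_SELECTOR, css_selector))
        )
        return browser.page_source
    finally:
        browser.quit()
```

Usage, assuming Firefox and a working WebDriver setup: `print(count_divs(rendered_source("http://g2case.com/en")))`. Importing Selenium inside `rendered_source` keeps `count_divs` usable on saved HTML without starting a browser.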

Edit

Install chromedriver (needed to drive Chrome with webdriver.Chrome() instead of Firefox):

cd ~/Downloads
wget http://chromedriver.storage.googleapis.com/2.21/chromedriver_linux64.zip
unzip chromedriver_linux64.zip
rm chromedriver_linux64.zip
chmod 755 chromedriver
sudo mkdir -p /opt/google/
sudo mv -f chromedriver /opt/google/
sudo ln -s /opt/google/chromedriver /usr/local/bin/
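Once `chromedriver` is on the `PATH` (the symlink in `/usr/local/bin` above), Selenium can drive Chrome in place of Firefox. A minimal sketch, assuming a Chrome or Chromium build compatible with that chromedriver version is installed; `make_chrome` is a hypothetical helper name, and headless mode is optional.

```python
import shutil


def make_chrome(headless=True):
    """Start Chrome through the chromedriver found on PATH.

    Assumes chromedriver is on PATH and a matching Chrome/Chromium is
    installed. Selenium is imported after the PATH check, so a missing
    driver is reported before any browser start is attempted.
    """
    if shutil.which("chromedriver") is None:
        raise RuntimeError("chromedriver not found on PATH")

    from selenium import webdriver

    options = webdriver.ChromeOptions()
    if headless:
        # Run without opening a window (useful on servers without a display).
        options.add_argument("--headless")
    return webdriver.Chrome(options=options)
```

`browser = make_chrome()` can then replace `browser = webdriver.Firefox()` in the script above.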