The following code fetches the content of a web page:
from BeautifulSoup import BeautifulSoup
import requests
import urllib2
url = 'http://www.surfline.com/surf-report/rincon-southern-california_4197/'
source_code = requests.get(url)
plain_text = source_code.text
print plain_text
site = urllib2.urlopen(url).read()
print site
The output from both libraries includes:
<div id="current-surf-range" style="font-size:21px;font-weight:bold;padding-top:7px; padding-bottom: 7px;"></div>
Unfortunately, this differs from what the actual page contains in a browser:
<div id="current-surf-range" style="font-size:21px;font-weight:bold;padding-top:7px; padding-bottom: 7px;">4-5ft</div>
The 4-5ft value is missing from the fetched HTML, so it cannot be extracted with BeautifulSoup.
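To check whether a fetched page actually contains the value, the two snippets above can be compared programmatically. The following is a minimal sketch that uses only the standard library's html.parser, so it runs without extra installs; the same check could be done with BeautifulSoup. The names `IdTextExtractor` and `text_by_id` are hypothetical helpers introduced for illustration, and the `style` attribute is dropped from the sample markup for brevity.

```python
from html.parser import HTMLParser


class IdTextExtractor(HTMLParser):
    """Collect the text inside the first element that has a given id."""

    def __init__(self, target_id):
        super().__init__()
        self.target_id = target_id
        self.depth = 0       # greater than 0 while inside the target element
        self.found = False   # True once the target element has been seen
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        # Simplification: assumes no unclosed void tags (such as <br>)
        # inside the target element, which would skew the depth counter.
        if self.depth:
            self.depth += 1
        elif not self.found and dict(attrs).get("id") == self.target_id:
            self.found = True
            self.depth = 1

    def handle_endtag(self, tag):
        if self.depth:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth:
            self.chunks.append(data)


def text_by_id(html, target_id):
    """Return the element's text, '' if it is empty, or None if it is absent."""
    parser = IdTextExtractor(target_id)
    parser.feed(html)
    parser.close()
    return "".join(parser.chunks).strip() if parser.found else None


fetched = '<div id="current-surf-range"></div>'
rendered = '<div id="current-surf-range">4-5ft</div>'

print(repr(text_by_id(fetched, "current-surf-range")))   # empty string
print(repr(text_by_id(rendered, "current-surf-range")))  # '4-5ft'
```

An empty string for the fetched HTML means the element exists but its text was not part of the server's response, which suggests the value is filled in later by the browser.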
Answer 0 (score: 1)
Use selenium, which loads the page in a real browser.
Full instructions are in the selenium documentation.
Install it:
pip3 install selenium
from time import sleep

from selenium import webdriver

url = 'http://www.surfline.com/surf-report/rincon-southern-california_4197/'
web = webdriver.Firefox()
# Alternatively, connect to a remote driver:
# web = webdriver.Remote('http://localhost:9515', desired_capabilities=DesiredCapabilities.CHROME)
web.get(url)  # get() returns None; the rendered HTML is read from the driver afterwards
# The page sometimes takes a while to load, so pause briefly
sleep(2)
plain_text = web.page_source
web.quit()
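A fixed sleep can be too short or needlessly long. One alternative is selenium's explicit wait, which polls until a condition holds. The sketch below assumes the value is filled into the `current-surf-range` element by JavaScript after the page loads; the function names and the 10-second timeout are illustrative choices, not part of the original answer.

```python
ELEMENT_ID = "current-surf-range"


def element_text_or_false(driver, element_id=ELEMENT_ID):
    """Wait condition: return the element's text once it is non-empty, else False."""
    # "id" is the locator strategy string that selenium's By.ID stands for.
    text = driver.find_element("id", element_id).text.strip()
    return text or False


def fetch_surf_range(url, timeout=10):
    """Open the page in Firefox and return the surf range text once it appears."""
    # Imported here so the wait condition above can be exercised without a browser.
    from selenium import webdriver
    from selenium.webdriver.support.ui import WebDriverWait

    driver = webdriver.Firefox()
    try:
        driver.get(url)
        # until() calls the condition repeatedly and returns its first truthy
        # result; it raises TimeoutException if the timeout expires first.
        return WebDriverWait(driver, timeout).until(element_text_or_false)
    finally:
        driver.quit()
```

Calling `fetch_surf_range(url)` with the URL from the question would then return the element's text as soon as it is populated, provided Firefox and its driver are installed.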