Getting the HTML code after JavaScript has loaded

Date: 2017-01-29 05:14:24

Tags: javascript python selenium web-scraping beautifulsoup

I am trying to scrape this website. I want to get the main table, but the table is loaded through JavaScript, so I cannot scrape its HTML. Here is my code.

from bs4 import BeautifulSoup
from selenium import webdriver
import time

# The path to the PhantomJS binary goes in executable_path (empty here)
driver = webdriver.PhantomJS(executable_path='')
driver.get("http://bet.hkjc.com/racing/pages/odds_wp.aspx?date=30-01-2017&venue=ST&raceno=5&lang=en")
time.sleep(3)  # fixed pause to give the page's JavaScript time to build the table
pageSource = driver.page_source
bsObj = BeautifulSoup(pageSource, "html.parser")  # name the parser explicitly
print(bsObj.find(id="detailWPTable").get_text())

I want to get the contents of the table. Please help!

1 answer:

Answer 0 (score: 1)

You can try dryscrape, like this:

from bs4 import BeautifulSoup as BS
import dryscrape

ses = dryscrape.Session()
# Load the page in a headless WebKit session so its JavaScript runs
ses.visit("http://bet.hkjc.com/racing/pages/odds_wp.aspx?date=30-01-2017&venue=ST&raceno=1&lang=en")
soup = BS(ses.body(), 'lxml')  # parse the rendered page content

print(soup.find(id="detailWPTable").get_text())

Output:

No.ColourHorseDrawWt.JockeyTrainerWinPlaceWin & Place1FURIOUS PEGASUS6132O MurphyT K Ng278.42HAPPY FIERY DRAGON5132N CallanD Cruz3.21.03HAPPY WAY WINNER12132K C NgK W Lui207.64EMPIRE OF MONGOLIA1128C Y HoC S Shum39105DYNAMIC VOYAGE4125K C LeungL Ho185.16OPTIMISM10124C SchofieldD E Ferraris124.37TREASURE AND GOLD13124J MoreiraC H Yip5.53.38MANHATTAN STRIKER3122O DoleuzeC Fownes124.39CHANS DELIGHT2121M ChadwickD Cruz176.510SHOW MISSION14121H W LaiY S Tsui278.311FRIENDS FOREVER7119K K ChiongK L Man9.73.512STARRY STARLIES11115H T MoP O'Sullivan146.013INTELLECTUAL GLIDE9113M L YeungA Lee146.114BERNARD'S CHOICE8113K TeetanT K Ng175.2F Field
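
The output above is one long string because get_text() on the whole table joins every cell with no separator. If you want the cells kept apart, one option is to walk the table row by row. This is only a minimal sketch: SAMPLE_HTML is a made-up, simplified stand-in for the rendered page (the real markup and values will differ), table_rows is a hypothetical helper name, and html.parser is used instead of lxml just to avoid the extra dependency.

```python
from bs4 import BeautifulSoup

# Hypothetical, simplified stand-in for the rendered page source.
# The real table has more columns and possibly different markup.
SAMPLE_HTML = """
<table id="detailWPTable">
  <tr><td>No.</td><td>Horse</td><td>Win</td><td>Place</td></tr>
  <tr><td>1</td><td>HORSE A</td><td>27</td><td>8.4</td></tr>
  <tr><td>2</td><td>HORSE B</td><td>3.2</td><td>1.0</td></tr>
</table>
"""


def table_rows(html, table_id):
    """Return the rows of the table with the given id as lists of cell texts."""
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find(id=table_id)
    if table is None:
        # The table is missing, e.g. the JavaScript had not finished yet
        return []
    rows = []
    for tr in table.find_all("tr"):
        # Collect header and data cells, trimming surrounding whitespace
        cells = [cell.get_text(strip=True) for cell in tr.find_all(["td", "th"])]
        if cells:
            rows.append(cells)
    return rows


rows = table_rows(SAMPLE_HTML, "detailWPTable")
for row in rows:
    print(" | ".join(row))
```

With the real page you would pass ses.body() or driver.page_source in place of SAMPLE_HTML. Note that find_all("tr") also descends into nested tables, so if the page nests tables inside cells the rows may need extra filtering.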