I can scrape a static website to CSV with the following code:
import pandas as pd
url = 'http://www.etnet.com.hk/www/tc/futures/index.php?subtype=HSI&month=201801&tab=interval'
for i, df in enumerate(pd.read_html(url)):
    filename = 'C:/Users/Lawrence/Desktop/PyTest/output%02d.csv' % i
    df.to_csv(filename, encoding='UTF-8')
However, I found that it does not work for dynamic websites. How can I achieve this?
P.S.: I am using Python 3.6.
Answer 0 (score: 0)
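You can use selenium's webdriver, which handles websites the way a regular web browser does. In your example, the simplest way to apply selenium with minimal changes to your code is as follows: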
import pandas as pd
from selenium import webdriver
url = 'http://www.etnet.com.hk/www/tc/futures/index.php?subtype=HSI&month=201801&tab=interval'
# The following lines are so the browser is headless, i.e. it doesn't open a window
options = webdriver.ChromeOptions()
options.add_argument('headless')
options.add_argument('window-size=1200x600')
wd = webdriver.Chrome(options=options) # Open a browser using the options set (the older chrome_options= keyword is deprecated)
wd.get(url) # Open the desired url in the browser
for i, df in enumerate(pd.read_html(wd.page_source)): # Use wd.page_source to feed pd.read_html
    filename = 'C:/Users/Lawrence/Desktop/PyTest/output%02d.csv' % i
    df.to_csv(filename, encoding='UTF-8')
wd.quit() # Close the browser and end the driver session
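One caveat with dynamic pages: the table may not exist yet at the moment page_source is read. A minimal sketch of the same approach with an explicit wait is below. It assumes the data ends up in a `<table>` element, and the names `output_path`, `save_dynamic_tables`, and the 10-second `timeout` are hypothetical choices for illustration, not part of the code above.

```python
import io
from pathlib import Path


def output_path(out_dir, index):
    """Build the same 'outputNN.csv' file name used in the code above."""
    return Path(out_dir) / ('output%02d.csv' % index)


def save_dynamic_tables(url, out_dir, timeout=10):
    """Render url in headless Chrome, wait for a <table>, write each table to CSV."""
    # Imported here so output_path() stays usable without selenium or pandas installed.
    import pandas as pd
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    options = webdriver.ChromeOptions()
    options.add_argument('--headless')
    wd = webdriver.Chrome(options=options)
    try:
        wd.get(url)
        # Assumption: the data is rendered as a <table>; adjust the locator if it is not.
        WebDriverWait(wd, timeout).until(
            EC.presence_of_element_located((By.TAG_NAME, 'table'))
        )
        html = wd.page_source
    finally:
        wd.quit()  # Always shut the browser down, even if the wait times out

    paths = []
    # Wrapping the HTML in StringIO passes it to read_html as a file-like object.
    for i, df in enumerate(pd.read_html(io.StringIO(html))):
        path = output_path(out_dir, i)
        df.to_csv(path, encoding='UTF-8')
        paths.append(path)
    return paths
```

You would call it as `save_dynamic_tables(url, 'C:/Users/Lawrence/Desktop/PyTest')`. If the wait times out, selenium raises a TimeoutException, which usually means the locator does not match what the page actually renders.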