Question

I can scrape a static website to CSV with the following code:

import pandas as pd
url = 'http://www.etnet.com.hk/www/tc/futures/index.php?subtype=HSI&month=201801&tab=interval'
for i, df in enumerate(pd.read_html(url)):
    filename = 'C:/Users/Lawrence/Desktop/PyTest/output%02d.csv' % i
    df.to_csv(filename, encoding='UTF-8')

However, I found that it does not work for dynamic websites. How can I achieve this?

P.S.: I am using Python 3.6.

Answer 1

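You can use selenium's webdriver, which handles websites the way a regular web browser does. In your example, the simplest way to apply selenium with minimal changes to your code is as follows: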

from io import StringIO

import pandas as pd
from selenium import webdriver

url = 'http://www.etnet.com.hk/www/tc/futures/index.php?subtype=HSI&month=201801&tab=interval'

# The following lines make the browser headless, i.e. it doesn't open a window
options = webdriver.ChromeOptions()
options.add_argument('--headless')
options.add_argument('--window-size=1200,600')  # Chrome expects width,height

wd = webdriver.Chrome(options=options)  # Open a browser using the options set

wd.get(url)  # Open the desired url in the browser
# Feed the rendered page source to pd.read_html; it is wrapped in StringIO
# because newer pandas versions deprecate passing a raw HTML string
for i, df in enumerate(pd.read_html(StringIO(wd.page_source))):
    filename = 'C:/Users/Lawrence/Desktop/PyTest/output%02d.csv' % i
    df.to_csv(filename, encoding='UTF-8')

wd.quit()  # Close the browser and end the driver session
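
One caveat with dynamic pages: wd.page_source returns the page as it is at the moment it is read, so a table that JavaScript fills in after the initial load may still be missing. Below is a minimal sketch of one way to wait for it, assuming that the appearance of "<table" in the source is a good enough signal. The helper wait_for_page_source and the FakeDriver class are hypothetical names made up for illustration (FakeDriver only stands in for a real browser so the snippet runs on its own). selenium also ships its own WebDriverWait class for this purpose, which is probably the better choice in real code.

```python
import time


def wait_for_page_source(driver, marker="<table", timeout=10.0, poll=0.5):
    """Poll driver.page_source until `marker` appears, then return the HTML.

    `driver` is any object exposing a `page_source` attribute, such as the
    selenium webdriver `wd` from the answer. Raises TimeoutError if the
    marker does not show up within `timeout` seconds.
    """
    deadline = time.monotonic() + timeout
    while True:
        html = driver.page_source
        if marker in html:
            return html
        if time.monotonic() >= deadline:
            raise TimeoutError("%r not found within %s seconds" % (marker, timeout))
        time.sleep(poll)


class FakeDriver:
    """Hypothetical stand-in for a browser: the table shows up on the third read."""

    def __init__(self):
        self.reads = 0

    @property
    def page_source(self):
        self.reads += 1
        if self.reads < 3:
            return "<html><body>loading</body></html>"
        return "<html><body><table><tr><td>1</td></tr></table></body></html>"


fake = FakeDriver()
html = wait_for_page_source(fake, poll=0.01)
print(fake.reads)  # number of reads needed before the table appeared
```

With a real browser, the idea would be to call wait_for_page_source(wd) after wd.get(url) and pass the returned HTML to pd.read_html in place of wd.page_source.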

Python - Scraping a dynamic website to CSV in a neat table format
