Scraping dividends from dividend.com with Python

Time: 2017-09-29 17:26:32

Tags: python web-scraping

I am trying to scrape dividend data from dividend.com. Here is my script:

import requests

url = 'http://www.dividend.com/ex-dividend-dates.php?from_filter=yes&ex_div_date_min=2018-01-11&ex_div_date_max=2018-01-11&common_shares=on&preferred_shares=on&adrs=on&etns=on&funds=on&notes=on&etfs=on&reits=on'

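page = requests.get(url)

# page.text is the decoded body; str(page.content) would write the bytes repr (b'...').
with open('page_content.txt', 'w', encoding='utf-8') as f:
    f.write(page.text)
    # No explicit f.close() is needed: the with block closes the file.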

I save the result to a text file. I am interested in the block shown in the uploaded image:

[image: html block]

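The file contains many blocks like this one. I want to save them as a list of dictionaries, where each dictionary holds the data of one block and looks like this: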

{ 'Stock Symbol':'MFO', 'Company Name':'MFA Financial Inc. 8% Sr. Notes due 2042', 'DARS™ Rating':'', 'Ex-Div Date':'2018-01-11', 'Pay Date':'2018-01-16', 'Div Payout':'0.50', 'Qualified Dividend?':'No', 'Stock Price':'$26.04', 'Yield':'7.68%', }

Please help, and thanks in advance.

1 answer:

Answer 0: (score: 1)

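Because the table's data is generated dynamically, a plain requests call will not return it; you need selenium to load the page in a browser and get at the content. Here is a script that combines BeautifulSoup and selenium to do the job: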

from selenium import webdriver
from bs4 import BeautifulSoup

driver = webdriver.Chrome()
driver.get("http://www.dividend.com/ex-dividend-dates.php?from_filter=yes&ex_div_date_min=2018-01-11&ex_div_date_max=2018-01-11&common_shares=on&preferred_shares=on&adrs=on&etns=on&funds=on&notes=on&etfs=on&reits=on")
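# Parse the rendered HTML, then close the browser.
soup = BeautifulSoup(driver.page_source, "lxml")
driver.quit()

# The ex-dividend table is selected by its id.
table = soup.select("table#ex-dividend-dates")[0]

# One list of cell texts per table row (th cells for the header, td cells for data).
list_row = [[tab_d.text.strip().replace("\n", "") for tab_d in item.select('th,td')]
            for item in table.select('tr')]

# Print the header row and the first data row.
for data in list_row[:2]:
    print(' '.join(data))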

The result looks like this:

Stock Symbol Company Name DARS™ Rating Ex-Div Date Pay Date Div Payout Qualified Dividend? Stock Price Yield
MFO MFA Financial Inc. 8% Sr. Notes due 2042  2018-01-11 2018-01-16 0.50 No $26.04 7.68%
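
The question asks for a list of dictionaries rather than printed rows. A minimal sketch of that last step, assuming list_row has the shape produced above (header row first, then one list of cell texts per data row). The helper name rows_to_dicts and the hard-coded sample data are illustrative, not part of the original script; the sample simply mirrors the two rows printed above.

```python
def rows_to_dicts(list_row):
    """Turn [header, row1, row2, ...] into a list of dicts keyed by header cell."""
    if not list_row:
        return []
    header, *rows = list_row
    # Keep only rows with the same number of cells as the header;
    # this skips spacer or ad rows (an assumption about the page layout).
    return [dict(zip(header, row)) for row in rows if len(row) == len(header)]


# Sample data standing in for the scraped list_row (hypothetical stand-in).
sample = [
    ['Stock Symbol', 'Company Name', 'DARS™ Rating', 'Ex-Div Date', 'Pay Date',
     'Div Payout', 'Qualified Dividend?', 'Stock Price', 'Yield'],
    ['MFO', 'MFA Financial Inc. 8% Sr. Notes due 2042', '', '2018-01-11',
     '2018-01-16', '0.50', 'No', '$26.04', '7.68%'],
]

records = rows_to_dicts(sample)
print(records[0])
```

In the real script you would call rows_to_dicts(list_row) after the list comprehension. Empty cells, such as the DARS™ Rating here, come through as empty strings.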