Python BeautifulSoup: how to extract/find a value

Date: 2017-01-27 13:56:59

Tags: beautifulsoup

Can anyone help explain why this doesn't work?

I don't understand the BeautifulSoup documentation very well.

from urllib.request import Request, urlopen
import bs4

req = Request('http://performance.morningstar.com/stock/performance-return.action?p=dividend_split_page&t=D05', headers={'User-Agent': 'Mozilla/5.0'})
webpage = urlopen(req).read()

soup = bs4.BeautifulSoup(webpage, 'lxml')

div = soup.find('div', {'id': 'div_annual_dividends'})

th = div.find('th', text="Dividend Amount")

I can't seem to extract the value 0.56 using nextSibling.text.

This is the error I get:

AttributeError: 'NoneType' object has no attribute 'nextSibling'

How can I store the output in an array?

for tr in soup('th', text="Dividend Amount"):
    row = [td.text for td in tr('td')]
    print(row)

Is this correct?

1 Answer:

Answer 0 (score: 0):

This page is rendered by JavaScript; the actual data is served from this URL:

http://performance.morningstar.com/perform/Performance/stock/annual-dividends.action?&t=XSES:D05&region=sgp&culture=en-US&cur=&ops=clear&ndec=2&y=5

You can find this URL in Chrome DevTools (Network tab).

Code:
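To see why the original attempt raises AttributeError, here is a minimal sketch. It assumes the static HTML contains the div_annual_dividends container but no table inside it, because the table is filled in later by JavaScript. The static_html string is a made-up stand-in, not the real page, and html.parser is used only to avoid depending on lxml.

```python
import bs4

# Hypothetical stand-in for the HTML that urlopen() receives:
# the container div exists, but it stays empty until JavaScript fills it.
static_html = """
<html><body>
  <div id="div_annual_dividends"></div>
</body></html>
"""

soup = bs4.BeautifulSoup(static_html, "html.parser")
div = soup.find("div", {"id": "div_annual_dividends"})

# find() returns None when nothing matches, so check before using the result.
th = div.find("th", string="Dividend Amount")

if th is None:
    message = "header cell not found; the table is probably loaded by JavaScript"
else:
    message = th.find_next_sibling("td").get_text(strip=True)

print(message)
```

Checking for None before touching nextSibling turns the crash into a readable diagnostic, which is usually the first hint that the data has to be fetched from another URL.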

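import requests, bs4

# Request the data URL directly instead of the JavaScript-rendered page.
r = requests.get('http://performance.morningstar.com/perform/Performance/stock/annual-dividends.action?&t=XSES:D05&region=sgp&culture=en-US&cur=&ops=clear&ndec=2&y=5')
soup = bs4.BeautifulSoup(r.text, 'lxml')
rows = []
# class_=False matches only <tr> tags that have no class attribute.
for tr in soup('tr', class_=False):
    # Collect the text of every <td> cell in this row.
    row = [td.text for td in tr('td')]
    rows.append(row)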

Out:

[['0.56', '0.56', '0.58', '0.60', '0.60'],
 ['3.77', '3.27', '2.82', '3.59', '3.46']]
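If you want each row keyed by its header cell rather than by position, something like the following sketch might help. It assumes every data row starts with a th label followed by td values; the sample_html fragment and the "Hypothetical Row" label are invented for illustration, and the real response may be laid out differently.

```python
import bs4

# Hypothetical fragment shaped like a label-plus-values table;
# this is NOT the actual response from the site.
sample_html = """
<table>
  <tr><th>Dividend Amount</th><td>0.56</td><td>0.56</td><td>0.58</td></tr>
  <tr><th>Hypothetical Row</th><td>3.77</td><td>3.27</td><td>2.82</td></tr>
</table>
"""

soup = bs4.BeautifulSoup(sample_html, "html.parser")

labelled = {}
for tr in soup.find_all("tr"):
    th = tr.find("th")
    if th is None:
        # Skip rows that have no header cell.
        continue
    label = th.get_text(strip=True)
    labelled[label] = [td.get_text(strip=True) for td in tr.find_all("td")]

first_amount = labelled["Dividend Amount"][0]
print(first_amount)
```

Looking a row up by label, as in labelled["Dividend Amount"], avoids relying on row order, which may change if the site adds or removes rows.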