Question

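I'm trying to use BeautifulSoup and re to pull a specific value from Yahoo Finance, and I can't figure out how to get it. Below I've pasted my code, the HTML I'm getting, and the unique selector.

I only want this one number, "7.58", but the problem is that the cell's class is shared by many other cells in the same table.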

<tr><td class="yfnc_tablehead1" width="74%">Diluted EPS (ttm):</td><td class="yfnc_tabledata1">7.58</td>

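Here is the selector Google gave me...

#yfncsumtab > tbody > tr:nth-child(2) > td.yfnc_modtitlew1 > table:nth-child(10) > tbody > tr > td > table > tbody > tr:nth-child(8) > td.yfnc_tabledata1

Here is some template code I've been using to test different things, but I'm very new to regular expressions and I'm stuck on matching the "Diluted EPS (ttm): ###" part.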

from bs4 import BeautifulSoup
import requests
import re


sess = requests.Session()
res = sess.get('http://finance.yahoo.com/q/ks?s=MMM+Key+Statistics')

soup = BeautifulSoup(res.text, 'html.parser')

body = soup.findAll('td')


print (body)

Thanks!

Answer 1

You can first find the cell whose text is Diluted EPS (ttm): and then look up the value cell in the same row:

soup.find('td', text='Diluted EPS (ttm):').parent.find('td', attrs={'class': 'yfnc_tabledata1'})
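To try this without hitting the network, here is a minimal sketch that runs the same lookup against the HTML fragment from the question. The `html` string (wrapped in a table, with a closing `</tr>` added) and the `find_next_sibling` variant are assumptions for illustration, not part of the original answer, and the live page markup may differ. It uses `string=`, the newer spelling of the `text=` argument.

```python
from bs4 import BeautifulSoup

# HTML fragment from the question, wrapped so that it is well-formed.
html = (
    '<table><tr><td class="yfnc_tablehead1" width="74%">Diluted EPS (ttm):</td>'
    '<td class="yfnc_tabledata1">7.58</td></tr></table>'
)

soup = BeautifulSoup(html, 'html.parser')

# Find the label cell by its exact text, then the value cell in the same row.
label = soup.find('td', string='Diluted EPS (ttm):')
value_cell = label.parent.find('td', attrs={'class': 'yfnc_tabledata1'})

# Alternative: the value cell is the next <td> sibling of the label cell.
sibling_cell = label.find_next_sibling('td')

# Convert the cell text to a number.
eps = float(value_cell.get_text(strip=True))
print(eps)
```

Anchoring on the label text avoids depending on the shared `yfnc_tabledata1` class alone, which is what made the original `findAll('td')` approach ambiguous.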

Answer 2

If you want to use a regular expression, try:

>>> import re
>>> text = '<tr><td class="yfnc_tablehead1" width="74%">Diluted EPS (ttm):</td><td class="yfnc_tabledata1">7.58</td>"'
>>> re.findall(r'Diluted\s+EPS\s+\(ttm\).*?>([\d.]+)<', text)
['7.58']
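Since the asker is new to regular expressions, here is the same pattern spelled out piece by piece with `re.VERBOSE`. This is only a sketch for reading the pattern; the variable names `pattern`, `match`, and `value` are chosen here for illustration.

```python
import re

text = ('<tr><td class="yfnc_tablehead1" width="74%">Diluted EPS (ttm):</td>'
        '<td class="yfnc_tabledata1">7.58</td>')

pattern = re.compile(r'''
    Diluted\s+EPS\s+   # the label words, with any whitespace between them
    \(ttm\)            # a literal "(ttm)"; the parentheses must be escaped
    .*?                # lazily skip the rest of the label cell and the next tag
    >                  # the ">" that closes an opening tag
    ([\d.]+)           # capture group: one or more digits or dots
    <                  # the "<" that begins the closing tag
''', re.VERBOSE)

match = pattern.search(text)
value = match.group(1) if match else None
print(value)
```

Note that `.` does not match newlines by default, so if the label and the value end up on different lines in the real page, the pattern would need `re.DOTALL` as well. The capture group also only accepts digits and dots, so a negative value or "N/A" would not match.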

Update: here is sample code using requests and re:

import requests
import re

sess = requests.Session()
res = sess.get('http://finance.yahoo.com/q/ks?s=MMM+Key+Statistics')
print(re.findall(r'Diluted\s+EPS\s+\(ttm\).*?>([\d.]+)<', res.text))

Output (under Python 2; Python 3 prints ['7.58']):

[u'7.58']

Answer 3

Thanks for answering my question. I was able to get the value I needed with both approaches. The first way looks like this.

from bs4 import BeautifulSoup
import requests

sess = requests.Session()
res = sess.get('http://finance.yahoo.com/q/ks?s=MMM+Key+Statistics')

soup = BeautifulSoup(res.text, 'html.parser')

eps = soup.find('td', text='Diluted EPS (ttm):').parent.find('td', attrs={'class': 'yfnc_tabledata1'})

for i in eps:
    print (i)

Here is the second way...

import requests
import re

sess = requests.Session()
res = sess.get('http://finance.yahoo.com/q/ks?s=MMM+Key+Statistics')
print(re.findall(r'Diluted\s+EPS\s+\(ttm\).*?>([\d.]+)<', res.text.strip()))

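I don't fully understand all of it yet, but it's a great start: I now have two different ways to approach it and can keep moving this part of the project forward. Thank you very much for your help!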
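For anyone who wants to reuse the regex approach for other rows of the table, here is a rough sketch of a small helper. The function name `extract_stat`, the `sample` string, and the handling of thousands separators are assumptions made for illustration; they are not from the answers above, and the live page markup may not match.

```python
import re

def extract_stat(html, label):
    """Return the first number that follows `label` in `html` as a float, or None."""
    # re.escape makes the label literal, so "(ttm)" needs no manual escaping.
    pattern = re.escape(label) + r'.*?>(-?\d[\d,]*\.?\d*)<'
    # re.DOTALL lets ".*?" cross line breaks between the label and the value.
    match = re.search(pattern, html, re.DOTALL)
    if match is None:
        return None
    # Drop thousands separators before converting.
    return float(match.group(1).replace(',', ''))

# Fragment from the question, with a closing </tr> added.
sample = ('<tr><td class="yfnc_tablehead1" width="74%">Diluted EPS (ttm):</td>'
          '<td class="yfnc_tabledata1">7.58</td></tr>')

print(extract_stat(sample, 'Diluted EPS (ttm):'))
print(extract_stat(sample, 'Missing label'))
```

Returning None for a missing label keeps the caller from crashing when the page layout changes, which is a common failure mode when scraping with regular expressions.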

Using Python regular expressions to get a value from HTML5
