Question

I have text like this:

text='gn="right" headers="gr-Y10 gr-eps i36">121.11<\\/td><\\/tr><tr class="hr"><td colspan="12"><\\/td><\\/tr><tr>'

I want to extract the value 121.11 with a regular expression, so I did this:

import re
b=re.search('gr-Y10 gr-eps i36">(.*)<\\\\/td', text)
b.group(1)

I get this as the output:

'121.11<\\/td><\\/tr><tr class="hr"><td colspan="12">'

How can I get what I actually want, 121.11, instead of the line above?

Answer 1

gr-Y10 gr-eps i36">(.*?)<\\\\/td

                      ^^

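Make your * non-greedy by appending ? to it (.*?). A non-greedy quantifier stops at the first instance of <\\\\/td; a greedy one captures everything up to the last <\\\\/td.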
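To see the difference concretely, here is a small sketch that runs both patterns against the string from the question. It writes the patterns as raw strings (r'...'), a stylistic choice layered on top of the answer; r'<\\/td' is the same regex as the question's '<\\\\/td'.

```python
import re

# The escaped-HTML fragment from the question; "\\/" in a normal string
# literal is a backslash followed by a slash.
text = 'gn="right" headers="gr-Y10 gr-eps i36">121.11<\\/td><\\/tr><tr class="hr"><td colspan="12"><\\/td><\\/tr><tr>'

# r'<\\/td' matches the literal characters "<\/td".
greedy = re.search(r'gr-Y10 gr-eps i36">(.*)<\\/td', text)
lazy = re.search(r'gr-Y10 gr-eps i36">(.*?)<\\/td', text)

print(greedy.group(1))  # runs on to the LAST "<\/td"
print(lazy.group(1))    # stops at the FIRST "<\/td": 121.11
```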

See the demo.

https://regex101.com/r/iS6jF6/2#python
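A different technique, not the one this answer proposes: a negated character class such as [^<]* cannot run past the next <, so greediness no longer matters. This sketch assumes the value itself never contains a <; the float() conversion is an extra illustrative step.

```python
import re

text = 'gn="right" headers="gr-Y10 gr-eps i36">121.11<\\/td><\\/tr><tr class="hr"><td colspan="12"><\\/td><\\/tr><tr>'

# [^<]* consumes anything except "<", so the capture ends right before
# the first tag whether the quantifier is greedy or not.
m = re.search(r'gr-Y10 gr-eps i36">([^<]*)', text)
value = float(m.group(1)) if m else None
print(value)  # 121.11
```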

Answer 2

Knowing the source of the input data, and given that it is HTML, here is a solution that uses an HTML parser, BeautifulSoup:

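import re
from bs4 import BeautifulSoup

# input_data: the HTML of the page from the linked source
soup = BeautifulSoup(input_data, "html.parser")

for row in soup.select('div#tab-growth table tr'):
    for td in row.find_all('td', headers=re.compile(r'gr-eps')):
        print(td.text)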

Basically, for each row of the "Growth" table, we find the cells whose headers attribute contains gr-eps (the "EPS %" section of the table). It prints:

60.00
—
—
—
—
42.22
3.13
—
—
—
-498.46
...
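If installing BeautifulSoup is not an option, the same idea can be sketched with the standard library's html.parser.HTMLParser instead. This is a swapped-in technique, not the answer's code; EpsCellParser is a hypothetical helper name, and the HTML below is a hypothetical, simplified stand-in for the real page.

```python
from html.parser import HTMLParser


class EpsCellParser(HTMLParser):
    """Collect the text of every <td> whose headers attribute lists gr-eps."""

    def __init__(self):
        super().__init__()
        self.in_target = False
        self.values = []

    def handle_starttag(self, tag, attrs):
        if tag == "td":
            headers = dict(attrs).get("headers") or ""
            # headers is a space-separated list of ids, e.g. "gr-Y10 gr-eps i36"
            self.in_target = "gr-eps" in headers.split()
            if self.in_target:
                self.values.append("")

    def handle_endtag(self, tag):
        if tag == "td":
            self.in_target = False

    def handle_data(self, data):
        # Accumulate text only while inside a matching cell.
        if self.in_target:
            self.values[-1] += data


# Hypothetical markup for illustration only.
html = """
<table>
  <tr><td headers="gr-Y09 gr-eps i35">60.00</td><td headers="gr-Y09 gr-rev">1.5</td></tr>
  <tr><td align="right" headers="gr-Y10 gr-eps i36">121.11</td></tr>
</table>
"""

parser = EpsCellParser()
parser.feed(html)
print([v.strip() for v in parser.values])  # ['60.00', '121.11']
```

Splitting headers on whitespace matches gr-eps as a whole token, which is slightly stricter than the answer's re.compile(r'gr-eps'), since that regex would also match an id such as gr-eps2.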

This is a good read too.

Python: find a string between two substrings at the first occurrence
