Question

I am using line.rfind to parse a specific line of HTML code. For example, this is the HTML line I am parsing:

<strong class="temp">79<span>&deg;</span></strong><span class="low"><span>Lo</span> 56<span>&deg;</span></span>

This is the code I use to split the line and (in this case) pull out '79'.

position0 = line.rfind('{}'.format(date1.strftime("%a")))
if position0 > 0:
    self.high0 = lines[line_number + 4].split('<span>')[0].split('">')[-1]
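To make the split chain easier to follow, here is a small sketch that applies it step by step to the sample line above. The variable names `line`, `before_first_span`, and `high` are only illustrative; `line` stands in for `lines[line_number + 4]`.

```python
# Sample line from the question; stands in for lines[line_number + 4].
line = ('<strong class="temp">79<span>&deg;</span></strong>'
        '<span class="low"><span>Lo</span> 56<span>&deg;</span></span>')

# Everything before the first '<span>' tag.
before_first_span = line.split('<span>')[0]

# Whatever follows the last '">' in that prefix, i.e. the text after the
# opening <strong class="temp"> tag.
high = before_first_span.split('">')[-1]

print(high)
```

Note that `high` is still a string at this point, so it has to be converted with `int()` before any numeric comparison.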

Now I need to pull that number only if it is >= 94 and <= 37. If it does not meet that criterion, I don't want anything to happen. Any ideas? Thanks in advance!

Answer 1

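I think I would use a regular expression to get the high temperature, or perhaps BeautifulSoup if I were parsing a lengthy HTML document. The following should get all the high temperatures from a string that repeats the pattern listed in the OP.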

import re

s = '<strong class="temp">79<span>&deg;</span></strong><span class="low"><span>Lo</span> 56<span>&deg;</span></span>'
# Digits that sit directly between a closing '>' and '<span>&deg'
p = re.compile(r'>(?P<high>\d+)<span>&deg')
matches = p.finditer(s)
for match in matches:
    print(match.group('high'))
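The regex only extracts the number; the question also asks to keep it only when it meets a condition. A minimal sketch of combining the two follows. The helper name `highs_matching` and the predicate argument are hypothetical, and reading the question's condition as "at least 94 or at most 37" is an assumption, since the thresholds as written cannot both hold at once.

```python
import re

# Same idea as the pattern above: digits directly followed by '<span>&deg;'.
HIGH_RE = re.compile(r'>(?P<high>\d+)<span>&deg;')

def highs_matching(html, predicate):
    """Return the high temperatures in html (as ints) that satisfy predicate."""
    temps = (int(m.group('high')) for m in HIGH_RE.finditer(html))
    return [t for t in temps if predicate(t)]

s = ('<strong class="temp">79<span>&deg;</span></strong>'
     '<span class="low"><span>Lo</span> 56<span>&deg;</span></span>')

# Assumed reading of the question's criterion: extreme highs only.
extreme = highs_matching(s, lambda t: t >= 94 or t <= 37)
print(extreme)
```

If nothing satisfies the predicate, the list is simply empty, so the caller can skip any further work without special-casing.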

Answer 2

I was able to accomplish this with the following:

if int(c.high0) >= 34:
    plt.text(x, y, int(c.high0), fontsize=7, fontweight='bold')
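One caveat with calling `int()` directly: it raises `ValueError` if the parsed text is not a number, which could happen if the page layout changes and the split returns something unexpected. A hedged sketch of a guarded version is below; `parse_high` is a hypothetical helper, and the threshold of 34 is taken from the snippet above.

```python
def parse_high(raw):
    """Convert the scraped text to an int, or return None if it is not numeric."""
    try:
        return int(raw)
    except (TypeError, ValueError):
        return None

# Illustrative inputs: a normal value, an empty string, a placeholder, a low value.
for raw in ['79', '', 'N/A', '30']:
    value = parse_high(raw)
    if value is not None and value >= 34:
        print(value)  # only values that parsed and met the threshold get here
```

With this shape, a value that fails to parse or falls below the threshold just results in nothing happening, which matches what the question asks for.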

Parsing an HTML line and pulling only numbers that fit certain criteria
