Question

I have an HTML file like this:

<html>
<head>
<title>Threshold Limit Exceeded</title>
</head>
<body>
<h1> Thereshold Limit Exceeded</h1>
Below is Exceeded Count<br/><br/>

<pre>        <td id="a95" bgcolor=#FDFAF9>Service-Count-New</td>^M
    ^M
        <td id="b95" align="center"  bgcolor=#FDFAF9>3023</td>^M
</pre>

<br/>mail me at <a    href='mailto:mail@abc.com'>mail@abc.com</a>.<br>
</body>
</html>

I wrote the code below to get the count mentioned in the HTML:

f = open('q.txt', "r")
for line in f:
    if "Service-Count-New" in line:
      line1 = line
      line2 = f.next()
      line3 = f.next()
      f.close
      a = line3
      b = 500
      if b < a:
        print a
    import htmlbodymailerrormsg

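When I run the code above, the if condition does not work: the value from the HTML (3023 in this example) is printed every time, even when it is lower than 500. If I try it in ipython it works fine, but it does not work in the script.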

Answer 1

Because a is the whole line, a string rather than a number, i.e.:

a = '<td id="b95" align="center"  bgcolor=#FDFAF9>3023</td>'

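First you have to extract 3023 from this string (possibly with a regexp). Then, once you have the string 3023, you have to convert it to an integer before the if statement.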
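For context on why the original test always passes: in Python 2 an int compares as smaller than any str, so b < a is True whatever the line contains (Python 3 raises TypeError for that comparison instead). The two steps above could be sketched as follows. This is a minimal sketch, assuming the count is the only text inside the td cell; the names extract_count and THRESHOLD are illustrative and do not come from the question.

```python
import re

THRESHOLD = 500  # same limit as b in the question (illustrative name)

def extract_count(line):
    """Return the integer inside a <td>...</td> cell, or None if there is none."""
    # Capture the digits between the end of the opening tag and </td>.
    match = re.search(r'>\s*(\d+)\s*</td>', line)
    return int(match.group(1)) if match else None

# line3 from the question, including the trailing carriage return
a = extract_count('<td id="b95" align="center"  bgcolor=#FDFAF9>3023</td>\r\n')
if a is not None and THRESHOLD < a:
    print(a)
```

Because a is now an int, the comparison with the threshold is numeric, and a line without a number is skipped rather than printed.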

A tip for improvement: if you want to parse HTML, look at BeautifulSoup instead. With it, you simply select the element with id=b95 and then get its contents.

Answer 2

BeautifulSoup will help a lot with your problem.

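from BeautifulSoup import BeautifulSoup

htmlFile = open('q.txt', 'r')  # the file read in the question
htmlData = htmlFile.read()
htmlFile.close()
parsed_html = BeautifulSoup(htmlData)
# Select the <td> whose id is "b95" and print its text (the count)
print parsed_html.body.find('td', attrs={'id': 'b95'}).text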

If you have a large amount of HTML data, open the file using a with statement.
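A rough sketch of the with form, combined with the threshold check from the question. To stay within the standard library it swaps in a regular expression for BeautifulSoup, so it assumes the cell is written exactly as id="b95" with the count as its only text; the function name count_exceeds and its threshold parameter are hypothetical.

```python
import re

def count_exceeds(path, threshold=500):
    """Return the count in the b95 cell if it is above threshold, else None."""
    # The with block closes the file automatically, even if reading fails.
    with open(path) as html_file:
        html_data = html_file.read()
    # Assumption: the cell looks like <td id="b95" ...>3023</td>.
    match = re.search(r'id="b95"[^>]*>\s*(\d+)\s*<', html_data)
    if match is None:
        return None
    count = int(match.group(1))  # convert before comparing
    return count if count > threshold else None
```

Calling count_exceeds('q.txt') on the file from the question should return the count only when it is above 500.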
