Question

Suppose I have a file containing the following data:

<td class="w"><a href="show.cgi?id=120012" title="[Title] &#64;Blue: Session_TIMEOUT after 60033 ms">[Title] &#64;Blue: Session_TIMEOUT after 60033 ms</a></td>'
<td class="w"><a href="show.cgi?id=120012" title="[Title] &#64;Blue: Session_TIMEOUT after 60500 ms">[Title] &#64;Blue: Session_TIMEOUT after 60033 ms</a></td>'

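For each of the two lines above, how can I retrieve the string that follows "[Title] " inside the title="..." attribute (for example "&#64;Blue: Session_TIMEOUT after 60033 ms") and write it on the line after the HTML tag?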

I want output like this:

<td class="w"><a href="show.cgi?id=120012" title="[Title] &#64;Blue: Session_TIMEOUT after 60033 ms">[Title] &#64;Blue: Session_TIMEOUT after 60033 ms</a></td>'
&#64;Blue: Session_TIMEOUT after 60033 ms
<td class="w"><a href="show.cgi?id=120012" title="[Title] &#64;Blue: Session_TIMEOUT after 60500 ms">[Title] &#64;Blue: Session_TIMEOUT after 60033 ms</a></td>'
&#64;Blue: Session_TIMEOUT after 60500 ms

Please help me. Thanks in advance.

Answer 1

You can use regular expressions. If you know that the string you are interested in always sits between title=" and the closing ms, you can do this:

import re  # regular expressions module

g = re.compile(r'title="(.*?ms)').search(line)  # search for your string

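Your string is then available as g.group(1). You may find it useful to read about regular expressions in the Python documentation; they are a very important programming tool in every language, especially for scripting.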
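To tie this back to the desired output, here is a minimal sketch that applies a regex to every line and prints the line followed by the extracted text. The names TITLE_RE and annotate are hypothetical, and the pattern assumes every line of interest has exactly the title="[Title] ... ms" shape; unlike the one-liner above, it also skips the "[Title] " prefix so the result matches the requested output.

```python
import re

# Hypothetical pattern: capture the text after "[Title] " up to and including "ms"
# inside title="...". Assumes every line of interest has exactly this shape.
TITLE_RE = re.compile(r'title="\[Title\] (.*?ms)"')


def annotate(lines):
    """Yield each input line, then the extracted title text if the line has one."""
    for line in lines:
        line = line.rstrip("\n")  # lines read from a file keep their newline; drop it
        yield line
        match = TITLE_RE.search(line)
        if match:
            yield match.group(1)  # e.g. "&#64;Blue: Session_TIMEOUT after 60033 ms"


sample = [
    '<td class="w"><a href="show.cgi?id=120012" title="[Title] &#64;Blue: Session_TIMEOUT after 60033 ms">[Title] &#64;Blue: Session_TIMEOUT after 60033 ms</a></td>',
    '<td class="w"><a href="show.cgi?id=120012" title="[Title] &#64;Blue: Session_TIMEOUT after 60500 ms">[Title] &#64;Blue: Session_TIMEOUT after 60033 ms</a></td>',
]

for out in annotate(sample):
    print(out)
```

To process a real file, pass the open file object instead of the list, since iterating a file yields its lines: with open(path) as f, loop over annotate(f) and print each item.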

You might also want to add the regex tag to your question.

Answer 2

With the Beautiful Soup library you can do this very easily:

from bs4 import BeautifulSoup

myHTML = '<td class="w"><a href="show.cgi?id=120012" title="[Title] &#64;Blue: Session_TIMEOUT after 60033 ms">[Title] &#64;Blue: Session_TIMEOUT after 60033 ms</a></td>'
html_doc = BeautifulSoup(myHTML, "html.parser")
print(html_doc.td.a.string)  # the link text; the entity &#64; is decoded to "@"

You can install Beautiful Soup with pip, or with apt-get on a Debian-based OS:

pip install beautifulsoup4
apt-get install python3-bs4
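One caveat: .string returns the link text, and in the second sample line the link text says 60033 while the title attribute says 60500. The desired output uses the title, so reading the attribute is closer to what was asked (in Beautiful Soup that is html_doc.td.a['title']). As a sketch that needs no third-party install, the version below swaps in the standard library's html.parser.HTMLParser; TitleCollector and title_text are hypothetical names, and the "[Title] " prefix is an assumption taken from the sample data.

```python
from html.parser import HTMLParser


class TitleCollector(HTMLParser):
    """Collect the title attribute of every <a> tag fed to the parser."""

    def __init__(self):
        super().__init__()
        self.titles = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; HTMLParser already decodes
        # character references such as &#64; into "@" in attribute values.
        if tag == "a":
            title = dict(attrs).get("title")
            if title is not None:
                self.titles.append(title)


def title_text(line, prefix="[Title] "):
    """Return the first <a> title in line with prefix removed, or None if absent."""
    parser = TitleCollector()
    parser.feed(line)
    parser.close()
    if not parser.titles:
        return None
    title = parser.titles[0]
    return title[len(prefix):] if title.startswith(prefix) else title


line = '<td class="w"><a href="show.cgi?id=120012" title="[Title] &#64;Blue: Session_TIMEOUT after 60500 ms">[Title] &#64;Blue: Session_TIMEOUT after 60033 ms</a></td>'
print(line)
print(title_text(line))
```

Because the parser decodes entities, the extracted text reads "@Blue: ..." rather than "&#64;Blue: ..."; if you need the raw entity form, the regex or plain string approaches keep it untouched.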

Answer 3

A simple way:

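line = line.rstrip('\n')  # drop the newline kept by file iteration
start = line.index('[Title]') + len('[Title]')  # skip past '[Title]' inside title="..."
text = line[start:line.index('"', start)].strip()  # stop at the quote closing the title attribute
print(line + '\n' + text)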

A better way, though, is to use regular expressions, as CodeChordsman mentioned.
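If you prefer to stay with plain string methods, note that str.index raises ValueError on any line that lacks the marker. A minimal sketch using str.partition instead, which never raises, might look like this; text_after and its default markers are hypothetical and assume the title="[Title] ..." layout from the question.

```python
def text_after(line, start_marker='title="[Title] ', end_marker='"'):
    """Return the text between start_marker and the next end_marker, or None.

    str.partition returns an empty separator when the marker is missing,
    so unmatched lines yield None instead of raising ValueError.
    """
    _, found, rest = line.partition(start_marker)
    if not found:
        return None
    text, found_end, _ = rest.partition(end_marker)
    return text if found_end else None


lines = [
    '<td class="w"><a href="show.cgi?id=120012" title="[Title] &#64;Blue: Session_TIMEOUT after 60033 ms">[Title] &#64;Blue: Session_TIMEOUT after 60033 ms</a></td>',
    '<td class="w"><a href="show.cgi?id=120012" title="[Title] &#64;Blue: Session_TIMEOUT after 60500 ms">[Title] &#64;Blue: Session_TIMEOUT after 60033 ms</a></td>',
]

for line in lines:
    print(line)
    text = text_after(line)
    if text is not None:
        print(text)
```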

Print all words after a specific word using Python
