Question

How can I extract the content (how are you) from the following string:

<string xmlns="http://schemas.microsoft.com/2003/10/Serialization/">how are you</string>.

Can I use a regular expression for this? If so, which regex would be suitable?

Note: I don't want to use the split function to extract the result. Links for a beginner learning regular expressions would also be welcome.

I'm using Python 2.7.2.

Answer 1

You can use a regular expression (as Joey demonstrates).

However, if your XML document is any larger than this single line, you can't, because XML is not a regular language.
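A small illustration of the problem, using a made-up nested `<a>` document rather than the question's input: a lazy regex pairs the first opening tag it finds with the first closing tag after it, so it cannot tell which `</a>` belongs to which `<a>`.

```python
import re

# A made-up document where <a> elements are nested inside each other.
nested = '<a>outer <a>inner</a> tail</a>'

# The lazy group stops at the FIRST closing tag, so the outer <a>
# is wrongly paired with the inner element's </a>.
matches = re.findall(r'<a>(.*?)</a>', nested)
print(matches)  # ['outer <a>inner']
```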

Use BeautifulSoup (or another XML parser) instead:

>>> from BeautifulSoup import BeautifulSoup
>>> xml_as_str = '<string xmlns="http://schemas.microsoft.com/2003/10/Serialization/">how are you</string>. '
>>> soup = BeautifulSoup(xml_as_str)
>>> print soup.text
how are you.

Or...

>>> for string_tag in soup.findAll('string'):
...     print string_tag.text
... 
how are you
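If installing BeautifulSoup is not an option, the standard library's `xml.etree.ElementTree` is one such other parser. This is a sketch under two assumptions: the trailing `.` in the question is sentence punctuation rather than part of the XML (ElementTree rejects non-whitespace text after the root element), and `extract_strings` is a hypothetical helper name. Because the `<string>` element declares a default namespace, ElementTree names it `{uri}string`, so the search has to use that form.

```python
import xml.etree.ElementTree as ET

# Namespace URI copied from the question's input.
NS = "http://schemas.microsoft.com/2003/10/Serialization/"


def extract_strings(xml_text):
    """Return the text of every <string> element in xml_text (hypothetical helper)."""
    root = ET.fromstring(xml_text)
    # Namespaced tags are stored in '{uri}localname' form.
    tag = "{%s}string" % NS
    # iter() visits the root element itself as well as all descendants.
    return [elem.text for elem in root.iter(tag)]


xml_as_str = '<string xmlns="%s">how are you</string>' % NS
print(extract_strings(xml_as_str))  # ['how are you']
```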

Answer 2

(?<=<string xmlns="http://schemas.microsoft.com/2003/10/Serialization/">)[^<]+(?=</string>)

would match what you want, as a trivial example.
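In Python the pattern needs no delimiters and can be passed straight to `re`. A minimal sketch, assuming the input is exactly the question's string; the dots in the URL are escaped here so they only match literal dots. Python's `re` accepts this lookbehind because it has a fixed width.

```python
import re

s = '<string xmlns="http://schemas.microsoft.com/2003/10/Serialization/">how are you</string>.'

# Lookbehind: the exact opening tag must precede the match.
# Lookahead: the closing tag must follow it. Neither ends up in the result.
pattern = re.compile(
    r'(?<=<string xmlns="http://schemas\.microsoft\.com/2003/10/Serialization/">)'
    r'[^<]+'
    r'(?=</string>)'
)

m = pattern.search(s)
if m:
    print(m.group(0))  # how are you
```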

(?<=>)[^<]+

So would this. It all depends on how exactly your input is formatted.
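To see why the input format matters, here is a looser variant that anchors only on a preceding `>` character and takes everything up to the next `<`. On the question's string, `re.search` returns the wanted text, but `re.findall` also picks up the `.` that follows `</string>`.

```python
import re

s = '<string xmlns="http://schemas.microsoft.com/2003/10/Serialization/">how are you</string>.'

# Anchor only on a preceding '>' and take everything up to the next '<'.
loose = re.compile(r'(?<=>)[^<]+')

print(loose.search(s).group(0))  # how are you
print(loose.findall(s))          # ['how are you', '.']
```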

Answer 3

Try the following regex:

/<[^>]*>(.*?)</
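The `/.../` around the pattern is Perl/JavaScript-style delimiter syntax; in Python only the body is used, and the text sits in capture group 1. A minimal sketch on the question's string:

```python
import re

s = '<string xmlns="http://schemas.microsoft.com/2003/10/Serialization/">how are you</string>.'

# Skip the first tag, then lazily capture everything up to the next '<'.
m = re.search(r'<[^>]*>(.*?)<', s)
if m:
    print(m.group(1))  # how are you
```

Note that `.` does not match newlines by default, so content spanning several lines would need the `re.DOTALL` flag.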

Answer 4

This will match a generic HTML tag (replace "string" with the tag you want to match):

/<string[^<]*>(.*?)<\/string>/i

(i = case-insensitive)
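A Python version of the same pattern, wrapped in a hypothetical helper `extract_tag_text` so the tag name is a parameter. Python needs neither the `/.../` delimiters nor the backslash before `/`, and the trailing `i` becomes the `re.IGNORECASE` flag. As with the previous pattern, `(.*?)` does not cross newlines unless `re.DOTALL` is added.

```python
import re


def extract_tag_text(text, tag):
    """Return the contents of every <tag ...>...</tag> pair (hypothetical helper)."""
    name = re.escape(tag)
    # Same shape as the answer's pattern, with the tag name substituted in.
    pattern = r'<%s[^<]*>(.*?)</%s>' % (name, name)
    # re.IGNORECASE plays the role of the trailing /i flag.
    return re.findall(pattern, text, re.IGNORECASE)


s = '<string xmlns="http://schemas.microsoft.com/2003/10/Serialization/">how are you</string>.'
print(extract_tag_text(s, 'string'))        # ['how are you']
print(extract_tag_text('<P>Hi</p>', 'p'))   # ['Hi']
```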
