Question

text='<tag1>one</tag1>this should be displayed<tag2>two</tag2>this too<tag3>three</tag3>and this<tag4>four</tag4>'

Given this string in Python, I want to print:

this should be displayed
this too
and this

not

one,two,three,four

I tried the following code:

import re

text='<>one</>this should be displayed<>two</>this too<>three</>and this<>four</>'
m=re.findall('>(.+?)<',text)

print(m)

But I get all of the strings:

['one', 'this should be displayed', 'two', 'this too', 'three', 'and this', 'four']
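The pattern `>(.+?)<` returns every string because it matches text after *any* `>`, whether that `>` ends an opening tag such as `<tag1>` or a closing tag such as `</tag1>`. A minimal sketch of the distinction, assuming tags never contain a literal `>` (for example inside attribute values): capture the tag in front of each run of text, then keep only the runs that follow a closing tag.

```python
import re

text = '<tag1>one</tag1>this should be displayed<tag2>two</tag2>this too<tag3>three</tag3>and this<tag4>four</tag4>'

# Pair each run of text with the tag directly in front of it,
# e.g. ('<tag1>', 'one') and ('</tag1>', 'this should be displayed').
pairs = re.findall(r'(<[^>]*>)([^<]+)', text)

# Text after an opening tag is that tag's content; text after a
# closing tag (one starting with '</') lies outside every tag.
outside = [chunk for tag, chunk in pairs if tag.startswith('</')]

for line in outside:
    print(line)
```

The same filter also works on the earlier `<>one</>...` form of the string, because `</>` starts with `</` as well.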

Answer 1

You almost have it, you just need a /. Note that you only want the words between /> and <:

Change this:

m=re.findall('>(.+?)<',text)

To this:

m=re.findall('/>(.+?)<',text)

Putting it together:

import re

text='<>one</>this should be displayed<>two</>this too<>three</>and this<>four</>'
print(re.findall('/>(.+?)<',text))

Output:

['this should be displayed', 'this too', 'and this']

Edit:

Using BeautifulSoup:

from bs4 import BeautifulSoup, NavigableString

text='<tag1>one</tag1>this should be displayed<tag2>two</tag2>this too<tag3>three</tag3>and this<tag4>four</tag4>'
soup = BeautifulSoup(text, 'html.parser')
for elem in soup:
    # Keep only top-level text nodes and skip Tag objects such as <tag1>.
    # type() rather than isinstance() also skips Comment, a NavigableString subclass.
    if type(elem) is NavigableString:
        print(elem)

Output:

this should be displayed
this too
and this
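If BeautifulSoup isn't installed, the same tree-walking idea can be sketched with the standard library's `xml.etree.ElementTree`. This assumes the string is well-formed XML once it is wrapped in a single root element (the `<root>` wrapper is a hypothetical name added only for parsing). In ElementTree, the text that follows an element's closing tag, up to the next tag, is stored in that element's `tail` attribute, which is exactly the text outside the tags.

```python
import xml.etree.ElementTree as ET

text = '<tag1>one</tag1>this should be displayed<tag2>two</tag2>this too<tag3>three</tag3>and this<tag4>four</tag4>'

# XML requires a single root element, so wrap the fragment in a
# hypothetical <root> tag before parsing.
root = ET.fromstring('<root>' + text + '</root>')

# child.text is the text inside each tag ('one', 'two', ...);
# child.tail is the text after its closing tag (None when there is none).
outside = [child.tail for child in root if child.tail]

for line in outside:
    print(line)
```

Unlike the regex versions, this only accepts well-formed XML, so the earlier `<>one</>` form of the string would be rejected by the parser.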

Answer 2

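You need to add the forward slash to the first part of the match. I would also use ([^<]+?), though unless the input is malformed I think that's just semantics.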

m=re.findall(r'\/>([^<]+?)<',text)
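A quick way to see where (.+?) and ([^<]+?) can differ, using the original `<>...</>` tag style: when a closing tag is immediately followed by another tag, `.+?` is allowed to run across that tag, while `[^<]+?` cannot. The string `sample` below is made up for illustration. Raw strings (`r'...'`) pass the pattern to `re` unchanged; `/` has no special meaning in a Python regular expression, so it does not strictly need a backslash.

```python
import re

# Hypothetical input: '<>one</>' is directly followed by '<>two</>',
# so there is no text between those two elements.
sample = '<>one</><>two</>text<>x</>'

# '.+?' may consume '<', so it captures part of the next tag.
loose = re.findall(r'/>(.+?)<', sample)

# '[^<]+?' cannot cross a '<', so it only returns real text runs.
strict = re.findall(r'/>([^<]+?)<', sample)

print(loose)   # ['<>two', 'text']
print(strict)  # ['text']
```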

You just changed the question, so here is a new answer that finds the text outside the tags:

m=re.findall('</.+?>([^<]+?)<.+?>',text)
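A different regex route, sketched as an alternative rather than as the answerer's method: instead of matching the text between tags, delete each complete `<name>...</name>` element using a backreference (`\1` must repeat the name captured by `(\w+)`) and keep what is left. This assumes tag names consist of word characters and that elements are not nested.

```python
import re

text = '<tag1>one</tag1>this should be displayed<tag2>two</tag2>this too<tag3>three</tag3>and this<tag4>four</tag4>'

# Replace every <name>...</name> element, contents included, with a newline.
# (\w+) captures the tag name and </\1> requires the matching closing tag.
remaining = re.sub(r'<(\w+)>.*?</\1>', '\n', text)

# Strip the newlines left by the first and last elements.
result = remaining.strip()
print(result)
```

Adjacent elements with no text between them would leave blank lines, which can be filtered out after splitting on '\n'.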

Get the string between different tags in XML
