My text is as follows:
"<ENAMEX TYPE="PERSON">Edward R. Kimmel</ENAMEX>, one of Admiral <ENAMEX TYPE="PERSON">Jack</ENAMEX>'s two surviving sons and..."
I want output like this:
PERSON Edward R. Kimmel
PERSON Jack
Any ideas on how to do this with regex?
Many thanks.
Answer 0 (score: 2)
Have you tried BeautifulSoup?
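from bs4 import BeautifulSoup
txt = """<ENAMEX TYPE="PERSON">Edward R. Kimmel</ENAMEX>, one of Admiral <ENAMEX TYPE="PERSON">Jack</ENAMEX>'s two surviving sons and..."""
soup = BeautifulSoup(txt, "html.parser")
# html.parser lowercases attribute names, so TYPE is matched as 'type'
for tag in soup.find_all(attrs={"type": "PERSON"}):
    # Print the entity type followed by the tagged text, e.g. "PERSON Jack"
    print(tag["type"], tag.text)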
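If installing BeautifulSoup isn't an option, here is a minimal stdlib-only sketch using `xml.etree.ElementTree`. It assumes the annotated text is well-formed XML apart from lacking a single root element (for example, no bare `&` or `<` in the running text); `extract_entities` is a hypothetical helper name. Unlike the answer above, it returns every ENAMEX tag regardless of TYPE.

```python
import xml.etree.ElementTree as ET

txt = """<ENAMEX TYPE="PERSON">Edward R. Kimmel</ENAMEX>, one of Admiral <ENAMEX TYPE="PERSON">Jack</ENAMEX>'s two surviving sons and..."""

def extract_entities(text):
    """Hypothetical helper: return (TYPE, text) pairs for every ENAMEX element.

    Assumes `text` is well-formed XML except for the missing root element.
    """
    # Wrap the fragment in a dummy root so it parses as a single document.
    root = ET.fromstring(f"<root>{text}</root>")
    # itertext() joins all text inside the element (but not its tail).
    return [(el.get("TYPE"), "".join(el.itertext())) for el in root.iter("ENAMEX")]

for ent_type, name in extract_entities(txt):
    print(ent_type, name)
```

Case is preserved here (XML parsing does not lowercase attribute names), so the attribute is looked up as `TYPE`.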
Answer 1 (score: 0)
Just use re.findall:
import re
x = '<ENAMEX TYPE="PERSON">Edward R. Kimmel</ENAMEX>, one of Admiral <ENAMEX TYPE="PERSON">Jack</ENAMEX>'
# Non-greedy (.+?) stops at the first '<' after the opening tag
mac = re.findall(r'TYPE="PERSON">(.+?)<', x)
for i in mac:
    print("PERSON " + i)
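To report the TYPE value as well instead of hard-coding PERSON, a sketch that captures both the attribute and the tag body in separate groups. It assumes every tag has exactly the form `<ENAMEX TYPE="...">`, with TYPE as the only attribute and no nested ENAMEX tags; `ENAMEX_RE` and `extract_entities` are hypothetical names.

```python
import re

x = '<ENAMEX TYPE="PERSON">Edward R. Kimmel</ENAMEX>, one of Admiral <ENAMEX TYPE="PERSON">Jack</ENAMEX>'

# Group 1: the TYPE value; group 2: the tagged text.
# The non-greedy (.*?) stops at the first closing tag, so nesting is not supported.
ENAMEX_RE = re.compile(r'<ENAMEX\s+TYPE="([^"]+)">(.*?)</ENAMEX>')

def extract_entities(text):
    """Hypothetical helper: return a list of (TYPE, text) tuples."""
    return ENAMEX_RE.findall(text)

for ent_type, name in extract_entities(x):
    print(ent_type, name)
```

Because the pattern has two groups, `re.findall` returns a list of `(type, text)` tuples rather than plain strings, so other entity types such as LOCATION or ORGANIZATION come out with their own labels.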