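I have a text file in which important phrases are marked with special tags. Specifically, each phrase starts with <highlight> and ends with <\highlight>.
For example:
"<highlight>machine learning<\highlight> is gaining more popularity, so do <highlight>block chain<\highlight>."
In this sentence, the important phrases are delimited by <highlight> and <\highlight>.
I need to remove <highlight> and <\highlight>, and replace the spaces between the words they enclose with underscores. That is, "<highlight>machine learning<\highlight>" should become "machine_learning". The whole sentence after processing should be "machine_learning is gaining more popularity, so do block_chain."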
Answer 0 (score: 1)
Try this:
>>> import re
>>> text = "<highlight>machine learning<\\highlight> is gaining more popularity, so do <highlight>block chain<\\highlight>."
>>> re.sub(r"<highlight>(.*?)<\\highlight>", lambda x: x.group(1).replace(" ", "_"), text)
'machine_learning is gaining more popularity, so do block_chain.'
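If the substitution is needed in more than one place, it could be wrapped in a small function. The following is only a sketch: the names HIGHLIGHT and join_highlights are made up for illustration, and two behaviours go beyond what the question asks for and are assumptions. It treats any run of whitespace inside a tag pair as a single separator, and it uses re.DOTALL so that a highlighted phrase may span a line break.

```python
import re

# Hypothetical names, not from the original answer.
# The doubled backslash in the raw string matches the literal "\" in the closing tag.
# re.DOTALL (an assumption) lets a highlighted phrase span several lines.
HIGHLIGHT = re.compile(r"<highlight>(.*?)<\\highlight>", re.DOTALL)


def join_highlights(text):
    """Drop the highlight tags and join the enclosed words with underscores."""
    def _join(match):
        # split() with no arguments splits on any run of whitespace
        # and ignores leading and trailing whitespace.
        return "_".join(match.group(1).split())

    return HIGHLIGHT.sub(_join, text)


sample = ("<highlight>machine learning<\\highlight> is gaining more popularity, "
          "so do <highlight>block chain<\\highlight>.")
print(join_highlights(sample))
```

Text without any tags passes through unchanged, because the pattern simply finds nothing to substitute.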
Answer 1 (score: -1)
Here you go:
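import re

txt = "<highlight>machine learning<\\highlight> is gaining more popularity, so do <highlight>block chain<\\highlight>."
# Raw string: the doubled backslash matches the literal "\" in the closing tag.
words = re.findall(r'<highlight>(.*?)<\\highlight>', txt)
for w in words:
    # Replace the tagged phrase as a whole (tags included), so that untagged
    # occurrences of the same words elsewhere in the text are left alone.
    txt = txt.replace('<highlight>' + w + '<\\highlight>', w.replace(' ', '_'))
print(txt)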