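I am working with a file tagged by Stanford NER, and I want to replace every "O" tag with "NONE". I have tried the code below, but it produces the wrong output: it replaces every "O" in the string, not just the tag. I am not familiar with regular expressions and don't know what the right pattern for my problem is. TIA.
Here is my code: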
import re
tagged_text = st.tag(per_word(input_file))
string_type = "\n".join(" ".join(line) for line in tagged_text)
for line in string_type:
    output_file.write(re.sub('O$', 'NONE', line))
Sample input:
Tropical O
Storm O
Jolina O
affects O
2,000 O
people O
MANILA LOCATION
, O
Philippines LOCATION
– O
Initial O
reports O
from O
the O
Output:
Tropical NONE
Storm NONE
Jolina NONE
affects NONE
2,000 NONE
people NONE
MANILA LNONECATINONEN
, NONE
Philippines LNONECATINONEN
– NONE
Initial NONE
reports NONE
from NONE
the NONE
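Answer 0 (score: 1)
You don't need to loop over string_type. It is a single string, so the for loop visits it one character at a time, and every individual "O" character matches 'O$' on its own. Using re.sub directly on the whole string should work: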
s = """Tropical O
Storm O
Jolina O
affects O
2,000 O
people O
MANILA LOCATION
, O
Philippines LOCATION
– O
Initial O
reports O
from O
the O"""
import re
print(re.sub(r"\bO(?=\n|$)", "NONE", s))
This gives:
Tropical NONE
Storm NONE
Jolina NONE
affects NONE
2,000 NONE
people NONE
MANILA LOCATION
, NONE
Philippines LOCATION
– NONE
Initial NONE
reports NONE
from NONE
the NONE
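Here \bO(?=\n|$) matches a standalone letter O (the \b word boundary keeps it from matching the O that ends a longer word) that is followed by a newline character \n or by the end of the string $.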
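For comparison, here is a minimal sketch that reproduces the original problem and shows two other ways to get the same result. The three-line `tagged` sample and the variable names are made up for illustration; they are not part of the asker's code. It assumes the tag is always the last token on its line.

```python
import re

# Hypothetical, shortened sample in the same "token TAG" layout.
tagged = "MANILA LOCATION\n, O\nthe O"

# Reproduces the bug: iterating over a str yields single characters,
# so each "O" is a one-character string that matches 'O$' by itself.
per_char = "".join(re.sub(r"O$", "NONE", ch) for ch in tagged)

# Option 1: iterate over real lines instead of characters.
per_line = "\n".join(
    re.sub(r"\bO$", "NONE", line) for line in tagged.splitlines()
)

# Option 2: a single call with re.MULTILINE, so that $ matches at the
# end of every line rather than only at the end of the string.
multiline = re.sub(r"\bO$", "NONE", tagged, flags=re.MULTILINE)

print(per_char)
print(per_line)
print(multiline)
```

Both options leave LOCATION untouched, because \b requires the O to start a word and $ requires it to end the line. Which one to use is mostly a matter of taste: the MULTILINE version keeps everything in one call, while the per-line version is easier to extend if each line needs further processing.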