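Using regular expressions, I want to annotate the sentiment-lexicon words in the given sentences with their polarity labels, so I wrote the following code.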
import re
vocab = ['good/POSI','bad/NEAG','strong/POSI','dirty/NEGA', 'never/SWIT']
sent = ["It is really good", "strong man never gets his body dirty"]
for token in vocab:
    word = re.sub(r'(\\w+)\\/[A-Z]+_[A-Z]+','\\1', token)
    TA = re.sub(str(word),str(token), str(sent))
    print(TA)
I am trying to get a result like this:
["It is really good/POSI", "strong/POSI man never/SWIT gets his body dirty/NEGA"]
Unfortunately, it does not work, and I cannot tell which lines are wrong. Is there a better way to do this annotation?
Answer 0 (score: 1)
I suggest converting the vocab list into a dictionary that maps each bare word to its labelled form:
>>> vocab = {v[:v.find('/')]: v for v in vocab}
>>> vocab
{'dirty': 'dirty/NEGA', 'good': 'good/POSI', 'never': 'never/SWIT', 'bad': 'bad/NEAG', 'strong': 'strong/POSI'}
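As a side note, here is a minimal sketch of the same conversion using str.partition. It is an illustration, not part of the original answer: the names build_lexicon and vocab_list are hypothetical. The motivation is that v.find('/') returns -1 for an entry without a slash, so v[:v.find('/')] would silently drop that entry's last character, whereas partition lets such entries be skipped explicitly.

```python
def build_lexicon(entries):
    """Map each bare word to its full 'word/LABEL' entry."""
    lexicon = {}
    for entry in entries:
        # partition splits on the first '/'; sep is '' when no slash is present
        word, sep, _label = entry.partition('/')
        if sep:  # skip entries that carry no '/LABEL' part
            lexicon[word] = entry
    return lexicon


vocab_list = ['good/POSI', 'bad/NEAG', 'strong/POSI', 'dirty/NEGA', 'never/SWIT']
lexicon = build_lexicon(vocab_list)
```

For the five entries in the question this should produce the same mapping as the dictionary comprehension above.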
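With the dictionary in place, you can replace every \w+ match with its dictionary value, falling back to the word itself when it is not in the dictionary: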
result = []
for line in sent:
    line = re.sub(r'(\w+)', lambda w: vocab.get(w.group(), w.group()), line)
    result.append(line)
print(result)
This outputs what you want:
['It is really good/POSI', 'strong/POSI man never/SWIT gets his body dirty/NEGA']
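As for why the code in the question does not work: its pattern requires an underscore between two uppercase runs ([A-Z]+_[A-Z]+), which none of the labels contain, so the first re.sub leaves each token unchanged. In addition, TA is recomputed from str(sent) on every pass through the loop, so substitutions never accumulate, and str(sent) turns the whole list into a single string.

The approach above can be put together as a self-contained sketch. The names labels, tag and annotate are hypothetical, and the lowercase lookup is an added assumption (so that a capitalized word such as "Good" is still tagged); the original answer matches case-sensitively.

```python
import re

vocab = ['good/POSI', 'bad/NEAG', 'strong/POSI', 'dirty/NEGA', 'never/SWIT']
sent = ["It is really good", "strong man never gets his body dirty"]

# word -> label, e.g. 'good' -> 'POSI'
labels = dict(entry.split('/', 1) for entry in vocab)


def tag(match):
    """Append '/LABEL' to a matched word if it is in the lexicon."""
    word = match.group()
    label = labels.get(word.lower())  # assumption: lookup ignores case
    return f"{word}/{label}" if label else word


def annotate(sentences):
    """Return a new list with every lexicon word annotated."""
    return [re.sub(r'\w+', tag, line) for line in sentences]


annotated = annotate(sent)
print(annotated)
```

Because each sentence is processed separately and the replacement is decided per word, the result stays a list of strings and a word is never matched inside a longer word.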