How can I replace a string in a sentence with its POS tag, using another column?
I want to replace the word in col2 that matches col1 with its POS tag. For example:
col1    col2                        output
mtmb2   MTMB2 is a my sentence      NNP is a my sentence
mmm2    Your MmM2 is my sentence    Your NNP is my sentence
bbb2    Your sentence is bbb2       Your sentence is NN
I tried @YOLO's solution:
## import libraries
from nltk import word_tokenize, pos_tag

## tokenize and tag each sentence: col2 becomes a list of (token, tag) pairs
df['col2'] = df['col2'].apply(word_tokenize).apply(pos_tag)

## rebuild the sentence, substituting the POS tag for the first token only
def get_vals(lst):
    op = []
    for i, v in enumerate(lst):
        if i == 0:
            op.append(v[1])  # first token: keep its tag
        else:
            op.append(v[0])  # every other token: keep the word
    return ' '.join(op)

## apply the function
df['col2'] = df['col2'].apply(get_vals)
print(df)
   col1                      col2
0  aaa1     NNP is a great friend
1  abb2  NN is a very good friend
However, this solution only works when the word to replace is at the first index of the sentence...
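To make the goal concrete, here is a minimal sketch of the behavior I am after, not a tested solution. It compares every token against the col1 value instead of relying on position. The helper name `replace_with_tag` is hypothetical, and the case-insensitive match is an assumption taken from the example rows (mmm2 vs. MmM2). It also assumes the tokenizer keeps the target word as a single token.

```python
def replace_with_tag(word, sentence, tokenize=None, tag=None):
    """Replace each token equal to `word` (ignoring case) with its POS tag.

    `replace_with_tag` is a hypothetical helper name. `tokenize` and `tag`
    default to NLTK's word_tokenize and pos_tag, which need the NLTK
    tokenizer and tagger data to be installed.
    """
    if tokenize is None or tag is None:
        from nltk import word_tokenize, pos_tag
        tokenize = tokenize or word_tokenize
        tag = tag or pos_tag
    target = word.lower()
    tagged = tag(tokenize(sentence))  # list of (token, POS tag) pairs
    # Keep the tag where the token matches col1, the word everywhere else.
    return ' '.join(pos if tok.lower() == target else tok for tok, pos in tagged)


# Possible row-wise use on the DataFrame from the question (assumes pandas):
# df['output'] = df.apply(lambda r: replace_with_tag(r['col1'], r['col2']), axis=1)
```

Passing `tokenize` and `tag` as arguments is only there so the matching logic can be tried with a stand-in tagger. As with the original approach, joining tokens with spaces will change the spacing around punctuation.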