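How can I make my function replace a word at any position in a sentence, not only at position 0?
I want to replace the word in col2 that matches the string in col1 (col1 is always lowercase) with its POS tag.
For example, I want to turn: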
input: Hello Aaa1 my very good friend
output: Hello NNP my very good friend
Right now it only works when the word is first:
input: Aaa1 my very good friend
output: NNP my very good friend
I want the replacement to work at any position in the sentence.
What I tried:
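## import libraries
from nltk import word_tokenize, pos_tag, pos_tag_sents

## tag the sentence: each cell of col2 becomes a list of (word, tag) tuples
df['col2'] = df['col2'].apply(word_tokenize).apply(pos_tag)

## this function does the magic:
## the token at index 0 is replaced by its POS tag, every other token is kept as a word
def get_vals(lst):
    op = []
    for i, v in enumerate(lst):
        if i == 0:
            op.append(v[1])  # v[1] is the POS tag
        else:
            op.append(v[0])  # v[0] is the original word
    return ' '.join(op)

## apply the function
df['col2'] = df['col2'].apply(get_vals)
print(df)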
col1 col2
0 aaa1 NNP is a great friend
1 abb2 NN is a very good friend
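A minimal sketch of a position-independent version, assuming col2 already holds the list of (word, tag) tuples produced by the `apply(pos_tag)` line and col1 holds the target word. Instead of testing `i == 0`, each token is compared (case-insensitively) with the target and only matching tokens are swapped for their tag. `replace_target` is a hypothetical name, and the tags in the sample list are placeholders, not real tagger output.

```python
def replace_target(tagged, target):
    """Rebuild the sentence, swapping every token that equals `target`
    (ignoring case) for its POS tag; all other tokens stay as words."""
    target = target.lower()
    return " ".join(
        tag if word.lower() == target else word
        for word, tag in tagged
    )

# Hand-written input in the (word, tag) shape that nltk.pos_tag returns.
# "X" is a placeholder tag; only the target token's tag is used.
tagged = [("Hello", "X"), ("Aaa1", "NNP"), ("my", "X"),
          ("very", "X"), ("good", "X"), ("friend", "X")]
print(replace_target(tagged, "aaa1"))  # Hello NNP my very good friend
```

Because the function needs values from two columns, in the DataFrame it would be applied row by row, e.g. `df.apply(lambda r: replace_target(r['col2'], r['col1']), axis=1)`.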
Edit:
My data, with the result I want in the output column:
col1   col2               output
aaa1   AAA1 Hello hello   NNP Hello hello
aaa2   aaa2 hello hello   NN hello hello
aaa3   Hello AAa3 hello   Hello NNP hello
In each row I want the matching word replaced by that row's own POS tag (not always by the single string NNP).
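A sketch of the per-row version in pandas, assuming col2 has already been tagged as in the question's code. The rows mirror the table above; col2 is written out by hand in the (word, tag) shape instead of calling NLTK, and "X" is a placeholder tag for tokens whose tag is never used. `replace_target` is a hypothetical helper name. Using `apply` with `axis=1` hands each row to the function, so every row uses its own col1 value and its own tags.

```python
import pandas as pd

def replace_target(tagged, target):
    # Swap every token equal to `target` (ignoring case) for its own POS tag.
    target = target.lower()
    return " ".join(tag if word.lower() == target else word
                    for word, tag in tagged)

# col2 as it would look after .apply(word_tokenize).apply(pos_tag);
# the target tags (NNP, NN, NNP) follow the desired output in the table.
df = pd.DataFrame({
    "col1": ["aaa1", "aaa2", "aaa3"],
    "col2": [
        [("AAA1", "NNP"), ("Hello", "X"), ("hello", "X")],
        [("aaa2", "NN"), ("hello", "X"), ("hello", "X")],
        [("Hello", "X"), ("AAa3", "NNP"), ("hello", "X")],
    ],
})

# axis=1: the lambda receives one row at a time.
df["output"] = df.apply(lambda r: replace_target(r["col2"], r["col1"]), axis=1)
print(df[["col1", "output"]])
```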
Answer (score: 0)
Using re:
import re
inp = "Hello Aaa1 my very good friend"
# expected result: "Hello NNP my very good friend"
print(re.sub("Aaa1", "NNP", inp))
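The pattern "Aaa1" is case-sensitive and would also match inside a longer token. Since col1 is lowercase while col2 contains forms like "AAA1" and "AAa3", a case-insensitive whole-word pattern may fit the examples better. A minimal sketch; `replace_word` is a hypothetical name, and the replacement string is assumed to contain no backslashes (re.sub would interpret them).

```python
import re

def replace_word(sentence, word, repl):
    # \b ... \b: match the whole word only, not part of a longer token.
    # re.escape: treat any regex metacharacters in `word` literally.
    # re.IGNORECASE: let the lowercase target match "AAa3", "AAA1", etc.
    pattern = r"\b" + re.escape(word) + r"\b"
    return re.sub(pattern, repl, sentence, flags=re.IGNORECASE)

print(replace_word("Hello Aaa1 my very good friend", "aaa1", "NNP"))
print(replace_word("Hello AAa3 hello", "aaa3", "NNP"))
```

This only swaps the string; the tag itself still has to come from somewhere, such as the tagger output.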
Using pandas:
import pandas as pd
df = pd.DataFrame(data={"col": [inp]})
print(df["col"].str.replace("Aaa1", "NNP", regex=False))
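`Series.str.replace` applies one pattern and one replacement to the whole column, so it fits when the target word and the tag are the same in every row; per-row targets or tags need a row-wise `apply` instead. A sketch of the single-pattern case with word boundaries and case-insensitive matching; the third sample sentence is hypothetical, added to show that "aaa1x" is left alone.

```python
import pandas as pd

df = pd.DataFrame({"col": ["Hello Aaa1 my very good friend",
                           "AAA1 Hello hello",
                           "Hello aaa1x hello"]})

# regex=True so that \b word boundaries are honoured;
# case=False makes pandas add re.IGNORECASE to the pattern.
df["col"] = df["col"].str.replace(r"\baaa1\b", "NNP", case=False, regex=True)
print(df["col"].tolist())
```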