I have a list, and whenever one of its words appears in a text, I want to replace the next two words.
For example: list = ['Mrs.', 'Miss', 'Ms.', 'Lady', 'Mr.', 'Sir', 'Lord']
phrase = "I'm sorry, but the lady is at home."
resultat = "I'm sorry, but the lady <next_words> home."
I am trying to do this in a DataFrame.
I tried:
def words_contexte(df):
    titres_list = ['Mrs.', 'Miss', 'Ms.', 'Lady', 'Mr.', 'Sir', 'Lord']
    data_frame_split = df['C'].str.split()
    words_index = df['C'].str.data_frame_split[data_frame_split.index(titres_list) + 2]
    df['C'] = df['C'].str.replace(words_index, '<next_words>')
    return df
My DataFrame:
A        B          C
French   house      Are you at home?
English  house      I'm sorry, but the lady is at home.
French   apartment  His name is Sir Ringo Starr.
French   house      I'm Mrs. Carla and I have a dog.
English  apartment  Hi Miss how are you?
Desired output:
A        B          C
French   house      Are you at home?
English  house      I'm sorry, but the lady <next_words> home.
French   apartment  His name is Sir <next_words>.
French   house      I'm Mrs. <next_words> I have a dog.
English  apartment  Hi Miss <next_words> you?
Answer 0 (score: 1)
Here is one way to do it that avoids looping over the list repeatedly:
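import numpy as np

list_ = ['Mrs.', 'Miss', 'Ms.', 'lady', 'Mr.', 'Sir', 'Lord']

def fun(x, y):
    words = x.split(' ')
    # Boolean mask: True where a word is one of the titles in y
    in1d = np.isin(words, y)
    # Shift the mask to flag the words after each title.
    # Note: np.roll wraps around, so a title in the last two positions flags words at the start.
    in1d_drop = np.roll(in1d, 2)     # second word after the title: blanked out
    in1d_replace = np.roll(in1d, 1)  # first word after the title: replaced
    l = np.where(in1d_drop, '', words)
    l = np.where(in1d_replace, '<next_words>', l)
    return ' '.join(l)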
Simply apply fun to every row of column C:
df['C'] = df['C'].apply(fun, y=list_)
print(df)
A B C
0 French House Are you at home?
1 English House I'm sorry, but the lady <next_words> home.
2 French Apartment His name is Sir <next_words>
3 French House I'm Mrs. <next_words> I have a dog
4 English Apartment Hi Miss <next_words> you?
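One thing to watch with the np.roll approach: the shifted masks wrap around, so a title that is the last or second-to-last word of a sentence ends up rewriting words at the start of it. Below is a minimal plain-Python sketch of the same "flag the two words after a title" idea without the wrap-around. The names replace_after_titles, TITLES and placeholder are hypothetical and not part of the answer above, and the sketch assumes that a title with no following word should be left untouched.

```python
# Hypothetical helper, not from the answer above: same idea as the
# np.roll masks, but the positions after a title are walked explicitly,
# so nothing wraps around to the start of the sentence.
TITLES = ['Mrs.', 'Miss', 'Ms.', 'lady', 'Mr.', 'Sir', 'Lord']

def replace_after_titles(sentence, titles=TITLES, placeholder='<next_words>'):
    """Replace the (up to) two words after each title with one placeholder."""
    words = sentence.split(' ')
    out = []
    i = 0
    while i < len(words):
        out.append(words[i])
        if words[i] in titles and i + 1 < len(words):
            # Emit one placeholder, then skip the title and the two words after it
            out.append(placeholder)
            i += 3
        else:
            i += 1
    return ' '.join(out)

print(replace_after_titles("I'm sorry, but the lady is at home."))
# I'm sorry, but the lady <next_words> home.
```

It can be applied to the column in the same way as fun, for example df['C'].apply(replace_after_titles).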
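Answer 1 (score: 0)
Chaining apply with a function will work: split the value, loop over the enumerated split list, check whether the title-cased word is in l and, if it is, overwrite that word and the next two items, then return once the loop has finished: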
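def f(x):
    l = ['Mrs.', 'Miss', 'Ms.', 'Lady', 'Mr.', 'Sir', 'Lord']
    l2 = x.split()
    for i, v in enumerate(l2):
        # Compare in title case so that 'lady' also matches 'Lady'
        if v.title() in l:
            # Overwrite the title and the two words after it with the
            # slice of l that starts at the matched title
            l2[i:i+3] = l[l.index(v.title()):l.index(v.title())+3]
            break  # only the first title in the sentence is handled
    return ' '.join(l2)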
df['C']=df['C'].apply(f)
print(df)
Output:
A B C
0 French house Are you at home?
1 English house I'm sorry, but the Lady Mr. Sir home.
2 French apartment His name is Sir Lord
3 French house I'm Mrs. Miss Ms. I have a dog.
4 English apartment Hi Miss Ms. Lady you?
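Note that the output above contains the neighbouring titles from l rather than the <next_words> placeholder shown in the question. A minimal sketch of the same enumerate-and-slice approach, assuming the placeholder is what is wanted, could look like this. The name f_placeholder is hypothetical, and like f it only handles the first title in each sentence.

```python
def f_placeholder(x):
    """Variant of f that inserts '<next_words>' instead of entries of l."""
    l = ['Mrs.', 'Miss', 'Ms.', 'Lady', 'Mr.', 'Sir', 'Lord']
    l2 = x.split()
    for i, v in enumerate(l2):
        # Compare in title case so that 'lady' also matches 'Lady'
        if v.title() in l:
            # Slice assignment: the (up to) two words after the title
            # collapse into a single placeholder; the title itself is kept
            if i + 1 < len(l2):
                l2[i + 1:i + 3] = ['<next_words>']
            break
    return ' '.join(l2)

print(f_placeholder("Hi Miss how are you?"))
# Hi Miss <next_words> you?
```

Because the sentence is split on whitespace, punctuation attached to a replaced word (such as the final "." in "Starr.") is replaced along with it.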
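Answer 2 (score: 0)
You can adapt your function a little so that it is applied row by row:
The idea is to take each row, split it into words, and loop over the titles. For each title, you check whether it appears in the sentence, get its index, replace the word that follows it, and then delete the word at position + 2.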
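titres_list = ['Mrs.', 'Miss', 'Ms.', 'Lady', 'Mr.', 'Sir', 'Lord']

def replace_titre(row):
    data_frame_split = row.split()
    for titre in titres_list:
        if titre in data_frame_split:
            # Get the index of the word
            words_index = data_frame_split.index(titre)
            # Replace the +1 / following word, if there is one
            if words_index + 1 < len(data_frame_split):
                data_frame_split[words_index + 1] = "<next_words>"
            # Delete the +2 word, if there is one
            if words_index + 2 < len(data_frame_split):
                del data_frame_split[words_index + 2]
    # Join the words back into a single string
    return ' '.join(data_frame_split)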
Then you can call it on column C:
df['C'] = df['C'].apply(replace_titre)
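As an alternative to looping over the titles, the same replacement can be written as a single regular expression. This is a swapped-in technique rather than what the answer above does, and it is only a sketch under a few assumptions: matching is case-insensitive (so "lady" matches "Lady"), a "word" is a run of letters, digits, apostrophes or hyphens, and the names TITLES, WORD, PATTERN and mask_next_words are hypothetical.

```python
import re

# Same titles as in the question; case-insensitive matching is an assumption
TITLES = ['Mrs.', 'Miss', 'Ms.', 'Lady', 'Mr.', 'Sir', 'Lord']
# A "word" here excludes trailing punctuation such as "." or "?"
WORD = r"[\w'-]+"
# A title, then one word, then optionally a second word
PATTERN = re.compile(
    r"\b(" + "|".join(re.escape(t) for t in TITLES) + r")"
    r"\s+" + WORD + r"(?:\s+" + WORD + r")?",
    re.IGNORECASE,
)

def mask_next_words(sentence):
    """Keep the title, replace the one or two words after it with a placeholder."""
    return PATTERN.sub(r"\1 <next_words>", sentence)

print(mask_next_words("His name is Sir Ringo Starr."))
# His name is Sir <next_words>.
```

Because punctuation is not counted as part of a word, the final period survives, which matches the desired output in the question. It can be applied to the column with df['C'].apply(mask_next_words).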