Words in context - pandas

Time: 2018-12-07 10:32:52

Tags: python pandas dataframe

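I have a list, and each time a word from the list appears in a text, I want to replace the next two words.

For example: list = ['Mrs.', 'Miss', 'Ms.', 'Lady', 'Mr.', 'Sir', 'Lord']

phrase = "I'm sorry, but the lady is at home."

resultat = "I'm sorry, but the lady <next_words> home."

I am trying to do this in a data frame.

I tried: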

def words_contexte(df):

    titres_list = ['Mrs.', 'Miss', 'Ms.', 'Lady', 'Mr.', 'Sir', 'Lord']

    data_frame_split = df['C'].str.split()
    words_index = df['C'].str.data_frame_split[data_frame_split.index(titres_list) + 2]
    df['C'] = df['C'].str.replace(words_index, '<next_words>')

    return df

My data frame:

       A          B                                     C
  French      house                      Are you at home?
 English      house   I'm sorry, but the lady is at home.
  French  apartment          His name is Sir Ringo Starr.
  French      house      I'm Mrs. Carla and I have a dog.
 English  apartment                  Hi Miss how are you?

Desired output:

       A          B                                     C
  French      house                      Are you at home?
 English      house   I'm sorry, but the lady <next_words> home.
  French  apartment          His name is Sir <next_words>.
  French      house      I'm Mrs. <next_words> I have a dog.
 English  apartment                  Hi Miss <next_words> you?

3 answers:

Answer 0 (score: 1)

Here is one way that avoids repeatedly looping over each list:

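import numpy as np

list_ = ['Mrs.', 'Miss', 'Ms.', 'lady', 'Mr.', 'Sir', 'Lord']

def fun(x, y):
    words = x.split(' ')
    # True where a word is one of the titles in y
    is_title = np.isin(words, y)
    # np.roll shifts the mask to the right and wraps around at the end
    in1d_drop = np.roll(is_title, 2)     # second word after a title
    in1d_replace = np.roll(is_title, 1)  # first word after a title
    l = np.where(in1d_drop, '', words)
    l = np.where(in1d_replace, '<next_words>', l)
    return ' '.join(l)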

Then simply apply fun to each row of column C:

df['C'] = df['C'].apply(fun, y=list_)

print(df)
      A          B                                            C
0   French      House                             Are you at home?
1  English      House  I'm sorry, but the lady <next_words>  home.
2   French  Apartment                His name is Sir <next_words> 
3   French      House          I'm Mrs. <next_words>  I have a dog
4  English  Apartment                   Hi Miss <next_words>  you?
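
Two details of the mask approach are worth noting: `np.roll` wraps around, so a title in one of the last two positions would also affect the first words of the sentence, and the dropped word is replaced by an empty string, which leaves the double spaces visible above. A minimal sketch of the same shifted-mask idea in plain Python, padding with `False` instead of wrapping, could look like this. The names `shift_right` and `fun_no_wrap` are hypothetical and not part of the answer.

```python
# Same title list as in the answer above (note the lowercase 'lady').
titles = ['Mrs.', 'Miss', 'Ms.', 'lady', 'Mr.', 'Sir', 'Lord']

def shift_right(mask, n):
    """Shift a list of booleans n places to the right, padding with False."""
    return ([False] * n + list(mask))[:len(mask)]

def fun_no_wrap(sentence, titles, placeholder='<next_words>'):
    words = sentence.split(' ')
    is_title = [w in titles for w in words]
    replace = shift_right(is_title, 1)  # first word after a title
    drop = shift_right(is_title, 2)     # second word after a title
    # Keep a word unless it is only marked for dropping; a word marked for
    # replacement becomes the placeholder (replacement wins over dropping).
    kept = [placeholder if r else w
            for w, r, d in zip(words, replace, drop)
            if r or not d]
    return ' '.join(kept)
```

Assuming column C holds strings, it would be applied the same way as `fun`: `df['C'].apply(fun_no_wrap, titles=titles)`.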

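Answer 1 (score: 0)

Using apply with a function will work: split the value, loop over the enumerated list of words, and check whether the title-cased word is in l. If it is, assign over the match and the next two items, then return after the loop.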

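def f(x):
    l = ['Mrs.', 'Miss', 'Ms.', 'Lady', 'Mr.', 'Sir', 'Lord']
    l2 = x.split()
    for i, v in enumerate(l2):
        # title() normalizes the case, so 'lady' matches 'Lady'
        if v.title() in l:
            # Overwrite the match and the next two words with up to three
            # consecutive entries of l, starting at the matched title
            start = l.index(v.title())
            l2[i:i+3] = l[start:start+3]
            break
    return ' '.join(l2)


df['C'] = df['C'].apply(f)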
print(df)

Output:

         A          B                                      C
0   French      house                       Are you at home?
1  English      house  I'm sorry, but the Lady Mr. Sir home.
2   French  apartment                   His name is Sir Lord
3   French      house        I'm Mrs. Miss Ms. I have a dog.
4  English  apartment                  Hi Miss Ms. Lady you?
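
The output above shows that the slice assignment copies entries of `l` into the sentence rather than writing the `<next_words>` placeholder the question asks for. A minimal sketch of a variant that keeps the same loop and the same `title()` comparison but writes the placeholder instead might look like this. The name `f_placeholder` and the `placeholder` parameter are assumptions for illustration.

```python
TITLES = ['Mrs.', 'Miss', 'Ms.', 'Lady', 'Mr.', 'Sir', 'Lord']

def f_placeholder(x, placeholder='<next_words>'):
    words = x.split()
    for i, v in enumerate(words):
        # Only act if at least one word follows the title
        if v.title() in TITLES and i + 1 < len(words):
            # Replace the next two words (or one, at the end of the
            # sentence) with a single placeholder; the title itself stays
            words[i + 1:i + 3] = [placeholder]
            break  # like the original, stop after the first title
    return ' '.join(words)
```

Because the sentence is split on whitespace, punctuation attached to a replaced word is lost, so "Sir Ringo Starr." becomes "Sir <next_words>" without the final period.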

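Answer 2 (score: 0)

You can modify your function slightly so that it is applied row by row:

The idea is to take each row, split it into words, and go through each title. You check whether the title occurs in the sentence, get its index, replace the word that follows it, and then delete the word at position +2.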

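titres_list = ['Mrs.', 'Miss', 'Ms.', 'Lady', 'Mr.', 'Sir', 'Lord']

def replace_titre(row):
    data_frame_split = row.split()
    for titre in titres_list:
        # The match is case-sensitive: 'lady' is not matched by 'Lady'
        if titre in data_frame_split:
            # Get the index of the first occurrence of the title
            words_index = data_frame_split.index(titre)

            # Replace the +1 word and delete the +2 word in one step;
            # the slice is safe even when fewer than two words follow
            if words_index + 1 < len(data_frame_split):
                data_frame_split[words_index + 1:words_index + 3] = ["<next_words>"]
    # Join the words back into a string so the column still holds text
    return ' '.join(data_frame_split)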

You can then call it on column C:

df['C'] = df['C'].apply(replace_titre)
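
As an alternative that none of the answers uses, the same "title, then replace the next two words" rule can be written as a single regular expression. The sketch below is only one possible formulation: `TITLE_PATTERN` and `replace_next_words` are hypothetical names, and the case-insensitive flag is an assumption that also makes ordinary words such as the verb "miss" match.

```python
import re

# Titles copied from the question.
TITLES = ['Mrs.', 'Miss', 'Ms.', 'Lady', 'Mr.', 'Sir', 'Lord']

# A title standing alone between whitespace (or at the start of the
# string), followed by one or two whitespace-separated words.
TITLE_PATTERN = re.compile(
    r'(?<!\S)(' + '|'.join(re.escape(t) for t in TITLES) + r')(?!\S)'
    r'(?:\s+\S+){1,2}',
    flags=re.IGNORECASE,
)

def replace_next_words(sentence, placeholder='<next_words>'):
    # Keep the title as written and collapse the following one or two
    # words into the placeholder; a callable avoids escaping issues.
    return TITLE_PATTERN.sub(lambda m: f'{m.group(1)} {placeholder}', sentence)
```

Assuming column C holds strings, `df['C'] = df['C'].apply(replace_next_words)` would apply it to every row, and every title in a sentence is handled rather than only the first.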