Concatenating words across columns

Time: 2019-06-10 14:41:22

Tags: python string pandas dataframe nlp

Background

I have the following code:

import pandas as pd
#create df
df = pd.DataFrame({'Before' : ['there are many different', 
                               'i like a lot of sports ', 
                               'the middle east has many '], 
                   'After' : ['in the bright blue box', 
                               'because they go really fast ', 
                               'to ride and have fun '],

                  'P_ID': [1,2,3], 
                  'Word' : ['crayons', 'cars', 'camels'],
                  'N_ID' : ['A1', 'A2', 'A3']

                 })

#rearrange
df = df[['P_ID', 'N_ID', 'Before', 'Word','After']]

This creates the following df:

   P_ID  N_ID  Before                     Word     After
0  1     A1    there are many different   crayons  in the bright blue box
1  2     A2    i like a lot of sports     cars     because they go really fast
2  3     A3    the middle east has many   camels   to ride and have fun

Goal

1) Concatenate the words in the Before and After columns with the word in the Word column

2) Create a new_column

Desired output

A new_column with the following output:

new_column
there are many different crayons in the bright blue box
i like a lot of sports cars because they go really fast
the middle east has many camels to ride and have fun

Question

How do I achieve my goal?

3 answers:

Answer 0: (score: 2)

You can simply add the columns together, with a space between each part:

df['new_column'] = df['Before'] + ' ' + df['Word'] + ' ' + df['After']

Here is the complete code:

import pandas as pd
#create df
df = pd.DataFrame({'Before' : ['there are many different', 
                               'i like a lot of sports ', 
                               'the middle east has many '], 
                   'After' : ['in the bright blue box', 
                               'because they go really fast ', 
                               'to ride and have fun '],

                  'P_ID': [1,2,3], 
                  'Word' : ['crayons', 'cars', 'camels'],
                  'N_ID' : ['A1', 'A2', 'A3']

                 })

#rearrange
df = df[['P_ID', 'N_ID', 'Word', 'Before', 'After']]
df['new_column'] = df['Before'] + ' ' + df['Word'] + ' ' + df['After']
df['new_column']
0    there are many different crayons in the bright...
1    i like a lot of sports  cars because they go r...
2    the middle east has many  camels to ride and h...
Name: new_column, dtype: object
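
The output above has a doubled space in rows 1 and 2, because some of the Before values end with a trailing space. A minimal sketch of one way to avoid that, assuming the stray whitespace is not wanted: strip each column with .str.strip() before joining. The sample data is copied from the question.

```python
import pandas as pd

df = pd.DataFrame({
    'Before': ['there are many different',
               'i like a lot of sports ',
               'the middle east has many '],
    'Word': ['crayons', 'cars', 'camels'],
    'After': ['in the bright blue box',
              'because they go really fast ',
              'to ride and have fun '],
})

# Strip leading/trailing whitespace from every part first, so the explicit
# ' ' separator is the only space between the parts.
df['new_column'] = (
    df['Before'].str.strip() + ' '
    + df['Word'].str.strip() + ' '
    + df['After'].str.strip()
)

print(df['new_column'].tolist())
```

This assumes every cell is a string; a missing value in any of the three columns would make the whole result missing for that row.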

Answer 1: (score: 1)

You can add the columns as suggested above, or use a more general row-wise solution that carries over to many similar problems:

df['new_column'] = df.apply(lambda x: x.Before + ' ' + x.Word + ' ' + x.After, axis=1)
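
As a rough sketch of how the row-wise idea might generalize to any number of columns: select the columns to join and apply ' '.join to each row. The list name cols is hypothetical, and the code assumes every selected cell is a string with no missing values.

```python
import pandas as pd

df = pd.DataFrame({
    'Before': ['there are many different',
               'i like a lot of sports ',
               'the middle east has many '],
    'Word': ['crayons', 'cars', 'camels'],
    'After': ['in the bright blue box',
              'because they go really fast ',
              'to ride and have fun '],
})

# Hypothetical list of columns to join, in the order they should appear.
cols = ['Before', 'Word', 'After']

# For each row, strip every cell and join the pieces with a single space.
df['new_column'] = df[cols].apply(
    lambda row: ' '.join(cell.strip() for cell in row),
    axis=1,
)

print(df['new_column'].tolist())
```

Row-wise apply is usually slower than the vectorized + or .str.cat() approaches on large frames, so this form is mainly useful when the per-row logic is more involved than plain concatenation.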

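Answer 2: (score: 1)

You can use the cat() method of the .str accessor:

df['New_column'] = df['Before'].str.cat(df[['Word','After']],sep=" ")
  • cat() also lets you specify a separator via sep
  • Joining multiple columns is just a matter of calling str.cat() on the first column (Before) and passing it either a list of Series or a DataFrame containing all the other columns:

Code: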

import pandas as pd
#create df
df = pd.DataFrame({'Before' : ['there are many different',
                               'i like a lot of sports ',
                               'the middle east has many '],
                   'After' : ['in the bright blue box',
                               'because they go really fast ',
                               'to ride and have fun '],

                  'P_ID': [1,2,3],
                  'Word' : ['crayons', 'cars', 'camels'],
                  'N_ID' : ['A1', 'A2', 'A3']

                 })

#rearrange
df = df[['P_ID', 'N_ID', 'Before', 'Word','After']]
print (df)
df['New_column'] = df['Before'].str.cat(df[['Word','After']],sep=" ")
print (df)
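
The code above passes a DataFrame to str.cat(). A short sketch of the other form mentioned in this answer, passing a list of Series, using the same sample data; the variable names from_list and from_frame are only illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'Before': ['there are many different',
               'i like a lot of sports ',
               'the middle east has many '],
    'Word': ['crayons', 'cars', 'camels'],
    'After': ['in the bright blue box',
              'because they go really fast ',
              'to ride and have fun '],
})

# Form 1: pass the remaining columns as a list of Series.
from_list = df['Before'].str.cat([df['Word'], df['After']], sep=' ')

# Form 2: pass the remaining columns as a DataFrame.
from_frame = df['Before'].str.cat(df[['Word', 'After']], sep=' ')

# Both forms should produce the same Series.
print(from_list.equals(from_frame))
print(from_list[0])
```

Neither form strips whitespace, so the trailing spaces in the sample Before and After values carry through to the result.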