Splitting strings in a pandas DataFrame

Date: 2016-09-10 08:24:39

Tags: python pandas dataframe

I have a DataFrame in which some rows hold list-like, comma-separated strings:

import pandas as pd

df=pd.DataFrame({'Name':['Stooge, Nick','Dick, Tracy','Rick, Nike','Maw','El','Paw, Maw, Haw','Caw', 'Greep'],
'key':[2,2,2,1,1,3,1,1,],
'Lastname':['Smith, Foo','Johnson, Macy','Johnson, Sike','Simpson','Diablo','Simpson, Sampson, Simmons','Simpson', 'Mortimer']
})


# .ix was removed in pandas 1.0; .loc does the same label/boolean-mask assignment
df.loc[df['key'] == 2, 'Full'] = df['Name'] + ', ' + df['Lastname']
df.loc[df['key'] == 1, 'Full'] = df['Name'] + ' ' + df['Lastname']
print(df)

Output:

                    Lastname           Name  key                        Full
0                 Smith, Foo   Stooge, Nick    2    Stooge, Nick, Smith, Foo
1              Johnson, Macy    Dick, Tracy    2  Dick, Tracy, Johnson, Macy
2              Johnson, Sike     Rick, Nike    2   Rick, Nike, Johnson, Sike
3                    Simpson            Maw    1                 Maw Simpson
4                     Diablo             El    1                   El Diablo
5  Simpson, Sampson, Simmons  Paw, Maw, Haw    3                         NaN
6                    Simpson            Caw    1                 Caw Simpson
7                   Mortimer          Greep    1              Greep Mortimer

Is there a way to manipulate or split the strings inside the DataFrame on the commas so that the result looks like this:

                    Lastname           Name  key                         Full
0                 Smith, Foo   Stooge, Nick    2    Stooge Smith and Nick Foo
1              Johnson, Macy    Dick, Tracy    2  Dick Johnson and Tracy Macy
2              Johnson, Sike     Rick, Nike    2   Rick Johnson and Nike Sike
3                    Simpson            Maw    1                  Maw Simpson
4                     Diablo             El    1                    El Diablo
5  Simpson, Sampson, Simmons  Paw, Maw, Haw    3                          NaN
6                    Simpson            Caw    1                  Caw Simpson
7                   Mortimer          Greep    1               Greep Mortimer

2 Answers:

Answer 0 (score: 2)

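# Split on commas and stack: one entry per (row, position) pair
ln = df.Lastname.str.split(r',\s*', expand=True).stack()
fn = df.Name.str.split(r',\s*', expand=True).stack()
# Pair first and last names by position, then join each row's pairs with ' and '
# (note: the result goes into a new lowercase 'full' column)
df['full'] = fn.add(' ').add(ln).groupby(level=0).apply(tuple).str.join(' and ')
df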
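
Note that the chain above also pairs up the three-name row (index 5), whereas the desired output in the question leaves that row as NaN. A minimal sketch of one way to handle this, assuming that only rows with `key` of 1 or 2 should be filled (an assumption read off the question's expected table, not something either answer states), and using an explicit `dropna()` so the join never sees missing values:

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Stooge, Nick', 'Dick, Tracy', 'Rick, Nike', 'Maw', 'El',
             'Paw, Maw, Haw', 'Caw', 'Greep'],
    'key': [2, 2, 2, 1, 1, 3, 1, 1],
    'Lastname': ['Smith, Foo', 'Johnson, Macy', 'Johnson, Sike', 'Simpson',
                 'Diablo', 'Simpson, Sampson, Simmons', 'Simpson', 'Mortimer'],
})

# One entry per (original row, position) pair, e.g. (0, 0) -> 'Stooge', (0, 1) -> 'Nick'.
fn = df['Name'].str.split(r',\s*', expand=True).stack()
ln = df['Lastname'].str.split(r',\s*', expand=True).stack()

# Concatenation aligns on the shared (row, position) index; drop any missing pairs.
pairs = (fn + ' ' + ln).dropna()

# Collapse back to one string per original row.
full = pairs.groupby(level=0).agg(' and '.join)

# Assumption: rows with key > 2 stay NaN, as in the question's desired output.
df['Full'] = full.where(df['key'] <= 2)
print(df[['key', 'Full']])
```

The mask is applied at the end so the split-and-pair logic stays identical for every row; dropping the `where` call reproduces the behavior of the original chain.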


Answer 1 (score: 0)

You can use apply():

In [63]: df
Out[63]: 
                    Lastname           Name  key                        Full
0                 Smith, Foo   Stooge, Nick    2    Stooge, Nick, Smith, Foo
1              Johnson, Macy    Dick, Tracy    2  Dick, Tracy, Johnson, Macy
2              Johnson, Sike     Rick, Nike    2   Rick, Nike, Johnson, Sike
3                    Simpson            Maw    1                 Maw Simpson
4                     Diablo             El    1                   El Diablo
5  Simpson, Sampson, Simmons  Paw, Maw, Haw    3                         NaN
6                    Simpson            Caw    1                 Caw Simpson
7                   Mortimer          Greep    1              Greep Mortimer

In [64]: def get_full_name(row):
    ...:     if ',' in str(row.Full):
    ...:         # Full is "first names..., last names...": split it into two halves
    ...:         z = [part.strip() for part in row.Full.split(',')]
    ...:         half = len(z) // 2
    ...:         x = z[:half]   # first names
    ...:         y = z[half:]   # last names
    ...:         return ' and '.join(' '.join(pair) for pair in zip(x, y))
    ...:     return row.Full
    ...: 

In [65]: df['Full'] = df.apply(get_full_name, axis = 1)

In [66]: df
Out[66]: 
                    Lastname           Name  key                         Full
0                 Smith, Foo   Stooge, Nick    2    Stooge Smith and Nick Foo
1              Johnson, Macy    Dick, Tracy    2  Dick Johnson and Tracy Macy
2              Johnson, Sike     Rick, Nike    2   Rick Johnson and Nike Sike
3                    Simpson            Maw    1                  Maw Simpson
4                     Diablo             El    1                    El Diablo
5  Simpson, Sampson, Simmons  Paw, Maw, Haw    3                          NaN
6                    Simpson            Caw    1                  Caw Simpson
7                   Mortimer          Greep    1               Greep Mortimer
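
Rather than re-splitting the already concatenated Full column, the pairing could also be done straight from the Name and Lastname strings. The sketch below uses plain Python; `pair_names` is a hypothetical helper name, and it assumes both strings use `', '` as the separator and contain the same number of entries.

```python
def pair_names(names, lastnames, sep=', '):
    """Pair the i-th first name with the i-th last name, joined by ' and '."""
    firsts = names.split(sep)
    lasts = lastnames.split(sep)
    return ' and '.join(f'{first} {last}' for first, last in zip(firsts, lasts))


# Two sample rows taken from the question's data.
rows = [('Stooge, Nick', 'Smith, Foo'), ('Maw', 'Simpson')]
for names, lastnames in rows:
    print(pair_names(names, lastnames))
# prints:
# Stooge Smith and Nick Foo
# Maw Simpson
```

With the DataFrame from the question, one option would be a list comprehension over `zip(df['Name'], df['Lastname'])` that calls `pair_names` on each pair and assigns the result to `df['Full']`.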