I have a DataFrame in which some rows hold comma-separated, list-like strings:
import pandas as pd

df = pd.DataFrame({
    'Name': ['Stooge, Nick', 'Dick, Tracy', 'Rick, Nike', 'Maw', 'El', 'Paw, Maw, Haw', 'Caw', 'Greep'],
    'key': [2, 2, 2, 1, 1, 3, 1, 1],
    'Lastname': ['Smith, Foo', 'Johnson, Macy', 'Johnson, Sike', 'Simpson', 'Diablo', 'Simpson, Sampson, Simmons', 'Simpson', 'Mortimer'],
})
# Build 'Full' only for rows with key 2 or key 1; the key == 3 row stays NaN.
df.loc[df['key'] == 2, 'Full'] = df['Name'] + ', ' + df['Lastname']
df.loc[df['key'] == 1, 'Full'] = df['Name'] + ' ' + df['Lastname']
print(df)
Output:
                    Lastname           Name  key                        Full
0                 Smith, Foo   Stooge, Nick    2    Stooge, Nick, Smith, Foo
1              Johnson, Macy    Dick, Tracy    2  Dick, Tracy, Johnson, Macy
2              Johnson, Sike     Rick, Nike    2   Rick, Nike, Johnson, Sike
3                    Simpson            Maw    1                 Maw Simpson
4                     Diablo             El    1                   El Diablo
5  Simpson, Sampson, Simmons  Paw, Maw, Haw    3                         NaN
6                    Simpson            Caw    1                 Caw Simpson
7                   Mortimer          Greep    1              Greep Mortimer
Is there a way to manipulate or split the strings in the DataFrame on the commas so that it produces the following result?
                    Lastname           Name  key                         Full
0                 Smith, Foo   Stooge, Nick    2    Stooge Smith and Nick Foo
1              Johnson, Macy    Dick, Tracy    2  Dick Johnson and Tracy Macy
2              Johnson, Sike     Rick, Nike    2   Rick Johnson and Nike Sike
3                    Simpson            Maw    1                  Maw Simpson
4                     Diablo             El    1                    El Diablo
5  Simpson, Sampson, Simmons  Paw, Maw, Haw    3                          NaN
6                    Simpson            Caw    1                  Caw Simpson
7                   Mortimer          Greep    1               Greep Mortimer
Answer 0 (score: 2)
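# Split both columns on commas; stack() turns the pieces into one long
# Series indexed by (row, position).
ln = df.Lastname.str.split(r',\s*', expand=True).stack()
fn = df.Name.str.split(r',\s*', expand=True).stack()
# Pair first and last names that share an index entry, then rejoin the
# pairs of each original row with ' and '.
df['full'] = fn.add(' ').add(ln).groupby(level=0).apply(tuple).str.join(' and ')
df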
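If the row with key == 3 should stay NaN, as it does in the desired output, the same split-and-pair idea can be sketched in plain Python with a guard on key. This is only a sketch: pair_names is a hypothetical helper name, and the rule "fill only rows whose key is 1 or 2" is an assumption read off the question's expected output.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Stooge, Nick', 'Dick, Tracy', 'Rick, Nike', 'Maw', 'El',
             'Paw, Maw, Haw', 'Caw', 'Greep'],
    'key': [2, 2, 2, 1, 1, 3, 1, 1],
    'Lastname': ['Smith, Foo', 'Johnson, Macy', 'Johnson, Sike', 'Simpson',
                 'Diablo', 'Simpson, Sampson, Simmons', 'Simpson', 'Mortimer'],
})


def pair_names(names, lastnames):
    """Pair the i-th first name with the i-th last name, joined by ' and '.

    Hypothetical helper, not part of pandas.
    """
    firsts = [part.strip() for part in names.split(',')]
    lasts = [part.strip() for part in lastnames.split(',')]
    return ' and '.join(f'{first} {last}' for first, last in zip(firsts, lasts))


# Plain Python loop over the two columns, so no index alignment is involved.
paired = pd.Series(
    [pair_names(n, l) for n, l in zip(df['Name'], df['Lastname'])],
    index=df.index,
)

# Assumption: only rows with key 1 or 2 get a value; all others stay NaN.
df['Full'] = paired.where(df['key'].isin([1, 2]))
print(df)
```

Dropping the .where(...) guard would fill every row, including the three-name one.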
Answer 1 (score: 0)
You can use apply():
In [63]: df
Out[63]:
                    Lastname           Name  key                        Full
0                 Smith, Foo   Stooge, Nick    2    Stooge, Nick, Smith, Foo
1              Johnson, Macy    Dick, Tracy    2  Dick, Tracy, Johnson, Macy
2              Johnson, Sike     Rick, Nike    2   Rick, Nike, Johnson, Sike
3                    Simpson            Maw    1                 Maw Simpson
4                     Diablo             El    1                   El Diablo
5  Simpson, Sampson, Simmons  Paw, Maw, Haw    3                         NaN
6                    Simpson            Caw    1                 Caw Simpson
7                   Mortimer          Greep    1              Greep Mortimer
In [64]: def get_full_name(row):
    ...:     if ',' in str(row.Full):
    ...:         z = [part.strip() for part in row.Full.split(',')]
    ...:         # Full is "first names, last names", so split the pieces in half.
    ...:         half = len(z) // 2
    ...:         x = z[:half]  # first names
    ...:         y = z[half:]  # last names
    ...:         return ' and '.join(first + ' ' + last for first, last in zip(x, y))
    ...:     return row.Full
    ...:
In [65]: df['Full'] = df.apply(get_full_name, axis=1)
In [66]: df
Out[66]:
                    Lastname           Name  key                         Full
0                 Smith, Foo   Stooge, Nick    2    Stooge Smith and Nick Foo
1              Johnson, Macy    Dick, Tracy    2  Dick Johnson and Tracy Macy
2              Johnson, Sike     Rick, Nike    2   Rick Johnson and Nike Sike
3                    Simpson            Maw    1                  Maw Simpson
4                     Diablo             El    1                    El Diablo
5  Simpson, Sampson, Simmons  Paw, Maw, Haw    3                          NaN
6                    Simpson            Caw    1                  Caw Simpson
7                   Mortimer          Greep    1               Greep Mortimer