I need to fill a string column within each group using the group's single non-null value. An example:
ID name surname prsn_id
A john smith prsn_01
A john smith NaN
A john smith NaN
A john smith NaN
B mary jane prsn_02
B mary jane NaN
B mary jane NaN
B mary jane NaN
B mary jane NaN
B mary jane NaN
B mary jane NaN
C Barry willis prsn_03
C Barry willis NaN
C Barry willis NaN
C Barry willis NaN
C Barry willis NaN
The output should be:
ID name surname prsn_id
A john smith prsn_01
A john smith prsn_01
A john smith prsn_01
A john smith prsn_01
B mary jane prsn_02
B mary jane prsn_02
B mary jane prsn_02
B mary jane prsn_02
B mary jane prsn_02
B mary jane prsn_02
B mary jane prsn_02
C Barry willis prsn_03
C Barry willis prsn_03
C Barry willis prsn_03
C Barry willis prsn_03
C Barry willis prsn_03
Or:
ID name surname prsn_id prsn_id_2
A john smith prsn_01 NaN
A john smith NaN prsn_01
A john smith NaN prsn_01
A john smith NaN prsn_01
B mary jane prsn_02 NaN
B mary jane NaN prsn_02
B mary jane NaN prsn_02
B mary jane NaN prsn_02
B mary jane NaN prsn_02
B mary jane NaN prsn_02
B mary jane NaN prsn_02
C Barry willis prsn_03 NaN
C Barry willis NaN prsn_03
C Barry willis NaN prsn_03
C Barry willis NaN prsn_03
C Barry willis NaN prsn_03
I tried:
df['prsn_id_2'] = (df
    .groupby(['ID', 'name', 'surname'])['prsn_id']
    .ffill())
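This may work, but it takes a while to run, so it will not be practical as the data grows. I need another solution that is vectorized and relatively fast.
Answer 0 (score: 2)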
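# Keep only the rows that carry a prsn_id (one lookup row per group)
df1 = df.dropna(subset=['prsn_id'])
# If a group may contain more than one non-null prsn_id, keep only the first:
# df1 = df.dropna(subset=['prsn_id']).drop_duplicates(['ID', 'name', 'surname'])
# Drop the sparse column, then merge the lookup rows back onto every row of each group
df = df.drop('prsn_id', axis=1).merge(df1, on=['ID', 'name', 'surname'], how='left')
print(df)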
ID name surname prsn_id
0 A john smith prsn_01
1 A john smith prsn_01
2 A john smith prsn_01
3 A john smith prsn_01
4 B mary jane prsn_02
5 B mary jane prsn_02
6 B mary jane prsn_02
7 B mary jane prsn_02
8 B mary jane prsn_02
9 B mary jane prsn_02
10 B mary jane prsn_02
11 C Barry willis prsn_03
12 C Barry willis prsn_03
13 C Barry willis prsn_03
14 C Barry willis prsn_03
15 C Barry willis prsn_03
Details:
print(df1)
ID name surname prsn_id
0 A john smith prsn_01
4 B mary jane prsn_02
11 C Barry willis prsn_03
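For anyone who wants to try this end to end, here is a minimal, self-contained sketch. It rebuilds the sample frame from the question, applies the dropna-plus-merge approach shown above, and compares it with `groupby(...).transform('first')`. The `transform('first')` variant is not part of the answer above; it is an alternative that relies on `first` skipping missing values by default. The names `keys`, `lookup`, `merged`, and `filled` are made up for this illustration, and no timings are claimed, so it is worth benchmarking both on real data.

```python
import numpy as np
import pandas as pd

# Sample data mirroring the question: one known prsn_id per group, the rest missing.
df = pd.DataFrame({
    "ID": ["A"] * 4 + ["B"] * 7 + ["C"] * 5,
    "name": ["john"] * 4 + ["mary"] * 7 + ["Barry"] * 5,
    "surname": ["smith"] * 4 + ["jane"] * 7 + ["willis"] * 5,
    "prsn_id": ["prsn_01"] + [np.nan] * 3
               + ["prsn_02"] + [np.nan] * 6
               + ["prsn_03"] + [np.nan] * 4,
})
keys = ["ID", "name", "surname"]  # hypothetical helper name for the grouping columns

# Approach from the answer: build one lookup row per group, then left-merge it back.
lookup = df.dropna(subset=["prsn_id"]).drop_duplicates(keys)
merged = df.drop(columns="prsn_id").merge(lookup, on=keys, how="left")

# Alternative (an assumption, not from the answer): broadcast each group's
# first non-null prsn_id to every row of that group.
filled = df.groupby(keys)["prsn_id"].transform("first")

# Both results should hold the same values row for row.
print(merged["prsn_id"].equals(filled))
```

One practical difference: `transform` keeps the original index, so `df["prsn_id_2"] = filled` can be assigned directly, whereas the merge returns a new frame with a fresh 0..n-1 index.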