I have a DataFrame:
import pandas as pd
df=pd.DataFrame({
'Player': ['John','John','John','Steve','Steve','Ted', 'James','Smitty','SmittyJr','DJ'],
'Name': ['A','B', 'A','B','B','C', 'A','D','D','D'],
'Group':['2A','1B','2A','2A','1B','1C','2A','1C','1C','2A'],
'Medal':['G', '?', '?', 'S', 'B','?','?','?','G','?']
})
df = df[['Player','Group', 'Name', 'Medal']]
print(df)
I want to replace every '?' in the Medal column with the medal from any row that has the same Name and Group and whose Medal is already filled in.
For example, because row 0 is Name: A, Group: 2A, Medal: G, the '?' in rows 2 and 6 should become 'G'.
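A short sketch that makes the example concrete: filtering on row 0's (Name, Group) key picks out exactly the rows whose '?' should inherit its 'G'. The variable name `key` is illustrative only.

```python
import pandas as pd

df = pd.DataFrame({
    'Player': ['John', 'John', 'John', 'Steve', 'Steve', 'Ted', 'James', 'Smitty', 'SmittyJr', 'DJ'],
    'Name': ['A', 'B', 'A', 'B', 'B', 'C', 'A', 'D', 'D', 'D'],
    'Group': ['2A', '1B', '2A', '2A', '1B', '1C', '2A', '1C', '1C', '2A'],
    'Medal': ['G', '?', '?', 'S', 'B', '?', '?', '?', 'G', '?'],
})

# Boolean mask of all rows sharing row 0's key (Name 'A', Group '2A')
key = (df['Name'] == 'A') & (df['Group'] == '2A')
# Rows 0, 2 and 6: one known medal 'G' plus two placeholders to fill
print(df.loc[key, 'Medal'])
```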
The result should look like this:
res=pd.DataFrame({
'Player': ['John','John','John','Steve','Steve','Ted', 'James','Smitty','SmittyJr','DJ'],
'Name': ['A','B', 'A','B','B','C', 'A','D','D','D'],
'Group':['2A','1B','2A','2A','1B','1C','2A','1C','1C','2A'],
'Medal':['G', 'B', 'G', 'S', 'B','?','G','G','G','?']
})
res = res[['Player','Group', 'Name', 'Medal']]
print(res)
What is the most efficient way to do this?
Answer 0 (score: 2)
Another solution: within each (Group, Name) group, replace '?' with the last value (iloc) of the sorted Medal column (sort_values):
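Why the last sorted value works: '?' (code point 63) sorts before any uppercase letter ('A' is 65), so after `sort_values()` the last element is a real medal whenever the group has one. The sketch below uses a hypothetical three-row group. It also shows the limits: a group containing only '?' keeps '?', and a group holding two different medals would take the alphabetically largest one.

```python
import pandas as pd

# Hypothetical group: one known medal and two placeholders
group = pd.Series(['G', '?', '?'])
print(group.sort_values().tolist())       # '?' entries sort first, 'G' last
last = group.sort_values().iloc[-1]       # the known medal, 'G'
print(group.replace('?', last).tolist())  # every placeholder becomes 'G'

# A group with no known medal: the "last" value is '?' itself, so nothing changes
only_unknown = pd.Series(['?'])
print(only_unknown.replace('?', only_unknown.sort_values().iloc[-1]).tolist())
```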
# transform returns a Series aligned to df's index, so it can be assigned back directly
df['Medal'] = (df.groupby(['Group', 'Name'])['Medal']
                 .transform(lambda x: x.replace('?', x.sort_values().iloc[-1])))
print(df)
Player Group Name Medal
0 John 2A A G
1 John 1B B B
2 John 2A A G
3 Steve 2A B S
4 Steve 1B B B
5 Ted 1C C ?
6 James 2A A G
7 Smitty 1C D G
8 SmittyJr 1C D G
9 DJ 2A D ?
Timings:
In [81]: %timeit (df.groupby(['Group','Name'])['Medal'].apply(lambda x: x.replace('?', x.sort_values().iloc[-1])))
100 loops, best of 3: 4.13 ms per loop
In [82]: %timeit (df.replace('?', np.nan).groupby(['Name', 'Group']).apply(lambda df: df.ffill().bfill()).fillna('?'))
100 loops, best of 3: 11.3 ms per loop
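For comparison, here is a sketch of a variant that appears in neither answer and avoids a Python-level lambda: hide the '?' placeholders with `Series.where`, then let `transform('first')` broadcast each (Group, Name) group's first known medal to every row. It assumes each group has at most one distinct known medal, as in this data; unlike the sort-based trick, it picks the first known medal in row order rather than the alphabetically largest.

```python
import pandas as pd

df = pd.DataFrame({
    'Player': ['John', 'John', 'John', 'Steve', 'Steve', 'Ted', 'James', 'Smitty', 'SmittyJr', 'DJ'],
    'Name': ['A', 'B', 'A', 'B', 'B', 'C', 'A', 'D', 'D', 'D'],
    'Group': ['2A', '1B', '2A', '2A', '1B', '1C', '2A', '1C', '1C', '2A'],
    'Medal': ['G', '?', '?', 'S', 'B', '?', '?', '?', 'G', '?'],
})

# Turn placeholders into NaN so that 'first' skips them
known = df['Medal'].where(df['Medal'] != '?')
# Broadcast the first non-NaN medal of each (Group, Name) group back to every row
filled = known.groupby([df['Group'], df['Name']]).transform('first')
# Groups with no known medal are still NaN; restore '?' there
df['Medal'] = filled.fillna('?')
print(df)
```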
Answer 1 (score: 1)
Try:
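import pandas as pd
import numpy as np
# Forward-fill then back-fill so every row in a group receives the group's known medal
myfill = lambda s: s.ffill().bfill()
out = df.replace('?', np.nan)
out['Medal'] = out.groupby(['Name', 'Group'])['Medal'].transform(myfill)
print(out.fillna('?'))  # groups with no known medal go back to '?'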
Player Group Name Medal
0 John 2A A G
1 John 1B B B
2 John 2A A G
3 Steve 2A B S
4 Steve 1B B B
5 Ted 1C C ?
6 James 2A A G
7 Smitty 1C D G
8 SmittyJr 1C D G
9 DJ 2A D ?
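To see why `ffill().bfill()` fills an entire group, here is what it does to the Medal values of one hypothetical group after `replace('?', np.nan)`; both series below are illustrative and not taken from the question's data. The forward fill pushes the known value down, the backward fill pulls it up, and a group with no known value stays NaN until the final `fillna('?')`.

```python
import numpy as np
import pandas as pd

# Hypothetical group with the known medal in the middle
group = pd.Series([np.nan, 'B', np.nan])
forward = group.ffill()   # NaN before 'B' stays NaN; NaN after it becomes 'B'
both = forward.bfill()    # the leading NaN is filled from the 'B' below it
print(both.tolist())

# Hypothetical group with no known medal: nothing to propagate, so '?' comes back
empty = pd.Series([np.nan, np.nan], dtype=object)
print(empty.ffill().bfill().fillna('?').tolist())
```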