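I have a pandas DataFrame with a Name column and a Score column. For each Name, I want to check whether the Score has improved compared with that name's previous entry.
If the Score of a Name has improved, I want to write 1, otherwise 0. If no previous Score is available for a Name, I want to write NaN.
My DataFrame looks like this: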
import pandas as pd
import numpy as np
first = {
'Date':['2013-02-28','2013-03-29','2013-05-29','2013-06-29','2013-02-27','2013-04-30','2013-01-20'],
'Name':['Felix','Felix','Felix','Felix','Peter','Peter','Paul'],
'Score':['10','12','13','11','14','14','9']}
df1 = pd.DataFrame(first)
The result should look like this:
second = {
'Date':['2013-02-28','2013-03-29','2013-05-29','2013-02-27','2013-04-30','2013-01-20'],
'Name':['Felix','Felix','Felix','Peter','Peter','Paul'],
'Score':['10','12','11','14','14','9'],
'Improvement':['NaN','1','0','NaN','0','NaN']}
result = pd.DataFrame(second)
I thought about doing something like this:
df1['Improvement'] = np.nan
col_idx = df1.columns.get_loc('Improvement')
grouped = df1[df1['Name'].isin(['Felix', 'Peter', 'Paul'])].groupby(['Name'])
for name, group in grouped:
    first = True
    for index, row in group.iterrows(): ...
But the column actually contains more than 100 names.
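Answer 0 (score: 1)
This can probably be simplified, but you can break it down into two steps: a groupby with shift to build a helper column holding each name's previous score (NaN for the first occurrence of each name), then np.where to apply the required logic.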
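# Previous score within each Name group (NaN for the first row of each name)
df['v'] = df.groupby(['Name'])['Score'].shift()
# 1 if the score is higher than the previous one, otherwise 0
df['Score'] = np.where(df['Score'] > df['v'], 1, 0)
# No previous score available: write NaN instead
df['Score'] = np.where(df['v'].isna(), np.nan, df['Score'])
print(df.iloc[:, :-1])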
         Date   Name  Score
0  2013-02-28  Felix    NaN
1  2013-03-29  Felix    1.0
2  2013-05-29  Felix    1.0
3  2013-06-29  Felix    0.0
4  2013-02-27  Peter    NaN
5  2013-04-30  Peter    0.0
6  2013-01-20  Paul     NaN