I have a dataset that looks like this:
ID TimeStamp ScoreA ScoreB Type
A 20150908143000 345 316 New
B 20150908140300 400 480 New
B 20150908140600 NaN 120 Old
B 20150908143000 10743 8803 Old
C 20150908140100 600 1715 New
C 20150908140200 200 1062 Old
C 20150908141000 NaN 145 Old
C 20150908141500 418 NaN Old
D 20150908143000 433 65 New
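To make the example reproducible, the sample data can be built as a pandas DataFrame. This is a minimal sketch. It assumes the blank scores are real NaN values and keeps TimeStamp as a 14-digit integer (yyyymmddHHMMSS), which sorts in chronological order.

```python
import numpy as np
import pandas as pd

# Sample data from the question; missing scores are stored as NaN
df = pd.DataFrame({
    'ID': ['A', 'B', 'B', 'B', 'C', 'C', 'C', 'C', 'D'],
    'TimeStamp': [20150908143000, 20150908140300, 20150908140600,
                  20150908143000, 20150908140100, 20150908140200,
                  20150908141000, 20150908141500, 20150908143000],
    'ScoreA': [345, 400, np.nan, 10743, 600, 200, np.nan, 418, 433],
    'ScoreB': [316, 480, 120, 8803, 1715, 1062, 145, np.nan, 65],
    'Type': ['New', 'New', 'Old', 'Old', 'New', 'Old', 'Old', 'Old', 'New'],
})
```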
I want the result to look like this:
ID TimeStamp ScoreA ScoreB Type FirstScore1 FirstScore2
A 20150908143000 345 316 New
B 20150908140300 400 480 New
B 20150908140600 NaN 120 Old 400 480
B 20150908143000 10743 8803 Old 400 480
C 20150908140100 600 1715 New
C 20150908140200 200 1062 Old 600 1715
C 20150908141000 NaN 145 Old 600 1715
C 20150908141500 418 NaN Old 600 1715
D 20150908143000 433 65 New
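That is, whenever "Type" is "Old", the earliest "ScoreA" and "ScoreB" for that particular "ID" should be looked up and placed in "FirstScore1" and "FirstScore2" respectively.
I have managed to write code that gets me the maximum value, but not the earliest one. Even then, I can't restrict it to a specific ID, so I'm quite stuck.
Can anyone help me with this?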
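The asker's own code isn't shown, so the following is only a sketch of the difference between "largest" and "earliest" per ID. It assumes rows should be ordered by TimeStamp within each ID, and sorts explicitly to guarantee that. `groupby('ID')` restricts each result to a single ID; `max()` returns the largest value per column, while `first()` returns the first non-null value per column in row order.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'ID': ['A', 'B', 'B', 'B', 'C', 'C', 'C', 'C', 'D'],
    'TimeStamp': [20150908143000, 20150908140300, 20150908140600,
                  20150908143000, 20150908140100, 20150908140200,
                  20150908141000, 20150908141500, 20150908143000],
    'ScoreA': [345, 400, np.nan, 10743, 600, 200, np.nan, 418, 433],
    'ScoreB': [316, 480, 120, 8803, 1715, 1062, 145, np.nan, 65],
    'Type': ['New', 'New', 'Old', 'Old', 'New', 'Old', 'Old', 'Old', 'New'],
})

# Sort so that "first" within each ID means "earliest TimeStamp"
ordered = df.sort_values(['ID', 'TimeStamp'])

# Largest score per ID (what a plain max gives)
largest = ordered.groupby('ID')[['ScoreA', 'ScoreB']].max()
# Earliest non-null score per ID
earliest = ordered.groupby('ID')[['ScoreA', 'ScoreB']].first()
```

For ID B, `largest` gives 10743 / 8803 while `earliest` gives 400 / 480, which are the values the desired output uses.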
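Answer (score: 0)
Copy each score onto the "New" rows only, forward-fill it within each ID, then blank out the "New" rows again. This assumes each ID's rows are sorted by TimeStamp and that the "New" row comes first.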
import numpy as np
import pandas as pd

# ScoreA -> FirstScore1: keep on 'New' rows, forward-fill per ID, then clear 'New' rows
df['FirstScore1'] = np.where(df.Type == 'New', df.ScoreA, np.nan)
df['FirstScore1'] = df.groupby('ID')['FirstScore1'].ffill()
df['FirstScore1'] = np.where(df.Type == 'New', np.nan, df['FirstScore1'])
# ScoreB -> FirstScore2: same three steps
df['FirstScore2'] = np.where(df.Type == 'New', df.ScoreB, np.nan)
df['FirstScore2'] = df.groupby('ID')['FirstScore2'].ffill()
df['FirstScore2'] = np.where(df.Type == 'New', np.nan, df['FirstScore2'])
ID TimeStamp ScoreA ScoreB Type FirstScore1 FirstScore2
0 A 20150908143000 345.0 316.0 New NaN NaN
1 B 20150908140300 400.0 480.0 New NaN NaN
2 B 20150908140600 NaN 120.0 Old 400.0 480.0
3 B 20150908143000 10743.0 8803.0 Old 400.0 480.0
4 C 20150908140100 600.0 1715.0 New NaN NaN
5 C 20150908140200 200.0 1062.0 Old 600.0 1715.0
6 C 20150908141000 NaN 145.0 Old 600.0 1715.0
7 C 20150908141500 418.0 NaN Old 600.0 1715.0
8 D 20150908143000 433.0 65.0 New NaN NaN
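An alternative sketch, not part of the original answer, takes "earliest" literally instead of relying on the "New" row: sort by TimeStamp, broadcast each ID's first non-null ScoreA/ScoreB to every row with `transform('first')`, and keep it only on "Old" rows. It assumes TimeStamp sorts chronologically. Because `first` skips NaN, an ID whose earliest row has a missing score would pick up the next non-null value, which is where this differs from the answer above.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'ID': ['A', 'B', 'B', 'B', 'C', 'C', 'C', 'C', 'D'],
    'TimeStamp': [20150908143000, 20150908140300, 20150908140600,
                  20150908143000, 20150908140100, 20150908140200,
                  20150908141000, 20150908141500, 20150908143000],
    'ScoreA': [345, 400, np.nan, 10743, 600, 200, np.nan, 418, 433],
    'ScoreB': [316, 480, 120, 8803, 1715, 1062, 145, np.nan, 65],
    'Type': ['New', 'New', 'Old', 'Old', 'New', 'Old', 'Old', 'Old', 'New'],
})

# Make "first" within each ID mean "earliest TimeStamp"
df = df.sort_values(['ID', 'TimeStamp'])

# Each row gets its ID's earliest non-null ScoreA/ScoreB (same index as df)
first = df.groupby('ID')[['ScoreA', 'ScoreB']].transform('first')
is_old = df['Type'] == 'Old'

# Keep the earliest scores only on 'Old' rows; all other rows become NaN
df['FirstScore1'] = first['ScoreA'].where(is_old)
df['FirstScore2'] = first['ScoreB'].where(is_old)
print(df)
```

On the sample data this reproduces the FirstScore1/FirstScore2 columns shown above.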