让我用一个例子来解释这个问题,
def random_dates(start, end, n=10):
start_u = start.value//10**9
end_u = end.value//10**9
return pd.to_datetime(np.random.randint(start_u, end_u, n), unit='s')
np.random.seed(42)
d = pd.DataFrame({'AAA': np.random.choice(['a','b','c'],size=10),
'BBB': np.random.choice([0,3,1],size=10),
'CCC': random_dates(pd.to_datetime('2015-01-01'),pd.to_datetime('2018-01-01'),10)})
d看起来像这样
AAA BBB CCC
0 c 1 2017-06-08 23:18:17
1 a 1 2015-12-18 06:26:04
2 c 0 2016-05-11 01:34:40
3 c 1 2017-06-18 21:35:54
4 a 3 2016-12-21 11:51:05
5 a 0 2017-02-05 07:36:31
6 c 3 2016-06-25 13:49:15
7 b 3 2015-09-06 19:49:15
8 c 3 2016-11-28 00:06:38
9 c 3 2017-07-12 02:51:41
现在我要添加一个新列,该列应为min_date_value_per_AAAA,其值应类似于
j= d[d.BBB >0].groupby('AAA',as_index=False).CCC.min().rename(columns={"CCC":"min_CCC"})
AAA min_CCC
0 a 2015-12-18 06:26:04
1 b 2015-09-06 19:49:15
2 c 2016-06-25 13:49:15
请注意,我不想采用每个组的最小日期值,该值不同于
d.groupby('AAA').CCC.min()
AAA
a 2015-12-18 06:26:04
b 2015-09-06 19:49:15
c 2016-05-11 01:34:40
现在我可以将d与j一起加入,并通过
将min_CCC添加到j中 pd.merge(d,j,on="AAA",how="inner")
AAA BBB CCC min_CCC
0 c 1 2017-06-08 23:18:17 2016-06-25 13:49:15
1 c 0 2016-05-11 01:34:40 2016-06-25 13:49:15
2 c 1 2017-06-18 21:35:54 2016-06-25 13:49:15
3 c 3 2016-06-25 13:49:15 2016-06-25 13:49:15
4 c 3 2016-11-28 00:06:38 2016-06-25 13:49:15
5 c 3 2017-07-12 02:51:41 2016-06-25 13:49:15
6 a 1 2015-12-18 06:26:04 2015-12-18 06:26:04
7 a 3 2016-12-21 11:51:05 2015-12-18 06:26:04
8 a 0 2017-02-05 07:36:31 2015-12-18 06:26:04
9 b 3 2015-09-06 19:49:15 2015-09-06 19:49:15
我的问题是有更好的方法来实现它。
答案 0 :(得分:2)
我认为您需要transform
d.CCC.where(d.BBB>0).groupby(d.AAA).transform('min')
Out[58]:
0 2016-06-25 13:49:15
1 2015-12-18 06:26:04
2 2016-06-25 13:49:15
3 2016-06-25 13:49:15
4 2015-12-18 06:26:04
5 2015-12-18 06:26:04
6 2016-06-25 13:49:15
7 2015-09-06 19:49:15
8 2016-06-25 13:49:15
9 2016-06-25 13:49:15
Name: CCC, dtype: datetime64[ns]
d['new']=d.CCC.where(d.BBB>0).groupby(d.AAA).transform('min')