Here is the original data:
date name score
0 2021-01-02 A 100
1 2021-01-03 A 120
2 2021-01-04 A 130
3 2021-01-05 A 115
4 2021-01-06 A 120
5 2021-01-07 A 70
6 2021-01-08 A 60
7 2021-01-09 A 30
8 2021-01-10 A 10
9 2021-01-11 A 100
10 2021-01-02 B 50
11 2021-01-03 B 40
12 2021-01-04 B 80
13 2021-01-05 B 115
14 2021-01-06 B 100
15 2021-01-07 B 50
16 2021-01-08 B 20
17 2021-01-09 B 40
18 2021-01-10 B 120
19 2021-01-11 B 20
20 2021-01-02 C 80
21 2021-01-03 C 100
22 2021-01-04 C 120
23 2021-01-05 C 115
24 2021-01-06 C 90
25 2021-01-07 C 80
26 2021-01-08 C 150
27 2021-01-09 C 200
28 2021-01-10 C 30
29 2021-01-11 C 40
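I would like to get the following output, which adds a new column holding the trailing 3-day average for each name. I would also like to add some new columns for logical conditions, such as df.score.shift(1) <= 100 (evaluated within each name, as the expected output shows).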
date name score 3_day_average previous_score<=100
0 2021-01-02 A 100 NaN False
1 2021-01-03 A 120 NaN True
2 2021-01-04 A 130 116.666667 False
3 2021-01-05 A 115 121.666667 False
4 2021-01-06 A 120 121.666667 False
5 2021-01-07 A 70 101.666667 False
6 2021-01-08 A 60 83.333333 True
7 2021-01-09 A 30 53.333333 True
8 2021-01-10 A 10 33.333333 True
9 2021-01-11 A 100 46.666667 True
10 2021-01-02 B 50 NaN False
11 2021-01-03 B 40 NaN True
12 2021-01-04 B 80 56.666667 True
13 2021-01-05 B 115 78.333333 True
14 2021-01-06 B 100 98.333333 False
15 2021-01-07 B 50 88.333333 True
16 2021-01-08 B 20 56.666667 True
17 2021-01-09 B 40 36.666667 True
18 2021-01-10 B 120 60.000000 True
19 2021-01-11 B 20 60.000000 False
20 2021-01-02 C 80 NaN False
21 2021-01-03 C 100 NaN True
22 2021-01-04 C 120 100.000000 True
23 2021-01-05 C 115 111.666667 False
24 2021-01-06 C 90 108.333333 False
25 2021-01-07 C 80 95.000000 True
26 2021-01-08 C 150 106.666667 True
27 2021-01-09 C 200 143.333333 False
28 2021-01-10 C 30 126.666667 False
29 2021-01-11 C 40 90.000000 True
I am currently using df.groupby('name') together with df.apply. What alternative approach could reduce the execution time? Thanks in advance!
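The question does not show the actual apply-based code, so the following is a hypothetical reconstruction of such a version on a small made-up sample. It checks that a per-group apply and the vectorized groupby/rolling/shift calls produce the same values, which is what makes swapping one for the other safe; the vectorized calls avoid invoking a Python lambda once per name group.

```python
import pandas as pd

# Small made-up two-name sample shaped like the question's data
df = pd.DataFrame({
    "name": ["A"] * 5 + ["B"] * 5,
    "score": [100, 120, 130, 115, 120, 50, 40, 80, 115, 100],
})

# Hypothetical apply-based version: one Python lambda call per name group
avg_apply = (df.groupby("name", group_keys=False)["score"]
               .apply(lambda s: s.rolling(3).mean()))
prev_apply = (df.groupby("name", group_keys=False)["score"]
                .apply(lambda s: s.shift() <= 100))

# Vectorized version: no per-group Python function call
avg_fast = (df.groupby("name")["score"].rolling(3).mean()
              .reset_index(level=0, drop=True))
prev_fast = df.groupby("name")["score"].shift() <= 100

# Both approaches yield the same values
pd.testing.assert_series_equal(avg_apply, avg_fast, check_names=False)
pd.testing.assert_series_equal(prev_apply, prev_fast, check_names=False)
print(pd.DataFrame({"name": df["name"], "score": df["score"],
                    "3_day_average": avg_fast,
                    "previous_score<=100": prev_fast}))
```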
Answer 0 (score: 0)
Use groupby followed by rolling for the trailing average, and DataFrameGroupBy.shift for the previous-score comparison:
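# df is the DataFrame from the question (columns: date, name, score)
# Rolling mean per name; reset_index drops the 'name' level added by groupby
df['3_day_average'] = (df.groupby('name')['score']
                         .rolling(3)
                         .mean()
                         .reset_index(level=0, drop=True))
# Shift within each name; each name's first row is NaN, which compares as False
df['previous_score<=100'] = df.groupby('name')['score'].shift() <= 100
print(df.head(15))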
date name score 3_day_average previous_score<=100
0 2021-01-02 A 100 NaN False
1 2021-01-03 A 120 NaN True
2 2021-01-04 A 130 116.666667 False
3 2021-01-05 A 115 121.666667 False
4 2021-01-06 A 120 121.666667 False
5 2021-01-07 A 70 101.666667 False
6 2021-01-08 A 60 83.333333 True
7 2021-01-09 A 30 53.333333 True
8 2021-01-10 A 10 33.333333 True
9 2021-01-11 A 100 46.666667 True
10 2021-01-02 B 50 NaN False
11 2021-01-03 B 40 NaN True
12 2021-01-04 B 80 56.666667 True
13 2021-01-05 B 115 78.333333 True
14 2021-01-06 B 100 98.333333 False
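To see why the answer above calls reset_index(level=0, drop=True), it helps to inspect the intermediate result on a tiny made-up frame: groupby(...).rolling(...) returns a Series indexed by (name, original row label), and dropping the name level lets the assignment align on the original row labels. The sketch also shows that NaN <= 100 evaluates to False, which is why the first row of each name is False in the expected output.

```python
import pandas as pd

# Made-up two-name sample
df = pd.DataFrame({"name": ["A", "A", "A", "B", "B", "B"],
                   "score": [100, 120, 130, 50, 40, 80]})

rolled = df.groupby("name")["score"].rolling(3).mean()
# The group key is prepended as an extra index level: (name, row label)
print(rolled.index.nlevels)  # 2

# Dropping the 'name' level leaves the original row labels,
# so the result aligns with df when assigned as a new column
aligned = rolled.reset_index(level=0, drop=True)
df["3_day_average"] = aligned

# shift() leaves NaN in each group's first row; NaN <= 100 is False
df["previous_score<=100"] = df.groupby("name")["score"].shift() <= 100
print(df)
```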
Answer 1 (score: 0)
import pandas as pd

data=[(0 ,'2021-01-02','A',100),
(1 ,'2021-01-03','A',120),
(2 ,'2021-01-04','A',130),
(3 ,'2021-01-05','A',115),
(4 ,'2021-01-06','A',120),
(5 ,'2021-01-07','A', 70),
(6 ,'2021-01-08','A', 60),
(7 ,'2021-01-09','A', 30),
(8 ,'2021-01-10','A', 10),
(9 ,'2021-01-11','A',100),
(10 ,'2021-01-02','B', 50),
(11 ,'2021-01-03','B', 40),
(12 ,'2021-01-04','B', 80),
(13 ,'2021-01-05','B',115),
(14 ,'2021-01-06','B',100),
(15 ,'2021-01-07','B', 50),
(16 ,'2021-01-08','B', 20),
(17 ,'2021-01-09','B', 40),
(18 ,'2021-01-10','B',120),
(19 ,'2021-01-11','B', 20),
(20 ,'2021-01-02','C', 80),
(21 ,'2021-01-03','C',100),
(22 ,'2021-01-04','C',120),
(23 ,'2021-01-05','C',115),
(24 ,'2021-01-06','C', 90),
(25 ,'2021-01-07','C', 80),
(26 ,'2021-01-08','C',150),
(27 ,'2021-01-09','C',200),
(28 ,'2021-01-10','C', 30),
(29 ,'2021-01-11','C', 40)]
header=['id','date','name','score']
df=pd.DataFrame(data,columns=header)
# Group by name so the 3-row window never spans two different names
df['3d_rolling_avg'] = (df.groupby('name')['score']
                          .rolling(window=3, center=False)
                          .mean()
                          .reset_index(level=0, drop=True))
# Previous score within the same name; each name's first row gets NaN
df['shift'] = df.groupby('name')['score'].shift(1)
# Vectorized comparison; NaN <= 100 is False, so no row-wise apply is needed
df['prev_score_lessthan_100'] = df['shift'] <= 100
print(df)
Output:
id date name score 3d_rolling_avg shift prev_score_lessthan_100
0 0 2021-01-02 A 100 NaN NaN False
1 1 2021-01-03 A 120 NaN 100.0 True
2 2 2021-01-04 A 130 116.666667 120.0 False
3 3 2021-01-05 A 115 121.666667 130.0 False
4 4 2021-01-06 A 120 121.666667 115.0 False
5 5 2021-01-07 A 70 101.666667 120.0 False
6 6 2021-01-08 A 60 83.333333 70.0 True
7 7 2021-01-09 A 30 53.333333 60.0 True
8 8 2021-01-10 A 10 33.333333 30.0 True
9 9 2021-01-11 A 100 46.666667 10.0 True
10 10 2021-01-02 B 50 NaN NaN False
11 11 2021-01-03 B 40 NaN 50.0 True
12 12 2021-01-04 B 80 56.666667 40.0 True
13 13 2021-01-05 B 115 78.333333 80.0 True
14 14 2021-01-06 B 100 98.333333 115.0 False
15 15 2021-01-07 B 50 88.333333 100.0 True
16 16 2021-01-08 B 20 56.666667 50.0 True
17 17 2021-01-09 B 40 36.666667 20.0 True
18 18 2021-01-10 B 120 60.000000 40.0 True
19 19 2021-01-11 B 20 60.000000 120.0 False
20 20 2021-01-02 C 80 NaN NaN False
21 21 2021-01-03 C 100 NaN 80.0 True
22 22 2021-01-04 C 120 100.000000 100.0 True
23 23 2021-01-05 C 115 111.666667 120.0 False
24 24 2021-01-06 C 90 108.333333 115.0 False
25 25 2021-01-07 C 80 95.000000 90.0 True
26 26 2021-01-08 C 150 106.666667 80.0 True
27 27 2021-01-09 C 200 143.333333 150.0 False
28 28 2021-01-10 C 30 126.666667 200.0 False
29 29 2021-01-11 C 40 90.000000 30.0 True
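Whether rolling and shift run per name changes the first rows of every group after the first. A minimal comparison on a made-up two-name frame: without groupby, B's first rows pick up A's scores in both the window and the shifted value, while the grouped versions start fresh with NaN.

```python
import pandas as pd

# Made-up two-name sample: rows 0-2 belong to A, rows 3-5 to B
df = pd.DataFrame({"name": ["A", "A", "A", "B", "B", "B"],
                   "score": [100, 120, 130, 50, 40, 80]})

# Ungrouped: the window and the shift slide straight across the A/B boundary
ungrouped_avg = df["score"].rolling(3).mean()
ungrouped_prev = df["score"].shift(1)

# Grouped: each name gets its own windows and its own shift
grouped_avg = (df.groupby("name")["score"].rolling(3).mean()
                 .reset_index(level=0, drop=True))
grouped_prev = df.groupby("name")["score"].shift(1)

print(pd.DataFrame({"name": df["name"],
                    "ungrouped_avg": ungrouped_avg,
                    "grouped_avg": grouped_avg,
                    "ungrouped_prev": ungrouped_prev,
                    "grouped_prev": grouped_prev}))
```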