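Get the mean of rows that share the same minute

Time: 2019-12-02 05:49:47

Tags: python pandas

How can I get the mean/variance of the bpm values across rows that fall in the same minute, and fill each missing minute with the value from the minute before it?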

Here is the data:

import pandas as pd
a={'dateTime': {0: '11/17/19 02:28:05', 1: '11/17/19 02:28:17', 2: '11/17/19 02:28:31', 3: '11/17/19 02:28:42', 4: '11/17/19 02:29:29', 5: '11/17/19 02:29:46', 6: '11/17/19 02:30:43', 7: '11/17/19 02:32:13', 8: '11/17/19 02:49:39', 9: '11/17/19 02:49:49', 10: '11/17/19 02:49:54', 11: '11/17/19 02:49:59', 12: '11/17/19 02:50:04', 13: '11/17/19 02:50:09', 14: '11/17/19 02:50:14', 15: '11/17/19 02:50:24', 16: '11/17/19 02:50:29', 17: '11/17/19 02:50:34', 18: '11/17/19 02:50:39', 19: '11/17/19 02:50:44', 20: '11/17/19 02:50:49', 21: '11/17/19 02:51:04', 22: '11/17/19 02:51:09', 23: '11/17/19 03:04:05', 24: '11/17/19 03:04:33', 25: '11/17/19 11:14:27', 26: '11/17/19 11:14:42', 27: '11/17/19 11:14:52', 28: '11/17/19 11:15:01', 29: '11/17/19 11:15:06', 30: '11/17/19 11:15:21'}, 'bpm': {0: 113, 1: 70, 2: 70, 3: 70, 4: 70, 5: 70, 6: 70, 7: 70, 8: 70, 9: 67, 10: 62, 11: 57, 12: 58, 13: 60, 14: 60, 15: 62, 16: 63, 17: 65, 18: 66, 19: 67, 20: 65, 21: 66, 22: 67, 23: 69, 24: 70, 25: 70, 26: 70, 27: 70, 28: 70, 29: 70, 30: 70}, 'confidence': {0: 0, 1: 0, 2: 0, 3: 0, 4: 0, 5: 0, 6: 0, 7: 0, 8: 0, 9: 1, 10: 1, 11: 2, 12: 2, 13: 2, 14: 1, 15: 1, 16: 1, 17: 1, 18: 1, 19: 1, 20: 1, 21: 1, 22: 1, 23: 0, 24: 0, 25: 0, 26: 0, 27: 1, 28: 1, 29: 0, 30: 1}}
ab = pd.DataFrame(a)
print(ab)

             dateTime  bpm  confidence
0   11/17/19 02:28:05  113           0
1   11/17/19 02:28:17   70           0
2   11/17/19 02:28:31   70           0
3   11/17/19 02:28:42   70           0
4   11/17/19 02:29:29   70           0
5   11/17/19 02:29:46   70           0
6   11/17/19 02:30:43   70           0
7   11/17/19 02:32:13   70           0
8   11/17/19 02:49:39   70           0
9   11/17/19 02:49:49   67           1
10  11/17/19 02:49:54   62           1
11  11/17/19 02:49:59   57           2

Example of the desired output for the mean:

          dateTime        bpm
1   11/17/19 02:28         80
2   11/17/19 02:29         70
3   11/17/19 02:30         70
4   11/17/19 02:31         70
5   11/17/19 02:32         70
6   11/17/19 02:33         70
7   11/17/19 02:34         70
8   11/17/19 02:35         70
9   11/17/19 02:36         70
10  11/17/19 02:37         70
11  11/17/19 02:38         70
12  11/17/19 02:39         70
13  11/17/19 02:40         70
14  11/17/19 02:41         70
15  11/17/19 02:42         70
16  11/17/19 02:43         70
17  11/17/19 02:44         70
18  11/17/19 02:45         70
19  11/17/19 02:46         70
20  11/17/19 02:47         70
21  11/17/19 02:48         70
22  11/17/19 02:49         64
23  11/17/19 02:50         62
24  11/17/19 02:51         66

1 answer:

Answer 0 (score: 1)

I believe you need DataFrame.resample with mean, followed by ffill to forward-fill the minutes that have no rows:

ab['dateTime'] = pd.to_datetime(ab['dateTime'])

ab = ab.resample('Min', on='dateTime').mean().ffill()
print(ab)
                       bpm  confidence
dateTime                              
2019-11-17 02:28:00  80.75    0.000000
2019-11-17 02:29:00  70.00    0.000000
2019-11-17 02:30:00  70.00    0.000000
2019-11-17 02:31:00  70.00    0.000000
2019-11-17 02:32:00  70.00    0.000000
                   ...         ...
2019-11-17 11:11:00  69.50    0.000000
2019-11-17 11:12:00  69.50    0.000000
2019-11-17 11:13:00  69.50    0.000000
2019-11-17 11:14:00  70.00    0.333333
2019-11-17 11:15:00  70.00    0.666667

[528 rows x 2 columns]
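
To make the forward fill visible, here is a minimal sketch on the first eight rows of the question's data only. The names df, per_minute, empty and filled are illustrative, and the explicit format string is an assumption based on how the sample timestamps look. Minute 02:31 has no rows, so resample yields NaN there before ffill copies the 02:30 value forward.

```python
import pandas as pd

# First eight rows of the question's data (bpm only), enough to include one empty minute.
df = pd.DataFrame({
    "dateTime": ["11/17/19 02:28:05", "11/17/19 02:28:17", "11/17/19 02:28:31",
                 "11/17/19 02:28:42", "11/17/19 02:29:29", "11/17/19 02:29:46",
                 "11/17/19 02:30:43", "11/17/19 02:32:13"],
    "bpm": [113, 70, 70, 70, 70, 70, 70, 70],
})
# Assumed format: month/day/two-digit year, 24-hour time.
df["dateTime"] = pd.to_datetime(df["dateTime"], format="%m/%d/%y %H:%M:%S")

# One bin per minute; minutes without any rows come out as NaN.
per_minute = df.resample("min", on="dateTime")["bpm"].mean()
empty = per_minute.isna()      # True only for minutes that had no samples
filled = per_minute.ffill()    # copy the previous minute's mean into the gaps

print(filled)
```

Keeping the empty mask around before calling ffill is one way to tell afterwards which minutes were measured and which were filled in.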

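If you need a different function for each column, use Resampler.agg with a dictionary (starting again from the original ab, before the resample above):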

ab['dateTime'] = pd.to_datetime(ab['dateTime'])

ab = ab.resample('Min', on='dateTime').agg({'bpm':'mean', 'confidence':'var'}).ffill()
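
Since the question asks for both the mean and the variance of bpm, the following is a hedged sketch of one way to get both per minute and to format the result like the desired output. It uses only the first eight rows of the question's data, and the names df, stats, has_data and out are illustrative. Note that var uses the sample variance (ddof=1), so a minute with a single reading gives NaN; treating that case as variance 0 is an assumption made here, not something the answer states.

```python
import pandas as pd

# First eight rows of the question's data (bpm only).
df = pd.DataFrame({
    "dateTime": ["11/17/19 02:28:05", "11/17/19 02:28:17", "11/17/19 02:28:31",
                 "11/17/19 02:28:42", "11/17/19 02:29:29", "11/17/19 02:29:46",
                 "11/17/19 02:30:43", "11/17/19 02:32:13"],
    "bpm": [113, 70, 70, 70, 70, 70, 70, 70],
})
df["dateTime"] = pd.to_datetime(df["dateTime"], format="%m/%d/%y %H:%M:%S")

# Mean, sample variance and number of readings for each minute.
stats = df.resample("min", on="dateTime")["bpm"].agg(["mean", "var", "count"])

# Assumption: a minute with exactly one reading gets variance 0 instead of NaN,
# so that ffill does not overwrite it with the previous minute's variance.
has_data = stats["count"] > 0
stats.loc[has_data, "var"] = stats.loc[has_data, "var"].fillna(0.0)

# Only minutes with no readings at all are still NaN; fill them from the minute before.
stats[["mean", "var"]] = stats[["mean", "var"]].ffill()

# Render the timestamps in the same style as the question's desired output.
out = stats[["mean", "var"]].reset_index()
out["dateTime"] = out["dateTime"].dt.strftime("%m/%d/%y %H:%M")
print(out)
```

If NaN is preferable for single-reading minutes, drop the fillna step and apply ffill to the mean column only.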