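After reading a CSV into a DataFrame, I am trying to resample my "Value" column into 5-second bins, starting from the first timestamp rounded to the nearest second. I want the mean of all values within each 5-second window, starting from 46:19.6 (the format is %M:%S.%f). So the code should give me the mean for 46:20, then for 46:25, and so on. Does anyone know how to do this? Thanks!
Input: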
df = pd.DataFrame({'Time': {0: '46:19.6',
1: '46:20.7',
2: '46:21.8',
3: '46:22.9',
4: '46:24.0',
5: '46:25.1',
6: '46:26.2',
7: '46:27.6',
8: '46:28.7',
9: '46:29.8',
10: '46:30.9',
11: '46:32.0',
12: '46:33.2',
13: '46:34.3',
14: '46:35.3',
15: '46:36.5',
16: '46:38.8',
17: '46:40.0'},
'Value': {0: 0,
1: 1,
2: 2,
3: 3,
4: 4,
5: 5,
6: 6,
7: 8,
8: 9,
9: 10,
10: 11,
11: 12,
12: 13,
13: 14,
14: 15,
15: 17,
16: 19,
17: 20}})
Answer 0 (score: 1)
Assuming your Time field is in datetime64[ns] format, simply use pd.Grouper and pass freq='5S':
# Optional: if the `Time` field is an `object` (i.e. string) column, convert it to datetime first.
# df['Time'] = pd.to_datetime('00:'+df['Time'])
df1 = df.groupby(pd.Grouper(key='Time', freq='5S'))['Value'].mean().reset_index()
# Depending on what you want, you can replace the line above with one of the two below.
# Drop the first (partial) bin:
# df1 = df.groupby(pd.Grouper(key='Time', freq='5S'))['Value'].mean().reset_index().iloc[1:]
# Shift the bin edges by 4.6 s (any number between 0 and 5 works). Note that `base` was deprecated in pandas 1.1 and removed in 2.0; `offset='4.6s'` is its replacement there:
# df1 = df.groupby(pd.Grouper(key='Time', freq='5S', base=4.6))['Value'].mean().reset_index()
df1
Output:
                 Time  Value
0 2020-07-07 00:46:15    0.0
1 2020-07-07 00:46:20    2.5
2 2020-07-07 00:46:25    7.6
3 2020-07-07 00:46:30   12.5
4 2020-07-07 00:46:35   17.0
5 2020-07-07 00:46:40   20.0
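If the goal is for the first window to begin at the first timestamp itself (46:19.6) and only the label to be rounded to 46:20, one possible reading of the question, the following is a minimal sketch of that idea. It assumes pandas 1.1 or later, where pd.Grouper accepts origin='start' (the offset and origin arguments replace the older base argument). The fixed date 2020-07-07 and the names times, values, and windowed are illustrative assumptions, not part of the original answer.

```python
import pandas as pd

# Sample data from the question (minute:second.fraction strings).
times = ['46:19.6', '46:20.7', '46:21.8', '46:22.9', '46:24.0', '46:25.1',
         '46:26.2', '46:27.6', '46:28.7', '46:29.8', '46:30.9', '46:32.0',
         '46:33.2', '46:34.3', '46:35.3', '46:36.5', '46:38.8', '46:40.0']
values = [0, 1, 2, 3, 4, 5, 6, 8, 9, 10, 11, 12, 13, 14, 15, 17, 19, 20]
df = pd.DataFrame({'Time': times, 'Value': values})

# Assumption: prepend a fixed, arbitrary date and hour so the result
# does not depend on the day the code is run.
df['Time'] = pd.to_datetime('2020-07-07 00:' + df['Time'],
                            format='%Y-%m-%d %H:%M:%S.%f')

# origin='start' anchors the first 5-second window at the first timestamp,
# so the windows are [46:19.6, 46:24.6), [46:24.6, 46:29.6), and so on.
windowed = (df.groupby(pd.Grouper(key='Time', freq='5s', origin='start'))['Value']
              .mean()
              .reset_index())

# Round each window start to the nearest second so the labels read 46:20, 46:25, ...
windowed['Time'] = windowed['Time'].dt.round('1s')
print(windowed)
```

With this alignment the 46:19.6 sample is averaged together with the four samples that follow it, instead of landing alone in a 46:15 bin.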
Full reproducible code using the sample DataFrame I created:
import pandas as pd
df = pd.DataFrame({'Time': {0: '46:19.6',
1: '46:20.7',
2: '46:21.8',
3: '46:22.9',
4: '46:24.0',
5: '46:25.1',
6: '46:26.2',
7: '46:27.6',
8: '46:28.7',
9: '46:29.8',
10: '46:30.9',
11: '46:32.0',
12: '46:33.2',
13: '46:34.3',
14: '46:35.3',
15: '46:36.5',
16: '46:38.8',
17: '46:40.0'},
'Value': {0: 0,
1: 1,
2: 2,
3: 3,
4: 4,
5: 5,
6: 6,
7: 8,
8: 9,
9: 10,
10: 11,
11: 12,
12: 13,
13: 14,
14: 15,
15: 17,
16: 19,
17: 20}})
df['Time'] = pd.to_datetime('00:'+df['Time'])
df1 = df.groupby(pd.Grouper(key='Time', freq='5S'))['Value'].mean().reset_index()
df1
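As a cross-check, the same binning can also be written with DataFrame.resample and its on= argument rather than groupby with pd.Grouper. This is only a sketch of an equivalent spelling, not something the answer itself uses. The fixed date and the names times, values, by_grouper, and by_resample are illustrative assumptions, and the lowercase '5s' alias is used because recent pandas releases deprecate the uppercase 'S' alias.

```python
import pandas as pd

# Sample data from the question (minute:second.fraction strings).
times = ['46:19.6', '46:20.7', '46:21.8', '46:22.9', '46:24.0', '46:25.1',
         '46:26.2', '46:27.6', '46:28.7', '46:29.8', '46:30.9', '46:32.0',
         '46:33.2', '46:34.3', '46:35.3', '46:36.5', '46:38.8', '46:40.0']
values = [0, 1, 2, 3, 4, 5, 6, 8, 9, 10, 11, 12, 13, 14, 15, 17, 19, 20]
df = pd.DataFrame({'Time': times, 'Value': values})

# Assumption: a fixed, arbitrary date keeps the example reproducible.
df['Time'] = pd.to_datetime('2020-07-07 00:' + df['Time'],
                            format='%Y-%m-%d %H:%M:%S.%f')

# Approach from the answer: group on 5-second bins of the Time column.
by_grouper = df.groupby(pd.Grouper(key='Time', freq='5s'))['Value'].mean()

# Alternative spelling: resample on the Time column directly.
by_resample = df.resample('5s', on='Time')['Value'].mean()

print(by_resample.reset_index())
```

Both calls return a Series indexed by the start of each 5-second bin, so reset_index() turns either one into the two-column table shown in the answer.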