I know similar questions have already been answered, but I can't figure out why none of the solutions work for me. My sample dataset:
TimeStamp 340 341 342
10:27:00 1.953036 2.110234 1.981548
10:28:00 1.973408 2.046361 1.806923
10:29:00 0.000000 0.000000 0.014881
10:30:00 2.567976 3.169928 3.479591
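I want the mean of each column over every two minutes. df.groupby promises a neat solution, but for some reason it makes my TimeStamp column disappear. Any help is much appreciated.
Expected output: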
TimeStamp 340 341 342
10:27:30 1.963222 2.078298 1.894235
10:29:30 1.283988 1.584964 1.747236
Code I tried:
import pandas as pd
import numpy as np
path = '/Users/username/Desktop/Model/'
file1 = 'filename.csv'
df = pd.read_csv(path + file1, skipinitialspace = True)
df['TimeStamp'] = pd.to_timedelta(df['TimeStamp'])
df['TimeStamp'] = df['TimeStamp'].dt.floor('min')
df.set_index('TimeStamp')
rowF = len(df['TimeStamp'])
# Average every two min
newdf = df.groupby(np.arange(len(df.index))//2).mean()
print(newdf)
Answer 0 (score: 0)
Set the time as the index:
# Move TimeStamp out of the columns and into the index as timedeltas
df = df.set_index(pd.to_timedelta(df.pop("TimeStamp")))
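For context on why the index never changed in the question's code: set_index returns a new DataFrame unless the result is assigned back (or inplace=True is passed). A minimal sketch, assuming the sample data is loaded from an inline string instead of the CSV file (the raw string and the index_before/index_after names are illustrative only):

```python
import io

import pandas as pd

# Sample data from the question, inlined so the sketch is self-contained.
raw = """TimeStamp 340 341 342
10:27:00 1.953036 2.110234 1.981548
10:28:00 1.973408 2.046361 1.806923
10:29:00 0.000000 0.000000 0.014881
10:30:00 2.567976 3.169928 3.479591
"""
df = pd.read_csv(io.StringIO(raw), sep=r"\s+")
df["TimeStamp"] = pd.to_timedelta(df["TimeStamp"])

# The return value is discarded here, so df keeps its default integer index.
df.set_index("TimeStamp")
index_before = type(df.index).__name__

# Assigning the result back is what actually changes the index.
df = df.set_index("TimeStamp")
index_after = type(df.index).__name__

print(index_before, "->", index_after)
```

Once the timedeltas are the index, resample can bin on them directly.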
Then use resample and aggregate every two minutes:
df.resample("2min").mean().reset_index()
# TimeStamp 340 341 342
#0 10:27:00 1.963222 2.078298 1.894235
#1 10:29:00 1.283988 1.584964 1.747236
#2 10:31:00 NaN NaN NaN
Use iloc to drop the last row, which contains only NaN:
df.resample("2min").mean().reset_index().iloc[:-1]
# TimeStamp 340 341 342
#0 10:27:00 1.963222 2.078298 1.894235
#1 10:29:00 1.283988 1.584964 1.747236
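Dropping the last row by position assumes the final bin is always empty, which may depend on the data and possibly on the pandas version. A sketch of a position-independent alternative, assuming the same sample data loaded from an inline string (raw and means are illustrative names): dropna(how="all") removes only the rows in which every column is NaN.

```python
import io

import pandas as pd

# Sample data from the question, inlined so the sketch is self-contained.
raw = """TimeStamp 340 341 342
10:27:00 1.953036 2.110234 1.981548
10:28:00 1.973408 2.046361 1.806923
10:29:00 0.000000 0.000000 0.014881
10:30:00 2.567976 3.169928 3.479591
"""
df = pd.read_csv(io.StringIO(raw), sep=r"\s+")

# Use the timedeltas as the index and keep only the numeric columns.
df = df.set_index(pd.to_timedelta(df.pop("TimeStamp")))

# Average over two-minute bins, then drop any bin that received no data.
means = df.resample("2min").mean().dropna(how="all").reset_index()

print(means)
```

Unlike iloc[:-1], this leaves the result untouched when no empty bin is produced.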
If you want the TimeStamp shifted by 30 seconds:
(df.resample("2min").mean().reset_index()
.assign(TimeStamp = lambda x: x.TimeStamp + pd.Timedelta('30 seconds'))
.iloc[:-1])
# TimeStamp 340 341 342
#0 10:27:30 1.963222 2.078298 1.894235
#1 10:29:30 1.283988 1.584964 1.747236
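For comparison, here is a sketch that stays closer to the groupby attempt in the question while keeping the TimeStamp column. It is not part of the answer above and rests on two assumptions: rows are already sorted and evenly spaced, and the time is averaged as a number of seconds (the seconds column and the pair_id and pair_means names are introduced here for illustration). Averaging 10:27:00 and 10:28:00 this way gives 10:27:30, which matches the expected output without a separate 30-second shift.

```python
import io

import numpy as np
import pandas as pd

# Sample data from the question, inlined so the sketch is self-contained.
raw = """TimeStamp 340 341 342
10:27:00 1.953036 2.110234 1.981548
10:28:00 1.973408 2.046361 1.806923
10:29:00 0.000000 0.000000 0.014881
10:30:00 2.567976 3.169928 3.479591
"""
df = pd.read_csv(io.StringIO(raw), sep=r"\s+")

# Represent the time as plain seconds so it can be averaged like any number.
df["seconds"] = pd.to_timedelta(df["TimeStamp"]).dt.total_seconds()
numeric = df.drop(columns="TimeStamp")

# Rows 0-1 form group 0, rows 2-3 form group 1, and so on.
pair_id = np.arange(len(numeric)) // 2
pair_means = numeric.groupby(pair_id).mean()

# Turn the averaged seconds back into a timedelta and make it the first column.
timestamps = pd.to_timedelta(pair_means.pop("seconds"), unit="s")
pair_means.insert(0, "TimeStamp", timestamps)

print(pair_means)
```

This groups by row position, not by clock time, so a missing minute in the data would shift every later pair; resample does not have that problem.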