Changing the frequency of a DataFrame index to 5 minutes without altering the data

Time: 2017-08-07 05:44:22

Tags: python pandas intervals frequency

I have a CSV file that looks like this:

              user_id  heartbeat      acc_x    acc_y    acc_z
 monitor_date                                                      
 2017-07-21 07:00:18     1084         69 -0.09375 -1.12500  0.09375
 2017-07-21 07:00:19     1084         68 -0.09375 -1.09375  0.12500
 2017-07-21 07:00:20     1084         68 -0.12500 -1.12500  0.15625
 2017-07-21 07:00:20     1084         70 -0.12500 -1.09375  0.06250
 2017-07-21 07:00:20     1084         70 -0.12500 -1.15625  0.12500
 2017-07-21 07:00:22     1084         70 -0.12500 -1.12500  0.28125
 2017-07-21 07:00:23     1084         70 -0.12500 -1.03125  0.28125
 2017-07-21 07:00:24     1084         70 -0.12500 -1.09375  0.28125
 2017-07-21 07:00:24     1084         71 -0.18750 -1.03125  0.25000
 2017-07-21 07:00:25     1084         72 -0.21875 -1.06250  0.25000
 2017-07-21 07:00:25     1084         72 -0.25000 -1.00000  0.31250
 2017-07-21 07:00:26     1084         72 -0.15625 -1.03125  0.28125
 2017-07-21 07:00:27     1084         72 -0.15625 -1.12500  0.28125
 2017-07-21 07:00:29     1084         72 -0.18750 -1.21875  0.18750
 2017-07-21 07:00:29     1084         72 -0.15625 -1.09375  0.09375
 2017-07-21 07:00:29     1084         72 -0.25000 -1.09375  0.18750
 2017-07-21 07:00:30     1084         72 -0.15625 -1.09375  0.09375
 2017-07-21 07:00:30     1084         72 -0.09375 -1.06250  0.09375
 2017-07-21 07:00:34     1084         72 -0.12500 -1.15625  0.21875
 2017-07-21 07:00:35     1084         72 -0.18750 -1.12500  0.15625

What I want to do now is read one data point every 5 minutes, without applying any other function that changes the data.

  import pandas as pd

  file = pd.read_csv('923_20170721000000_20170721235959_70.csv',header=0,index_col='monitor_date',parse_dates=True,dayfirst=True)
  u_cols = ['user_id','heartbeat','acc_x','acc_y','acc_z']
  read = file[u_cols]
  read = read[(pd.date_range(read.index[0],read.index[-1],freq ='5T'))]
  print(read)

I expected the index to look like this:

0   2017-07-21 07:00:18   ...
1   2017-07-21 07:05:18   ...
2   2017-07-21 07:10:18   ...
3   2017-07-21 07:15:18   ...
4   2017-07-21 07:20:18   ...

Instead, I got an error that looks like this:

KeyError: "DatetimeIndex(['2017-07-21 07:00:18', '2017-07-21 07:05:18',\n               
'2017-07-21 07:10:18', '2017-07-21 07:15:18',\n               '2017-07-21 
07:20:18', '2017-07-21 07:25:18',\n               '2017-07-21 07:30:18', ' 
2017-07-21 07:35:18',\n               '2017-07-21 07:40:18', '2017-07-21 
07:45:18',\n               ...\n               '2017-07-21 18:20:18', '2017-
07-21

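I have already tried the DataFrame.resample function, but it did not work the way I wanted. I am still new to Python and pandas, and I am at a loss as to how to make this work.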

1 Answer:

Answer 0: (score: 0)

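It seems you need reindex. Indexing a DataFrame with read[...] looks up column labels, which is why passing a DatetimeIndex raises a KeyError; reindex selects rows by index label instead.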

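# Build the target 5-minute grid ('5min' is the current spelling of the '5T' alias)
idx = pd.date_range(read.index[0], read.index[-1], freq='5min')
# reindex with a fill method needs a unique index, so drop repeated timestamps
read = read[~read.index.duplicated()]
# For each grid timestamp, take the row whose original timestamp is nearest
read = read.reindex(idx, method='nearest')
print(read)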
                      user_id  heartbeat    acc_x  acc_y    acc_z
2017-07-21 07:00:18      1084         69 -0.09375 -1.125  0.09375
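
For anyone who wants to try the reindex approach without the original CSV, here is a minimal self-contained sketch. The timestamps and heartbeat values are made up for illustration (they are not taken from the asker's file), and the names df, unique and sampled are hypothetical.

import pandas as pd

# Synthetic stand-in for the CSV: irregular timestamps, one of them repeated.
index = pd.to_datetime([
    "2017-07-21 07:00:18",
    "2017-07-21 07:00:19",
    "2017-07-21 07:00:19",  # duplicate timestamp
    "2017-07-21 07:05:17",
    "2017-07-21 07:05:20",
    "2017-07-21 07:10:18",
])
df = pd.DataFrame({"heartbeat": [69, 68, 70, 71, 72, 73]}, index=index)
df.index.name = "monitor_date"

# Keep only the first row for each repeated timestamp so the index is unique.
unique = df[~df.index.duplicated(keep="first")]

# Target grid: every 5 minutes, starting at the first observed timestamp.
idx = pd.date_range(unique.index[0], unique.index[-1], freq="5min")

# For each grid timestamp, copy the row with the nearest original timestamp.
sampled = unique.reindex(idx, method="nearest")
print(sampled)

The values themselves are not averaged or interpolated; each output row is a copy of an existing row, relabelled with the grid timestamp. If long gaps in the recording are a concern, reindex also accepts a tolerance argument (for example tolerance=pd.Timedelta("30s")), which leaves NaN where no row is close enough. Whether that is desirable depends on the data.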