I have a CSV file that looks like this:
user_id heartbeat acc_x acc_y acc_z
monitor_date
2017-07-21 07:00:18 1084 69 -0.09375 -1.12500 0.09375
2017-07-21 07:00:19 1084 68 -0.09375 -1.09375 0.12500
2017-07-21 07:00:20 1084 68 -0.12500 -1.12500 0.15625
2017-07-21 07:00:20 1084 70 -0.12500 -1.09375 0.06250
2017-07-21 07:00:20 1084 70 -0.12500 -1.15625 0.12500
2017-07-21 07:00:22 1084 70 -0.12500 -1.12500 0.28125
2017-07-21 07:00:23 1084 70 -0.12500 -1.03125 0.28125
2017-07-21 07:00:24 1084 70 -0.12500 -1.09375 0.28125
2017-07-21 07:00:24 1084 71 -0.18750 -1.03125 0.25000
2017-07-21 07:00:25 1084 72 -0.21875 -1.06250 0.25000
2017-07-21 07:00:25 1084 72 -0.25000 -1.00000 0.31250
2017-07-21 07:00:26 1084 72 -0.15625 -1.03125 0.28125
2017-07-21 07:00:27 1084 72 -0.15625 -1.12500 0.28125
2017-07-21 07:00:29 1084 72 -0.18750 -1.21875 0.18750
2017-07-21 07:00:29 1084 72 -0.15625 -1.09375 0.09375
2017-07-21 07:00:29 1084 72 -0.25000 -1.09375 0.18750
2017-07-21 07:00:30 1084 72 -0.15625 -1.09375 0.09375
2017-07-21 07:00:30 1084 72 -0.09375 -1.06250 0.09375
2017-07-21 07:00:34 1084 72 -0.12500 -1.15625 0.21875
2017-07-21 07:00:35 1084 72 -0.18750 -1.12500 0.15625
What I want to do now is pick out one data point every 5 minutes, without applying any other function that would change the data.
import pandas as pd
file = pd.read_csv('923_20170721000000_20170721235959_70.csv',header=0,index_col='monitor_date',parse_dates=True,dayfirst=True)
u_cols = ['user_id','heartbeat','acc_x','acc_y','acc_z']
read = file[u_cols]
read = read[(pd.date_range(read.index[0],read.index[-1],freq ='5T'))]
print(read)
I expected the index to look like this:
0 2017-07-21 07:00:18 ...
1 2017-07-21 07:05:18 ...
2 2017-07-21 07:10:18 ...
3 2017-07-21 07:15:18 ...
4 2017-07-21 07:20:18 ...
But instead I get an error that looks like this:
KeyError: "DatetimeIndex(['2017-07-21 07:00:18', '2017-07-21 07:05:18',\n
'2017-07-21 07:10:18', '2017-07-21 07:15:18',\n
'2017-07-21 07:20:18', '2017-07-21 07:25:18',\n
'2017-07-21 07:30:18', '2017-07-21 07:35:18',\n
'2017-07-21 07:40:18', '2017-07-21 07:45:18',\n ...\n
'2017-07-21 18:20:18', '2017-07-21
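I have already tried the DataFrame.resample function, but it did not work the way I wanted. I am still new to Python and pandas, and I am at a loss as to how to make this work.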
Answer 0 (score: 0)
It seems you need reindex:
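# Target timestamps: every 5 minutes from the first to the last reading
# ('5T' means 5 minutes; newer pandas versions spell this '5min')
idx = pd.date_range(read.index[0], read.index[-1], freq='5T')
# reindex needs unique index labels, so drop rows with repeated timestamps
read = read[~read.index.duplicated()]
# For each target timestamp, take the row whose timestamp is closest
read = read.reindex(idx, method='nearest')
print(read)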
user_id heartbeat acc_x acc_y acc_z
2017-07-21 07:00:18 1084 69 -0.09375 -1.125 0.09375
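For context, the KeyError in the question arises because indexing a DataFrame with a list-like, as in read[pd.date_range(...)], looks the values up as column labels rather than row labels. The reindex approach above can be tried on a small made-up frame; the following is only a sketch, and the timestamps and heartbeat values in it are hypothetical, not taken from the asker's file.

```python
import pandas as pd

# Hypothetical, irregularly spaced readings (not the asker's real data).
index = pd.to_datetime([
    "2017-07-21 07:00:18",
    "2017-07-21 07:00:19",
    "2017-07-21 07:00:19",  # repeated timestamp, as in the original file
    "2017-07-21 07:05:17",
    "2017-07-21 07:05:20",
    "2017-07-21 07:10:18",
])
df = pd.DataFrame({"heartbeat": [69, 68, 70, 71, 72, 73]}, index=index)
df.index.name = "monitor_date"

# reindex requires unique labels: keep the first row for each timestamp.
df = df[~df.index.duplicated()]

# Target grid: every 5 minutes, starting at the first timestamp.
grid = pd.date_range(df.index[0], df.index[-1], freq="5min")

# For each grid point, take the existing row with the closest timestamp.
# The values themselves are copied unchanged, not averaged or interpolated.
sampled = df.reindex(grid, method="nearest")
print(sampled)
```

Note that method='nearest' expects the index to be sorted. reindex also accepts a tolerance argument (for example pd.Timedelta('2s')), which leaves a grid point as NaN when no reading is close enough instead of borrowing a distant one.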