Slicing/truncating a time-indexed DataFrame raises KeyError

Time: 2017-06-15 09:05:06

Tags: python dataframe keyerror truncation

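I am running many DataFrames with a DatetimeIndex through a function that slices each DataFrame by a start and end period. The same function works fine for many of the DataFrames, but it raises an unexplained KeyError on some of them. The data in the DataFrames is identical in terms of data types, columns and format. Below is an example of the KeyError.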

Excerpt of a DataFrame that raises the error:

>> df_boiler_temp
>> @log_date @tariff_indicator  #text                                   
2017-04-23 00:12:48.802              none   65.0
2017-04-23 00:19:00.223              none   64.0
2017-04-23 00:24:02.544              none   63.0
2017-04-23 00:29:20.766              none   62.0
2017-04-23 00:35:00.088              none   61.0
2017-04-23 00:41:00.666              none   60.0
2017-04-23 00:46:00.632              none   59.0
2017-04-23 00:53:38.935              none   58.0
2017-04-23 00:59:21.152              none   57.0
2017-04-23 01:05:59.926              none   56.0
2017-04-23 01:09:58.652              none   65.0
2017-04-23 01:11:00.651              none   66.0
2017-04-23 01:14:10.577              none   67.0
2017-04-23 01:19:58.829              none   66.0
2017-04-23 01:28:00.635              none   65.0

Code that raises the KeyError:
df = df_boiler_temp.truncate(before=row['start'], after=row['end'])

This line is essentially equivalent to:

df = df_boiler_temp[row['start']: row['end']]

The exception raised is:

KeyError: 1492909671481000000L

which, converted to datetime notation, is 2017-04-23 01:07:51.481000

>> row['start']
>> 2017-04-23 01:07:51.481000

>> row['end']
>> 2017-04-23 02:24:07.953000
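The integer in the KeyError can be checked against `row['start']` with a short sketch like the one below. It assumes the key is a count of nanoseconds since the Unix epoch, which is how pandas stores datetimes internally; the variable names `key_ns` and `ts` are only illustrative.

```python
import pandas as pd

# The integer reported in the KeyError (the trailing "L" is Python 2's long suffix).
key_ns = 1492909671481000000

# Interpret the integer as nanoseconds since the Unix epoch.
ts = pd.Timestamp(key_ns, unit="ns")
print(ts)

# Compare against the start value used for the truncation.
print(ts == pd.Timestamp("2017-04-23 01:07:51.481000"))
```

So the key that pandas fails to find is exactly the `start` value, which is not itself present in the index.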

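I don't understand why truncating with these values raises a KeyError, since the key that raises it falls well within the range of datetimes present in the DataFrame. How can I solve this?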

1 Answer:

Answer 0: (score: 1)

I don't understand why, but I found a pointer suggesting that the dataset is not sorted. As far as I had analysed the dataset it was already sorted, yet the following line seemed to do the trick:

df = df_boiler_temp.sort_index().truncate(before=row['start'], after=row['end'])

EDIT: It seems that a copy of my dataset had been appended to the original dataset, so the index contained duplicates and was no longer in sorted order. That is why the duplicates were difficult to spot.
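A minimal sketch of how this situation could be diagnosed and repaired is shown below. The data is a hypothetical four-row sample modelled on the excerpt in the question, and the duplicated frame is built deliberately with `pd.concat` to imitate the accidental append. Dropping rows by duplicated index value assumes that rows sharing a timestamp really are copies of each other; that may not hold for other datasets. The exact exception raised on the unsorted frame may differ between pandas versions, so the sketch catches both `KeyError` and `ValueError`.

```python
import pandas as pd

# Hypothetical sample modelled on the excerpt in the question.
idx = pd.to_datetime([
    "2017-04-23 00:59:21.152",
    "2017-04-23 01:05:59.926",
    "2017-04-23 01:09:58.652",
    "2017-04-23 01:11:00.651",
])
original = pd.DataFrame({"#text": [57.0, 56.0, 65.0, 66.0]}, index=idx)

# Imitate the accident: a copy of the data appended to itself.
df_boiler_temp = pd.concat([original, original])

# Diagnostics: the index looks sorted at a glance but is not monotonic.
print(df_boiler_temp.index.is_monotonic_increasing)
print(df_boiler_temp.index.duplicated().any())

start = pd.Timestamp("2017-04-23 01:07:51.481")
end = pd.Timestamp("2017-04-23 02:24:07.953")

# Truncating the unsorted frame fails; the exception type depends on the pandas version.
try:
    df_boiler_temp.truncate(before=start, after=end)
except (KeyError, ValueError) as exc:
    print(type(exc).__name__, exc)

# Repair: drop repeated index values (assumed to be true copies), then sort.
cleaned = df_boiler_temp[~df_boiler_temp.index.duplicated(keep="first")].sort_index()
result = cleaned.truncate(before=start, after=end)
print(result)
```

Sorting alone is enough to make `truncate` work, as in the line above; removing the duplicates as well keeps the copied rows from appearing twice in the result.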