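I am running many DatetimeIndexed DataFrames through a function in which I slice each DataFrame by a start and end period. Although the same function works fine for many DataFrames, it raises an inexplicable KeyError on some of them.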
The data in these DataFrames are identical in terms of dtypes, columns and format. Below are the details of the KeyError:
Excerpt of the DataFrame that raises the error:
>> df_boiler_temp
>> @log_date @tariff_indicator #text
2017-04-23 00:12:48.802 none 65.0
2017-04-23 00:19:00.223 none 64.0
2017-04-23 00:24:02.544 none 63.0
2017-04-23 00:29:20.766 none 62.0
2017-04-23 00:35:00.088 none 61.0
2017-04-23 00:41:00.666 none 60.0
2017-04-23 00:46:00.632 none 59.0
2017-04-23 00:53:38.935 none 58.0
2017-04-23 00:59:21.152 none 57.0
2017-04-23 01:05:59.926 none 56.0
2017-04-23 01:09:58.652 none 65.0
2017-04-23 01:11:00.651 none 66.0
2017-04-23 01:14:10.577 none 67.0
2017-04-23 01:19:58.829 none 66.0
2017-04-23 01:28:00.635 none 65.0
The line that raises the KeyError:
df = df_boiler_temp.truncate(before=row['start'], after=row['end'])
(This line is essentially equivalent to
df = df_boiler_temp[row['start']: row['end']]
)
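As a sketch of why the two lines are interchangeable on a sorted index, the following uses a small hypothetical frame modelled on the excerpt above. `start` and `end` are illustrative values, and `.loc` is the explicit label-based form of the question's row slice. Both bounds are inclusive, and a bound that is not itself in the index is accepted as long as the index is sorted.

```python
import pandas as pd

# Hypothetical, already-sorted sample in the same layout as df_boiler_temp.
idx = pd.to_datetime([
    "2017-04-23 01:05:59.926",
    "2017-04-23 01:09:58.652",
    "2017-04-23 01:11:00.651",
    "2017-04-23 01:14:10.577",
])
df = pd.DataFrame({"#text": [56.0, 65.0, 66.0, 67.0]}, index=idx)

start = pd.Timestamp("2017-04-23 01:07:51.481")  # not itself a label in the index
end = pd.Timestamp("2017-04-23 01:12:00")

by_truncate = df.truncate(before=start, after=end)
by_slice = df.loc[start:end]  # label-based slice, both bounds inclusive

print(by_truncate.equals(by_slice))      # True
print(by_truncate["#text"].tolist())     # [65.0, 66.0]
```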
The exception raised is:
KeyError: 1492909671481000000L
which, converted to datetime notation, is 2017-04-23 01:07:51.481000.
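The integer in the error message is the requested start bound expressed as nanoseconds since the Unix epoch (the trailing L is only Python 2's long-integer suffix). A minimal way to confirm the conversion with pandas:

```python
import pandas as pd

# pandas stores datetimes as int64 nanoseconds since 1970-01-01;
# pd.Timestamp interprets a bare integer in that same unit.
ns = 1492909671481000000
ts = pd.Timestamp(ns)
print(ts)  # 2017-04-23 01:07:51.481000
```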
>> row['start']
>> 2017-04-23 01:07:51.481000
>> row['end']
>> 2017-04-23 02:24:07.953000
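I don't understand why truncating with these values raises a KeyError, since the key that triggers it falls well within the range of datetimes present in the DataFrame. How can I fix this?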
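Label-based slicing (and truncate) assume a sorted index, so a first diagnostic is to check whether the index is monotonic and whether it contains duplicate timestamps. The sketch below builds a hypothetical frame in which two rows were copied and appended at the end; `describe_index` is a hypothetical helper name, not a pandas function.

```python
import pandas as pd

# Hypothetical sorted readings, then two of them copied and appended at the end.
idx = pd.to_datetime([
    "2017-04-23 01:05:59.926",
    "2017-04-23 01:09:58.652",
    "2017-04-23 01:11:00.651",
    "2017-04-23 01:14:10.577",
])
df = pd.DataFrame({"#text": [56.0, 65.0, 66.0, 67.0]}, index=idx)
df_bad = pd.concat([df, df.iloc[1:3]])  # appended copies break the ordering


def describe_index(frame):
    """Report the index properties that label-based slicing relies on."""
    ix = frame.index
    return {
        "monotonic": bool(ix.is_monotonic_increasing),
        "has_duplicates": bool(ix.has_duplicates),
    }


print(describe_index(df))      # {'monotonic': True, 'has_duplicates': False}
print(describe_index(df_bad))  # {'monotonic': False, 'has_duplicates': True}
```

A frame can look sorted when scanning its head, while the appended block further down is what breaks monotonicity.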
Answer (score: 1)
I don't understand why, but I found this, which pointed out that the dataset was not sorted. Although, as far as I had analysed it, the dataset was already sorted, the following line did the trick:
df = df_boiler_temp.sort_index().truncate(before=row['start'], after=row['end'])
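A minimal sketch of the fix on hypothetical data: one row is appended a second time so the index is out of order, and sorting before truncating restores the ordered index that truncate needs. As far as I know, recent pandas releases reject truncate on an unsorted index with a ValueError ("truncate requires a sorted index") rather than the KeyError seen here, so the exact exception may depend on the pandas version.

```python
import pandas as pd

# Hypothetical readings with the first row appended again at the end.
idx = pd.to_datetime([
    "2017-04-23 01:05:59.926",
    "2017-04-23 01:09:58.652",
    "2017-04-23 01:11:00.651",
    "2017-04-23 01:14:10.577",
])
df = pd.DataFrame({"#text": [56.0, 65.0, 66.0, 67.0]}, index=idx)
df_bad = pd.concat([df, df.iloc[:1]])  # index is no longer monotonic

start = pd.Timestamp("2017-04-23 01:07:51.481")
end = pd.Timestamp("2017-04-23 02:24:07.953")

# Sort first so truncate (and label slicing) see an ordered index.
result = df_bad.sort_index().truncate(before=start, after=end)
print(result["#text"].tolist())  # [65.0, 66.0, 67.0]
```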
EDIT: It turns out that part of my dataset had been copied and appended to the original dataset, which is why the duplicates (and the resulting out-of-order index) were hard to spot.
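Sorting makes truncate work, but it keeps the copied rows, so any window that overlaps the appended block returns each reading twice. A sketch of removing them first, assuming (hypothetically) that rows sharing a timestamp are true copies: `Index.duplicated(keep="first")` flags every repeat of a timestamp after its first occurrence.

```python
import pandas as pd

idx = pd.to_datetime([
    "2017-04-23 01:05:59.926",
    "2017-04-23 01:09:58.652",
    "2017-04-23 01:11:00.651",
    "2017-04-23 01:14:10.577",
])
df = pd.DataFrame({"#text": [56.0, 65.0, 66.0, 67.0]}, index=idx)
# Hypothetical: two rows were copied and appended to the original data.
df_bad = pd.concat([df, df.iloc[1:3]])

start = pd.Timestamp("2017-04-23 01:07:51.481")
end = pd.Timestamp("2017-04-23 02:24:07.953")

# Sorting alone makes truncate work, but the copied rows remain.
sorted_only = df_bad.sort_index().truncate(before=start, after=end)
print(len(sorted_only))  # 5

# Keep the first row for each timestamp, then sort and truncate.
deduped = df_bad[~df_bad.index.duplicated(keep="first")].sort_index()
clean = deduped.truncate(before=start, after=end)
print(clean["#text"].tolist())  # [65.0, 66.0, 67.0]
```

Deduplicating on the index alone would also drop legitimate rows that happen to share a timestamp but carry different values; if that can occur, comparing whole rows (for example with `drop_duplicates` after `reset_index`) is the more conservative choice.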