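I am trying to resample a very simple DataFrame, but it fails with the following exception:
TypeError: Only valid with DatetimeIndex, TimedeltaIndex or PeriodIndex, but got an instance of 'Index'
I have read the pandas API documentation and looked at dozens of examples, but I cannot figure out what I am doing wrong.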
# %%
import pandas as pd
print(f"pandas version: {pd.__version__}\n\n")
data = pd.DataFrame({"created": ['2019-03-07T11:01:07.361+0000',
'2019-06-05T15:09:51.203+0100',
'2019-06-05T15:09:51.203+0100'],
"value": [10, 20, 30]})
# %%
print(f"original type: {type(data.created[0])}\n")
data.info()
# %%
data.created = pd.to_datetime(data.created)
# %%
print(f"updated type: {type(data.created[0])}\n")
data.info()
# %%
data.set_index("created", inplace=True)
data.info()
# %%
data.resample("D").mean()
This is the result:
pandas version: 0.24.2
original type: <class 'str'>
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3 entries, 0 to 2
Data columns (total 2 columns):
created 3 non-null object
value 3 non-null int64
dtypes: int64(1), object(1)
memory usage: 128.0+ bytes
updated type: <class 'datetime.datetime'>
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3 entries, 0 to 2
Data columns (total 2 columns):
created 3 non-null object
value 3 non-null int64
dtypes: int64(1), object(1)
memory usage: 128.0+ bytes
<class 'pandas.core.frame.DataFrame'>
Index: 3 entries, 2019-03-07 11:01:07.361000+00:00 to 2019-06-05 15:09:51.203000+01:00
Data columns (total 1 columns):
value 3 non-null int64
dtypes: int64(1)
memory usage: 48.0+ bytes
Traceback (most recent call last):
File "c:/Users/me/dev/misc/index.py", line 32, in <module>
data.resample("D").mean()
File "C:\Users\me\AppData\Local\Continuum\anaconda3\lib\site-packages\pandas\core\generic.py", line 8155, in resample
base=base, key=on, level=level)
File "C:\Users\me\AppData\Local\Continuum\anaconda3\lib\site-packages\pandas\core\resample.py", line 1250, in resample
return tg._get_resampler(obj, kind=kind)
File "C:\Users\me\AppData\Local\Continuum\anaconda3\lib\site-packages\pandas\core\resample.py", line 1380, in _get_resampler
"but got an instance of %r" % type(ax).__name__)
TypeError: Only valid with DatetimeIndex, TimedeltaIndex or PeriodIndex, but got an instance of 'Index'
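Answer 0 (score: 1)
Let's start with some principles:
To resample, the source Series or DataFrame must have a DatetimeIndex, TimedeltaIndex or PeriodIndex (not a plain Index).
You can call set_index on this column, but the result is only a DatetimeIndex if all datetime elements are in the same time zone (yours are not: the values mix +0000 and +0100, which is why the created column stays object dtype after pd.to_datetime in your output).
So you can proceed as follows:
When converting the created column to datetime (as your code already does), pass utc=True to "unify" the time zones: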
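As a quick way to check the first principle, the sketch below tests whether an index is one of the three types that resample accepts. The helper name is_resample_ready is hypothetical (it is not part of pandas), and the plain index values are made up for illustration.

```python
import pandas as pd


def is_resample_ready(index):
    """Hypothetical helper: True if `index` is a type resample() accepts."""
    return isinstance(
        index, (pd.DatetimeIndex, pd.TimedeltaIndex, pd.PeriodIndex)
    )


# A plain Index of strings: not usable for resampling.
plain = pd.Index(["2019-03-07", "2019-06-05"])

# Strings with different UTC offsets, converted to one zone via utc=True.
converted = pd.to_datetime(
    ["2019-03-07T11:01:07.361+0000", "2019-06-05T15:09:51.203+0100"],
    utc=True,
)

print(type(plain).__name__, is_resample_ready(plain))
print(type(converted).__name__, is_resample_ready(converted))
```

Running a check like this right after set_index shows whether the index type is the problem before resample raises.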
data.created = pd.to_datetime(data.created, utc=True)
Then set the index, and you are free to resample:
data.set_index('created').resample("D").mean()
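Putting the two steps together, a minimal end-to-end sketch using the data from the question might look like the following. The variable names indexed and daily are introduced here only for illustration.

```python
import pandas as pd

data = pd.DataFrame({
    "created": ["2019-03-07T11:01:07.361+0000",
                "2019-06-05T15:09:51.203+0100",
                "2019-06-05T15:09:51.203+0100"],
    "value": [10, 20, 30],
})

# utc=True converts every timestamp to UTC, so the column gets a single
# tz-aware datetime dtype instead of staying object.
data["created"] = pd.to_datetime(data["created"], utc=True)

# With a uniform time zone, set_index yields a DatetimeIndex.
indexed = data.set_index("created")
print(type(indexed.index).__name__)

# Daily means; days without any rows come out as NaN, so drop them for display.
daily = indexed.resample("D").mean()
print(daily.dropna())
```

Note that the daily bins are then computed on UTC day boundaries, since all timestamps were converted to UTC.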
Another option: instead of calling set_index, you can pass the on parameter to specify a datetime-like column:
data.resample("D", on='created').mean()
However, all entries in this column must still be in the same time zone.
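For completeness, here is a small sketch of the on variant with the same sample data, assuming the column has first been converted with utc=True. The name daily_on is introduced only for this example.

```python
import pandas as pd

data = pd.DataFrame({
    "created": ["2019-03-07T11:01:07.361+0000",
                "2019-06-05T15:09:51.203+0100",
                "2019-06-05T15:09:51.203+0100"],
    "value": [10, 20, 30],
})

# The column must hold a single time zone before it can drive resample().
data["created"] = pd.to_datetime(data["created"], utc=True)

# `on` tells resample() to bin by this column; the frame keeps its RangeIndex.
daily_on = data.resample("D", on="created").mean()
print(daily_on.dropna())

# The original frame is unchanged: "created" is still an ordinary column.
print(data.columns.tolist())
```

This form can be convenient when the DataFrame should keep its original index for other operations.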