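I am exploring time series data in Python and want to use pandas to group the dates into weekly intervals, but the following error is raised:
TypeError: Only valid with DatetimeIndex, TimedeltaIndex or PeriodIndex, but got an instance of 'RangeIndex'
Data (dates.csv):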
install_date, user, mean_level
2015-09-09, 1, 2
2015-09-11, 2, 2
2015-09-14, 3, 5
2015-09-14, 4, 6
2015-09-20, 5, 3
2015-09-25, 6, 3
2015-09-26, 7, 1
2015-09-27, 8, 1
2015-09-27, 9, 0
2015-09-29, 10, 0
Code:
import numpy as np
import pandas as pd
data = pd.read_csv('data/dates.csv', low_memory=False)
DateData = data.resample('W').sum().head()
print(DateData)
I tried a few things to convert the dates, but nothing worked and the error is still raised. This is the output I need:
Output:
install_date, user
2015-09-09, 3
2015-09-14, 12
2015-09-25, 40
Thanks! Cheers.
Answer 0 (score: 0)
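First convert the install_date column to the datetime dtype, then call resample with the rule you need. resample only works on a DatetimeIndex, TimedeltaIndex or PeriodIndex (or on a datetime column passed through on=), which is why the default RangeIndex produced by read_csv raises the TypeError: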
print(df)
install_date user mean_level
0 2015-09-09 1 2
1 2015-09-11 2 2
2 2015-09-14 3 5
3 2015-09-14 4 6
4 2015-09-20 5 3
5 2015-09-25 6 3
6 2015-09-26 7 1
7 2015-09-27 8 1
8 2015-09-27 9 0
9 2015-09-29 10 0
df['install_date'] = pd.to_datetime(df['install_date'])
print(df.dtypes)  # dtypes is an attribute, not a method
install_date datetime64[ns]
user int64
mean_level int64
dtype: object
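The conversion can also happen while the file is read. The following is a minimal sketch, assuming the CSV has the layout shown in the question; the inline csv_text string is only a stand-in for data/dates.csv so that the snippet runs on its own. Because the file has a space after each comma, skipinitialspace=True is used here so that the column names do not start with a blank.

```python
import io

import pandas as pd

# Stand-in for data/dates.csv: the first rows of the question's data.
csv_text = """install_date, user, mean_level
2015-09-09, 1, 2
2015-09-11, 2, 2
2015-09-14, 3, 5
"""

# skipinitialspace=True drops the blank after each comma, so the columns are
# named 'user' and 'mean_level' rather than ' user' and ' mean_level'.
# parse_dates converts install_date to a datetime dtype during the read.
df = pd.read_csv(
    io.StringIO(csv_text),
    skipinitialspace=True,
    parse_dates=["install_date"],
)

print(df.dtypes)
```

With a real file, the same keyword arguments can be passed to pd.read_csv('data/dates.csv', ...).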
Method 1: resample on the column
print(df.resample('7D',on='install_date').sum())
user mean_level
install_date
2015-09-09 10 15
2015-09-16 5 3
2015-09-23 40 5
Method 2: set the datetime column as the index, then resample
df.set_index('install_date',inplace=True)
print(df.resample('7D').sum())
user mean_level
install_date
2015-09-09 10 15
2015-09-16 5 3
2015-09-23 40 5
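Note that '7D' and 'W' are not the same rule. '7D' counts seven-day bins starting from the first date in the data, while 'W' (the rule used in the question) groups by calendar weeks that end on Sunday and labels each bin with that Sunday. A minimal sketch of the 'W' variant, with the question's data rebuilt inline so that it runs on its own:

```python
import pandas as pd

# The ten rows from dates.csv, rebuilt inline.
df = pd.DataFrame(
    {
        "install_date": pd.to_datetime(
            [
                "2015-09-09", "2015-09-11", "2015-09-14", "2015-09-14",
                "2015-09-20", "2015-09-25", "2015-09-26", "2015-09-27",
                "2015-09-27", "2015-09-29",
            ]
        ),
        "user": list(range(1, 11)),
        "mean_level": [2, 2, 5, 6, 3, 3, 1, 1, 0, 0],
    }
)

# 'W' is an alias for 'W-SUN': each bin covers Monday through Sunday
# and is labeled with the Sunday that closes it.
weekly = df.resample("W", on="install_date").sum()
print(weekly)
```

Which rule fits depends on whether the weeks should follow the calendar or start at the first install date.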