I want to count the number of instances in a time log, grouped by month. I have the following pandas column:
print(df['date_unconditional'][:5])
0 2018-10-15T07:00:00
1 2018-06-12T07:00:00
2 2018-08-28T07:00:00
3 2018-08-29T07:00:00
4 2018-10-29T07:00:00
Name: date_unconditional, dtype: object
Then I converted it to datetime format:
df['date_unconditional'] = pd.to_datetime(pd.to_datetime(df['date_unconditional']).dt.strftime('%m/%d/%Y'))
print(df['date_unconditional'][:5])
0 2018-10-15
1 2018-06-12
2 2018-08-28
3 2018-08-29
4 2018-10-29
Name: date_unconditional, dtype: datetime64[ns]
Then I tried to count them, but I keep getting an error:
df['date_unconditional'] = pd.to_datetime(df['date_unconditional'], errors='coerce')
print(df['date_unconditional'].groupby(pd.Grouper(freq='M')).count())
TypeError: Only valid with DatetimeIndex, TimedeltaIndex or PeriodIndex, but got an instance of 'RangeIndex'
The column is not a RangeIndex. I have tried changing it in other ways, but this error keeps popping up.
Answer 0 (score: 0)
Use the key parameter of Grouper:
df['date_unconditional'] = pd.to_datetime(df['date_unconditional'], errors='coerce')
print (df.groupby(pd.Grouper(freq='M',key='date_unconditional'))['date_unconditional'].count())
2018-06-30 1
2018-07-31 0
2018-08-31 2
2018-09-30 0
2018-10-31 2
Freq: M, Name: date_unconditional, dtype: int64
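To see why key matters: without it, Grouper groups on the index, and a DataFrame built the usual way has a RangeIndex, which is what the TypeError complains about. Below is a minimal sketch that rebuilds the five sample rows from the question (the DataFrame construction is an assumption, since the original data source is not shown). It passes pd.offsets.MonthEnd() rather than the 'M' string only because newer pandas releases deprecate that alias in favor of 'ME'; the offset object should give the same month-end bins.

```python
import pandas as pd

# Sample rows copied from the question; building the frame this way is an assumption.
df = pd.DataFrame({
    "date_unconditional": [
        "2018-10-15T07:00:00",
        "2018-06-12T07:00:00",
        "2018-08-28T07:00:00",
        "2018-08-29T07:00:00",
        "2018-10-29T07:00:00",
    ]
})
df["date_unconditional"] = pd.to_datetime(df["date_unconditional"], errors="coerce")

# Month-end frequency as an offset object (same bins as the 'M' alias above).
month_end = pd.offsets.MonthEnd()

# Without key, Grouper looks at the index (a RangeIndex here), so this raises.
try:
    df["date_unconditional"].groupby(pd.Grouper(freq=month_end)).count()
    error_message = None
except TypeError as exc:
    error_message = str(exc)

# With key, Grouper bins on the named datetime column instead of the index.
monthly_counts = (
    df.groupby(pd.Grouper(key="date_unconditional", freq=month_end))
    ["date_unconditional"]
    .count()
)

print(error_message)
print(monthly_counts)
```

The column values themselves were already datetimes in the question; only the index was not, which is why converting the column again did not make the error go away.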
Or create a DatetimeIndex with DataFrame.set_index, after which GroupBy.size can be used. The difference between the two is that count excludes missing values, while size does not.
df['date_unconditional'] = pd.to_datetime(df['date_unconditional'], errors='coerce')
print (df.set_index('date_unconditional').groupby(pd.Grouper(freq='M')).size())
2018-06-30 1
2018-07-31 0
2018-08-31 2
2018-09-30 0
2018-10-31 2
Freq: M, dtype: int64
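The difference between size and count only shows up when a column contains missing values. Here is a small sketch of that, assuming a hypothetical extra column named value with two NaN entries (neither the column nor its values come from the question). As in the sketch above, pd.offsets.MonthEnd() stands in for the 'M' alias.

```python
import numpy as np
import pandas as pd

# Hypothetical data: the dates match the question, the "value" column is made up.
df = pd.DataFrame({
    "date_unconditional": pd.to_datetime(
        ["2018-06-12", "2018-08-28", "2018-08-29", "2018-10-15", "2018-10-29"]
    ),
    "value": [1.0, np.nan, 3.0, 4.0, np.nan],
})

# Move the dates into the index so Grouper needs no key.
grouped = df.set_index("date_unconditional").groupby(
    pd.Grouper(freq=pd.offsets.MonthEnd())
)

sizes = grouped.size()             # rows per month, NaN values included
counts = grouped["value"].count()  # non-missing "value" entries per month

print(pd.DataFrame({"size": sizes, "count": counts}))
```

For the question as asked (how many log entries per month), size is probably the more direct choice, since it counts rows regardless of what the other columns contain.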