How often do problems occur per day and per location?

Time: 2018-10-10 09:10:55

Tags: python pandas date dataframe group-by

I have a DataFrame like this:

Date                  Location_ID   Problem_ID  
---------------------+------------+----------  
2013-01-02 10:00:00  | 1          |  43  
2012-08-09 23:03:01  | 5          |  2  
...

How can I count how often problems occur per day and per location?

1 answer:

Answer 0: (score: 0)

Use groupby with the Date column converted to dates via dt.date, or with a Grouper at daily frequency, and aggregate with size:

print (df)
                  Date  Location_ID  Problem_ID
0  2013-01-02 10:00:00            1          43
1  2012-08-09 23:03:01            5           2

# if necessary, convert the column to datetimes
df['Date'] = pd.to_datetime(df['Date'])

df1 = df.groupby([df['Date'].dt.date, 'Location_ID']).size().reset_index(name='count')
print (df1)
         Date  Location_ID  count
0  2012-08-09            5      1
1  2013-01-02            1      1

Or:

df1 = (df.groupby([pd.Grouper(key='Date', freq='D'), 'Location_ID'])
         .size()
         .reset_index(name='count'))
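
To make the two variants easier to compare, here is a minimal self-contained sketch. The sample rows are hypothetical (the two rows from the question plus two invented ones so that one day and location has more than one problem), and the names by_date and by_grouper are only illustrative. The visible difference is the type of the Date column in the result: dt.date produces Python date objects, while Grouper with freq='D' produces Timestamps at midnight.

```python
import pandas as pd

# Hypothetical sample data: the two rows from the question plus two invented rows
df = pd.DataFrame({
    'Date': ['2013-01-02 10:00:00', '2013-01-02 15:30:00',
             '2013-01-02 18:45:00', '2012-08-09 23:03:01'],
    'Location_ID': [1, 1, 2, 5],
    'Problem_ID': [43, 7, 43, 2],
})
df['Date'] = pd.to_datetime(df['Date'])

# Variant 1: group by the calendar date (Python date objects) and the location
by_date = (df.groupby([df['Date'].dt.date, 'Location_ID'])
             .size()
             .reset_index(name='count'))

# Variant 2: group by daily bins with pd.Grouper (Timestamp values) and the location
by_grouper = (df.groupby([pd.Grouper(key='Date', freq='D'), 'Location_ID'])
                .size()
                .reset_index(name='count'))

print(by_date)
print(by_grouper)
```

Both results have one row per day and location, with the number of problem rows in the count column.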

If the first column is the index:

print (df)
                     Location_ID  Problem_ID
Date                                        
2013-01-02 10:00:00            1          43
2012-08-09 23:03:01            5           2


df.index = pd.to_datetime(df.index)

df1 = (df.groupby([df.index.date, 'Location_ID'])
        .size()
        .reset_index(name='count')
        .rename(columns={'level_0':'Date'}))
print (df1)
         Date  Location_ID  count
0  2012-08-09            5      1
1  2013-01-02            1      1

Or:

df1 = (df.groupby([pd.Grouper(level='Date', freq='D'), 'Location_ID'])
         .size()
         .reset_index(name='count'))
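
The index case can be tried end to end with a sketch like the following. The data is again hypothetical (one invented row added to the two from the question), and by_date and by_grouper are illustrative names. Because df.index.date is a plain array without a name, the first grouping level comes back from reset_index as level_0, which is why the rename is needed in the first variant but not in the Grouper variant.

```python
import pandas as pd

# Hypothetical sample data with the timestamps as a DatetimeIndex named 'Date'
df = pd.DataFrame(
    {'Location_ID': [1, 1, 5], 'Problem_ID': [43, 7, 2]},
    index=pd.DatetimeIndex(['2013-01-02 10:00:00', '2013-01-02 15:30:00',
                            '2012-08-09 23:03:01'], name='Date'),
)

# Variant 1: group by the dates of the index; the unnamed key becomes 'level_0'
by_date = (df.groupby([df.index.date, 'Location_ID'])
             .size()
             .reset_index(name='count')
             .rename(columns={'level_0': 'Date'}))

# Variant 2: group by daily bins of the index level 'Date'
by_grouper = (df.groupby([pd.Grouper(level='Date', freq='D'), 'Location_ID'])
                .size()
                .reset_index(name='count'))

print(by_date)
print(by_grouper)
```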