How do I calculate a daily sum in pandas?

Time: 2017-09-27 09:37:03

Tags: python pandas

My dataset looks like this:

                 time   raccoons_bought     x   y
22443   1984-01-01 00:00:01     1   55.776462   37.593956
2143    1984-01-01 00:00:01     4   55.757121   37.378225
9664    1984-01-01 00:00:33     3   55.773702   37.599220
33092   1984-01-01 00:01:39     3   55.757121   37.378225
16697   1984-01-01 00:02:32     2   55.678549   37.583023

I need to count how many raccoons were bought each day. Here is what I did. First I set time as the index:

df = df.set_index(['time'])

Then I grouped the dataset by date:

df.groupby(df.index.date).count()

But before grouping I have to drop the x and y columns, which hold the coordinates.

If I don't drop them, the result looks like this:

      raccoons_bought x      y
1984-01-01  5497    5497    5497
1984-01-02  5443    5443    5443
1984-01-03  5488    5488    5488
1984-01-04  5453    5453    5453
1984-01-05  5536    5536    5536
1984-01-06  5634    5634    5634
1984-01-07  5468    5468    5468

If I drop them, the result looks fine:

     raccoons_bought
1984-01-01  5497
1984-01-02  5443
1984-01-03  5488
1984-01-04  5453
1984-01-05  5536
1984-01-06  5634
1984-01-07  5468

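So my question is: how can I count raccoons_bought per day and keep the coordinates intact? I want to plot these coordinates on a map and find out who bought those raccoons.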

2 answers:

Answer 0: (score: 3)

You can do it this way:

In [82]: df
Out[82]: 
                      time  raccoons_bought          x          y
22443  1984-01-01 00:00:01                1  55.776462  37.593956
2143   1984-01-01 00:00:01                4  55.757121  37.378225
9664   1984-01-01 00:00:33                3  55.773702  37.599220
33092  1984-01-01 00:01:39                3  55.757121  37.378225
16697  1984-01-01 00:02:32                2  55.678549  37.583023

In [83]: df.groupby(pd.to_datetime(df.time).dt.date).agg(
    ...:     {'raccoons_bought': 'sum', 'x':'first', 'y':'first'}).reset_index() 
Out[83]: 
         time          y          x  raccoons_bought
0  1984-01-01  37.593956  55.776462               13

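Note that I used sum as the aggregation function for raccoons_bought to get the total number of raccoons. If you only need the number of rows, change it to count or size.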
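To make the difference between these aggregations concrete, here is a minimal sketch on the five sample rows from the question. The output column names total_bought and n_purchases are made up for illustration, and the keyword (named aggregation) syntax assumes pandas 0.25 or newer.

```python
import pandas as pd

# The five sample rows from the question.
df = pd.DataFrame(
    {
        "time": [
            "1984-01-01 00:00:01",
            "1984-01-01 00:00:01",
            "1984-01-01 00:00:33",
            "1984-01-01 00:01:39",
            "1984-01-01 00:02:32",
        ],
        "raccoons_bought": [1, 4, 3, 3, 2],
        "x": [55.776462, 55.757121, 55.773702, 55.757121, 55.678549],
        "y": [37.593956, 37.378225, 37.599220, 37.378225, 37.583023],
    },
    index=[22443, 2143, 9664, 33092, 16697],
)

# Group by calendar day; rename the key so the output column is called "date".
day = pd.to_datetime(df["time"]).dt.date.rename("date")

daily = df.groupby(day).agg(
    total_bought=("raccoons_bought", "sum"),   # raccoons bought that day
    n_purchases=("raccoons_bought", "count"),  # non-null purchase rows that day
    x=("x", "first"),                          # coordinates of the first row only
    y=("y", "first"),
).reset_index()
print(daily)
```

Keep in mind that 'first' retains only one coordinate pair per day, so the other locations of that day are not present in the aggregated result.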

Answer 1: (score: 1)

You can use:

import pandas as pd

# convert to datetime if necessary
df['time'] = pd.to_datetime(df['time'])
# thank you JoeCondron
# floor the timestamps to midnight; the result stays datetime64, which is faster
dates = df['time'].dt.floor('D')
# if you need Python date objects instead (slower)
#dates = df['time'].dt.date

# use size to count all rows, NaNs included
# use count to omit NaNs
df1 = df.groupby(dates).size()
print (df1)
time
1984-01-01    5
dtype: int64

# if you need sums
df11 = df.groupby(dates)['raccoons_bought'].sum().reset_index()
print (df11)
         time  raccoons_bought
0  1984-01-01               13
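The difference between size, count and sum only shows up when values are missing, which the sample data does not contain. The sketch below uses three made-up rows (not from the question), one of them with a missing amount, to illustrate it.

```python
import numpy as np
import pandas as pd

# Made-up data: three purchases on one day, one with a missing amount.
df = pd.DataFrame(
    {
        "time": pd.to_datetime(
            [
                "1984-01-01 00:00:01",
                "1984-01-01 00:00:33",
                "1984-01-01 00:01:39",
            ]
        ),
        "raccoons_bought": [1.0, np.nan, 3.0],
    }
)
dates = df["time"].dt.floor("D")

n_rows = df.groupby(dates).size()                       # every row, NaN included
n_valid = df.groupby(dates)["raccoons_bought"].count()  # non-NaN values only
total = df.groupby(dates)["raccoons_bought"].sum()      # NaN is skipped in the sum

print(n_rows.iloc[0], n_valid.iloc[0], total.iloc[0])
```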

If you need to keep the original rows and columns unchanged, use transform with sum (or size or count):

a = df.groupby(dates)['raccoons_bought'].transform('sum')
print (a)
22443    13
2143     13
9664     13
33092    13
16697    13
Name: raccoons_bought, dtype: int64
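Because transform returns one value per original row, its result can be stored next to the coordinates, which is what the question asks for. A minimal sketch on the sample rows follows; the column name daily_total is an assumption chosen for illustration.

```python
import pandas as pd

# The five sample rows from the question.
df = pd.DataFrame(
    {
        "time": [
            "1984-01-01 00:00:01",
            "1984-01-01 00:00:01",
            "1984-01-01 00:00:33",
            "1984-01-01 00:01:39",
            "1984-01-01 00:02:32",
        ],
        "raccoons_bought": [1, 4, 3, 3, 2],
        "x": [55.776462, 55.757121, 55.773702, 55.757121, 55.678549],
        "y": [37.593956, 37.378225, 37.599220, 37.378225, 37.583023],
    },
    index=[22443, 2143, 9664, 33092, 16697],
)
df["time"] = pd.to_datetime(df["time"])
dates = df["time"].dt.floor("D")

# The daily total is broadcast back to every row of that day,
# so x and y stay available for plotting.
df["daily_total"] = df.groupby(dates)["raccoons_bought"].transform("sum")
print(df)
```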

Then filter all rows by a condition:

mask = df.groupby(dates)['raccoons_bought'].transform('sum') > 4
df2 = df.loc[mask, 'raccoons_bought']
print (df2)
22443    1
2143     4
9664     3
33092    3
16697    2
Name: raccoons_bought, dtype: int64
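With a single day in the sample, the mask is True for every row, so nothing is filtered out. The sketch below adds a made-up purchase on a second day (its timestamp is invented for illustration) to show rows being dropped while the coordinates of the remaining rows are kept.

```python
import pandas as pd

# Two rows from the sample plus one made-up row on the next day.
df = pd.DataFrame(
    {
        "time": pd.to_datetime(
            [
                "1984-01-01 00:00:01",
                "1984-01-01 00:00:01",
                "1984-01-02 00:05:00",  # hypothetical timestamp
            ]
        ),
        "raccoons_bought": [1, 4, 2],
        "x": [55.776462, 55.757121, 55.678549],
        "y": [37.593956, 37.378225, 37.583023],
    }
)
dates = df["time"].dt.floor("D")

# Daily totals: 5 on 1984-01-01, 2 on 1984-01-02.
mask = df.groupby(dates)["raccoons_bought"].transform("sum") > 4

# Selecting whole rows keeps x and y for the days that pass the condition.
busy_days = df.loc[mask]
print(busy_days)
```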

If you need the unique values as a list:

df2 = df.loc[mask, 'raccoons_bought'].unique().tolist()
print (df2)
[1, 4, 3, 2]