How to count consecutive values per month?

Time: 2015-07-23 07:49:21

Tags: pandas dataframe grouping

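I have a long daily time series from 1981 to 1991. I have successfully computed the longest run of consecutive zero values in the series using the code below:

When I try to count the long runs of non-zero values instead, by changing `!=` to `==`, it does not work with monthly grouping, although it does work with yearly grouping. Can anyone help me with this?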


(Credit: 李健勋)

1 answer:

Answer 0 (score: 1)

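Ah, I see the problem here. Applied to one group, the `.value_counts()` at the end of the compare-cumsum pattern returns something like the following (run label on the left, run length on the right):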
3    8
0    5
9    3
6    2
8    1
7    1
4    1
1    1
dtype: int64
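To see where numbers like these come from, here is a minimal sketch on a short, made-up series (the values are hypothetical). Every zero bumps the cumulative sum, so each run of consecutive non-zero values shares one label, and `value_counts()` then counts the days per label, largest count first.

```python
import pandas as pd

# Hypothetical toy data: 0.0 marks a dry day, anything else a wet day.
prec = pd.Series([1.2, 0.5, 0.0, 0.0, 3.1, 0.7, 0.2, 0.0, 0.9])

# The label goes up by one at every zero, so consecutive non-zero
# values between two zeros end up with the same label.
labels = (prec == 0).astype(int).cumsum()
# labels: 0 0 1 2 2 2 2 3 3

# Keep only the non-zero days and count how many share each label,
# i.e. the length of each non-zero run (sorted by count, descending).
run_lengths = labels[prec != 0].value_counts()
print(run_lengths)
```

Here the run labelled 2 (three wet days in a row) comes first, so it is the longest run.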
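Because the index is made of integers, indexing the result with `[0]` is a label lookup: it asks for the run labelled 0, not for the first row. To get the first row, which `value_counts()` sorts to be the largest count (the longest run), use `.iloc[0]` instead.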
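The label-versus-position difference can be reproduced with a hypothetical month that starts with a dry day. The first non-zero run then gets label 1, label 0 never appears, and `[0]` raises `KeyError` while `.iloc[0]` still returns the longest run. Assuming the question's code indexed the result with `[0]`, this is one plausible way the monthly grouping fails: any month whose first day is 0 has no label 0.

```python
import pandas as pd

# Hypothetical month starting with a dry day (0.0): the non-zero runs
# carry labels 1 and 2, and label 0 never appears.
prec = pd.Series([0.0, 0.4, 0.8, 0.0, 1.5])
counts = (prec == 0).astype(int).cumsum()[prec != 0].value_counts()

# Positional access: the first row, i.e. the longest non-zero run.
longest = counts.iloc[0]

# Label access: with an integer index, counts[0] asks for label 0,
# which does not exist here and raises KeyError.
try:
    _ = counts[0]
    label_lookup_failed = False
except KeyError:
    label_lookup_failed = True

print(longest, label_lookup_failed)
```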

import pandas as pd
import numpy as np

# simulate some artificial data
# ============================================
np.random.seed(0)
df = pd.DataFrame(np.random.randn(4000), columns=['prec'], index=pd.date_range('1981-01-01', periods=4000, freq='D'))
df['prec'] = np.where(df['prec'] > 0, df['prec'], 0.0)
df['year'] = df.index.year
df['month'] = df.index.month
df['day'] = df.index.day

df
              prec  year  month  day
1981-01-01  1.7641  1981      1    1
1981-01-02  0.4002  1981      1    2
1981-01-03  0.9787  1981      1    3
1981-01-04  2.2409  1981      1    4
1981-01-05  1.8676  1981      1    5
1981-01-06  0.0000  1981      1    6
1981-01-07  0.9501  1981      1    7
1981-01-08  0.0000  1981      1    8
...            ...   ...    ...  ...
1991-12-07  0.0653  1991     12    7
1991-12-08  0.0000  1991     12    8
1991-12-09  0.3949  1991     12    9
1991-12-10  0.0000  1991     12   10
1991-12-11  1.7796  1991     12   11
1991-12-12  0.0000  1991     12   12
1991-12-13  1.5771  1991     12   13
1991-12-14  0.0000  1991     12   14

[4000 rows x 4 columns]



# processing
# ===============================
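def func(group):
    # cumsum() goes up by one at every zero, so each run of consecutive non-zero days
    # shares one label; value_counts() then gives each run's length, largest first,
    # and .iloc[0] picks the longest one.
    return (group.prec == 0).astype(int).cumsum()[group.prec != 0].value_counts().iloc[0]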

df.groupby(['year', 'month']).apply(func)

year  month
1981  1         8
      2         3
      3         4
      4        10
      5         5
               ..
1991  8         3
      9         5
      10        3
      11        6
      12        2
dtype: int64
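One edge case the `func` above does not cover: a month with no non-zero days at all produces an empty `value_counts()` result, and `.iloc[0]` then raises `IndexError`. The sketch below guards against that and uses `.max()`, which gives the same number as `.iloc[0]` on a non-empty result. The function name `longest_nonzero_run` and the two-month dataset are hypothetical.

```python
import pandas as pd

def longest_nonzero_run(prec):
    """Longest run of consecutive non-zero values in one month's 'prec' Series."""
    is_zero = prec == 0
    # Same compare-cumsum labels as func above, restricted to non-zero days.
    run_lengths = is_zero.astype(int).cumsum()[~is_zero].value_counts()
    # An all-zero month leaves run_lengths empty; .iloc[0] would raise
    # IndexError there, so return 0 instead.
    return int(run_lengths.max()) if len(run_lengths) else 0

# Hypothetical data: January has wet runs of 2 and 3 days, February is all dry.
idx = pd.to_datetime(['1981-01-01', '1981-01-02', '1981-01-03', '1981-01-04',
                      '1981-01-05', '1981-01-06', '1981-02-01', '1981-02-02'])
df = pd.DataFrame({'prec': [0.5, 1.0, 0.0, 0.3, 0.2, 0.9, 0.0, 0.0]}, index=idx)

# Group by (year, month) taken from the DatetimeIndex.
result = df.groupby([df.index.year, df.index.month])['prec'].apply(longest_nonzero_run)
print(result)
```

With the `df` built in this answer, the same function could be applied as `df.groupby(['year', 'month'])['prec'].apply(longest_nonzero_run)`.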


# double check on a particular group
# ======================================================
group = df.groupby(['year', 'month']).get_group((1981,1))
group

              prec  year  month  day
1981-01-01  1.7641  1981      1    1
1981-01-02  0.4002  1981      1    2
1981-01-03  0.9787  1981      1    3
1981-01-04  2.2409  1981      1    4
1981-01-05  1.8676  1981      1    5
1981-01-06  0.0000  1981      1    6
1981-01-07  0.9501  1981      1    7
1981-01-08  0.0000  1981      1    8
1981-01-09  0.0000  1981      1    9
1981-01-10  0.4106  1981      1   10
1981-01-11  0.1440  1981      1   11
1981-01-12  1.4543  1981      1   12
1981-01-13  0.7610  1981      1   13
1981-01-14  0.1217  1981      1   14
1981-01-15  0.4439  1981      1   15
1981-01-16  0.3337  1981      1   16
1981-01-17  1.4941  1981      1   17
1981-01-18  0.0000  1981      1   18
1981-01-19  0.3131  1981      1   19
1981-01-20  0.0000  1981      1   20
1981-01-21  0.0000  1981      1   21
1981-01-22  0.6536  1981      1   22
1981-01-23  0.8644  1981      1   23
1981-01-24  0.0000  1981      1   24
1981-01-25  2.2698  1981      1   25
1981-01-26  0.0000  1981      1   26
1981-01-27  0.0458  1981      1   27
1981-01-28  0.0000  1981      1   28
1981-01-29  1.5328  1981      1   29
1981-01-30  1.4694  1981      1   30
1981-01-31  0.1549  1981      1   31

(group.prec == 0).astype(int).cumsum()[group.prec != 0].value_counts().iloc[0]

# output: 8

Edit:

To count consecutive non-zero values, modify the function passed to apply as follows:

def func(group):
    return (group.prec == 0).astype(int).cumsum()[group.prec != 0].value_counts().iloc[0]
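Since the question only flips the comparison to switch between zero runs and non-zero runs, it can help to write the pattern once for any boolean mask. The helper `longest_run` below is a hypothetical name, not a pandas function; it returns the longest stretch of consecutive `True` values, or 0 if there are none.

```python
import pandas as pd

def longest_run(mask):
    """Longest stretch of consecutive True values in a boolean Series."""
    # The label only increases on False entries, so each block of
    # consecutive True values shares one label.
    labels = (~mask).astype(int).cumsum()
    run_lengths = labels[mask].value_counts()
    return int(run_lengths.max()) if len(run_lengths) else 0

# Hypothetical data: dry spells of 3, 1 and 1 days; wet spells of 2 and 4 days.
prec = pd.Series([0.0, 0.0, 0.0, 1.2, 0.4, 0.0, 2.0, 0.3, 0.8, 0.5, 0.0])
longest_dry = longest_run(prec == 0)   # consecutive zeros
longest_wet = longest_run(prec != 0)   # consecutive non-zero values
print(longest_dry, longest_wet)
```

Per month, this would be `df.groupby(['year', 'month'])['prec'].apply(lambda s: longest_run(s != 0))` for wet spells, or `s == 0` inside the lambda for dry spells.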