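I have a long daily time series running from 1981 to 1991. Using the code below, I have successfully counted the longest run of zero values in the series:
When I try to count the longest run of non-zero values by changing != to =, it does not work with monthly grouping, although it does work with yearly grouping. Can anyone help me solve this?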
(Credit: Jianxun Li)
Answer 0 (score: 1)
Calling .value_counts() after the compare-cumsum pattern returns something like the following:
3 8
0 5
9 3
6 2
8 1
7 1
4 1
1 1
dtype: int64
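Because this result has an integer index, indexing it with [0] is treated as a label lookup rather than a positional one, which causes confusion. To fix this, use .iloc[0] (or .values[0]) to access the first element.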
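To make the label-versus-position issue concrete, here is a minimal sketch that rebuilds the value_counts() result shown above by hand. The variable names (counts, by_label, by_position) are illustrative only and do not come from the original answer.

```python
import pandas as pd

# Rebuild the value_counts() output shown above:
# the index holds the run labels, the values hold the run lengths.
counts = pd.Series([8, 5, 3, 2, 1, 1, 1, 1],
                   index=[3, 0, 9, 6, 8, 7, 4, 1])

# With an integer index, [0] looks up the LABEL 0, not the first row.
by_label = counts[0]

# .iloc[0] (and .values[0]) are positional: the first row, i.e. the longest run.
by_position = counts.iloc[0]

print(by_label, by_position)
```

Here the label 0 happens to exist, so counts[0] silently returns the wrong run length instead of raising an error, which is why the problem is easy to miss.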
import pandas as pd
import numpy as np
# simulate some artificial data
# ============================================
np.random.seed(0)
df = pd.DataFrame(np.random.randn(4000), columns=['prec'], index=pd.date_range('1981-01-01', periods=4000, freq='D'))
df['prec'] = np.where(df['prec'] > 0, df['prec'], 0.0)
df['year'] = df.index.year
df['month'] = df.index.month
df['day'] = df.index.day
df
prec year month day
1981-01-01 1.7641 1981 1 1
1981-01-02 0.4002 1981 1 2
1981-01-03 0.9787 1981 1 3
1981-01-04 2.2409 1981 1 4
1981-01-05 1.8676 1981 1 5
1981-01-06 0.0000 1981 1 6
1981-01-07 0.9501 1981 1 7
1981-01-08 0.0000 1981 1 8
... ... ... ... ...
1991-12-07 0.0653 1991 12 7
1991-12-08 0.0000 1991 12 8
1991-12-09 0.3949 1991 12 9
1991-12-10 0.0000 1991 12 10
1991-12-11 1.7796 1991 12 11
1991-12-12 0.0000 1991 12 12
1991-12-13 1.5771 1991 12 13
1991-12-14 0.0000 1991 12 14
[4000 rows x 4 columns]
# processing
# ===============================
def func(group):
    return (group.prec == 0).astype(int).cumsum()[group.prec != 0].value_counts().iloc[0]
df.groupby(['year', 'month']).apply(func)
year month
1981 1 8
2 3
3 4
4 10
5 5
..
1991 8 3
9 5
10 3
11 6
12 2
dtype: int64
# double check on a particular group
# ======================================================
group = df.groupby(['year', 'month']).get_group((1981,1))
group
prec year month day
1981-01-01 1.7641 1981 1 1
1981-01-02 0.4002 1981 1 2
1981-01-03 0.9787 1981 1 3
1981-01-04 2.2409 1981 1 4
1981-01-05 1.8676 1981 1 5
1981-01-06 0.0000 1981 1 6
1981-01-07 0.9501 1981 1 7
1981-01-08 0.0000 1981 1 8
1981-01-09 0.0000 1981 1 9
1981-01-10 0.4106 1981 1 10
1981-01-11 0.1440 1981 1 11
1981-01-12 1.4543 1981 1 12
1981-01-13 0.7610 1981 1 13
1981-01-14 0.1217 1981 1 14
1981-01-15 0.4439 1981 1 15
1981-01-16 0.3337 1981 1 16
1981-01-17 1.4941 1981 1 17
1981-01-18 0.0000 1981 1 18
1981-01-19 0.3131 1981 1 19
1981-01-20 0.0000 1981 1 20
1981-01-21 0.0000 1981 1 21
1981-01-22 0.6536 1981 1 22
1981-01-23 0.8644 1981 1 23
1981-01-24 0.0000 1981 1 24
1981-01-25 2.2698 1981 1 25
1981-01-26 0.0000 1981 1 26
1981-01-27 0.0458 1981 1 27
1981-01-28 0.0000 1981 1 28
1981-01-29 1.5328 1981 1 29
1981-01-30 1.4694 1981 1 30
1981-01-31 0.1549 1981 1 31
(group.prec == 0).astype(int).cumsum()[group.prec != 0].value_counts().iloc[0]
# output: 8
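The one-liner above packs several steps together. As a rough sketch of what each step does, here is the same compare-cumsum pattern applied to a tiny made-up series; the data and the intermediate names (is_zero, run_id, run_lengths) are assumptions added for illustration.

```python
import pandas as pd

# Tiny made-up precipitation series (not from the original data).
prec = pd.Series([1.2, 0.4, 0.0, 0.7, 0.0, 0.0, 0.3, 0.9, 2.1])

# Step 1: compare. Mark the zero days.
is_zero = prec == 0

# Step 2: cumsum. The running count of zeros stays constant across a
# stretch of non-zero days, so it acts as an id for each non-zero run.
run_id = is_zero.astype(int).cumsum()

# Step 3: keep only the non-zero days and count how often each id occurs.
# Each count is the length of one non-zero run.
run_lengths = run_id[prec != 0].value_counts()

# Step 4: value_counts() sorts by count in descending order,
# so the first row holds the longest run.
longest = run_lengths.iloc[0]

print(run_id.tolist())
print(longest)
```

Taking run_lengths.max() instead of .iloc[0] should give the same number without relying on the sort order.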
You need to modify the function passed to apply, as shown below, so that it counts consecutive non-zero values.
def func(group):
    return (group.prec == 0).astype(int).cumsum()[group.prec != 0].value_counts().iloc[0]
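One thing that may be worth guarding against: if a month contains no non-zero values at all, value_counts() returns an empty Series and .iloc[0] raises an IndexError. The sketch below is one possible defensive variant, not part of the original answer; the function name longest_nonzero_run, the demo frame df_demo, and the choice to return 0 for an all-zero month are all assumptions.

```python
import numpy as np
import pandas as pd

def longest_nonzero_run(group):
    """Length of the longest run of non-zero 'prec' values in a group."""
    nonzero = group['prec'] != 0
    if not nonzero.any():
        # Assumption: report 0 for a group with no non-zero values.
        return 0
    # The running count of zeros labels each stretch of non-zero days.
    run_id = (~nonzero).cumsum()
    return int(run_id[nonzero].value_counts().max())

# Made-up data: January 1981 is all zeros, February has runs of 3 and 2 days.
idx = pd.date_range('1981-01-01', periods=59, freq='D')
prec = np.zeros(59)
prec[31:34] = 1.0   # Feb 1 to Feb 3
prec[40:42] = 2.0   # Feb 10 to Feb 11
df_demo = pd.DataFrame({'prec': prec}, index=idx)

# Group by year and month taken directly from the DatetimeIndex.
result = df_demo.groupby(
    [df_demo.index.year, df_demo.index.month]
).apply(longest_nonzero_run)
print(result)
```

Grouping on values derived from the index gives the same monthly groups as the year and month columns used above, without having to add those columns first.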