I have a DataFrame with the columns year, month, day and prec. How can I compute, for each month, the longest run of consecutive days on which 'prec' is 0?
datasub = data[data['prec'] ==0.0]
datasub.groupby(['year','month'])['prec'].count()
This code does not give me the result I expect. The data looks like this:
Out[70]:
year month day prec
0 1981 1 1 1.5
1 1981 1 2 0.0
2 1981 1 3 0.0
3 1981 1 4 0.4
4 1981 1 5 0.0
5 1981 1 6 1.0
6 1981 1 7 1.9
7 1981 1 8 0.6
8 1981 1 9 3.7
9 1981 1 10 0.0
10 1981 1 11 0.0
11 1981 1 12 0.0
12 1981 1 13 0.0
13 1981 1 14 12.2
14 1981 1 15 1.7
15 1981 1 16 0.6
16 1981 1 17 0.9
17 1981 1 18 0.6
18 1981 1 19 0.4
19 1981 1 20 0.2
20 1981 1 21 1.4
21 1981 1 22 3.2
22 1981 1 23 0.0
23 1981 1 24 0.2
24 1981 1 25 1.2
25 1981 1 26 0.0
26 1981 1 27 0.0
27 1981 1 28 0.0
28 1981 1 29 0.0
29 1981 1 30 0.2
... ... ... ... ...
3987 1991 12 2 0.0
3988 1991 12 3 0.0
3989 1991 12 4 0.0
3990 1991 12 5 0.5
3991 1991 12 6 0.4
3992 1991 12 7 1.2
3993 1991 12 8 0.0
3994 1991 12 9 0.0
3995 1991 12 10 0.0
3996 1991 12 11 0.0
3997 1991 12 12 0.0
Answer 0 (score: 1)
import pandas as pd
import numpy as np
# simulate some artificial data
# ============================================
np.random.seed(0)
df = pd.DataFrame(np.random.randn(4000), columns=['prec'], index=pd.date_range('1981-01-01', periods=4000, freq='D'))
df['prec'] = np.where(df['prec'] > 0, df['prec'], 0.0)
df['year'] = df.index.year
df['month'] = df.index.month
df['day'] = df.index.day
df
prec year month day
1981-01-01 1.7641 1981 1 1
1981-01-02 0.4002 1981 1 2
1981-01-03 0.9787 1981 1 3
1981-01-04 2.2409 1981 1 4
1981-01-05 1.8676 1981 1 5
1981-01-06 0.0000 1981 1 6
1981-01-07 0.9501 1981 1 7
1981-01-08 0.0000 1981 1 8
1981-01-09 0.0000 1981 1 9
1981-01-10 0.4106 1981 1 10
1981-01-11 0.1440 1981 1 11
1981-01-12 1.4543 1981 1 12
1981-01-13 0.7610 1981 1 13
1981-01-14 0.1217 1981 1 14
1981-01-15 0.4439 1981 1 15
... ... ... ... ...
1991-11-30 0.9764 1991 11 30
1991-12-01 0.1772 1991 12 1
1991-12-02 0.0000 1991 12 2
1991-12-03 0.1067 1991 12 3
1991-12-04 0.0000 1991 12 4
1991-12-05 0.0000 1991 12 5
1991-12-06 0.5765 1991 12 6
1991-12-07 0.0653 1991 12 7
1991-12-08 0.0000 1991 12 8
1991-12-09 0.3949 1991 12 9
1991-12-10 0.0000 1991 12 10
1991-12-11 1.7796 1991 12 11
1991-12-12 0.0000 1991 12 12
1991-12-13 1.5771 1991 12 13
1991-12-14 0.0000 1991 12 14
[4000 rows x 4 columns]
# processing
# ===========================================
def func(group):
    return (group.prec != 0).astype(int).cumsum().value_counts().values[0] - 1
df.groupby(['year', 'month']).apply(func)
year month
1981 1 2
2 5
3 4
4 2
5 3
6 4
7 3
8 5
9 5
10 2
11 6
12 6
1982 1 5
2 3
3 4
..
1990 10 9
11 4
12 5
1991 1 6
2 4
3 4
4 4
5 4
6 9
7 3
8 5
9 6
10 6
11 3
12 2
dtype: int64
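The idea is to treat every non-zero value as an impulse and take the cumulative sum, which gives a step function. The step function stays flat across a run of zeros, so the longest plateau corresponds to the longest run of zero days.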
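To make the idea concrete, here is a minimal sketch on a made-up seven-day series (the values are invented for illustration, not taken from the question's data). It shows the impulse series, the resulting step function, and how the longest plateau maps to the longest run of zeros.

```python
import pandas as pd

# toy series (hypothetical values): two wet days, a dry run of 3,
# one wet day, then a dry run of 1
prec = pd.Series([1.2, 0.4, 0.0, 0.0, 0.0, 0.7, 0.0])

# each non-zero value is an impulse of height 1 ...
impulse = (prec != 0).astype(int)
# ... and the running total is a step function that stays flat on zeros
step = impulse.cumsum()

print(step.tolist())                  # [1, 2, 2, 2, 2, 3, 3]
# each plateau holds one wet day plus the zeros that follow it,
# so the longest dry run is the largest plateau size minus 1
print(step.value_counts().max() - 1)  # 3
```

Note that the "minus 1" step assumes every plateau starts with a wet day. If a series begins with zeros, the first plateau (at level 0) contains no wet day, and subtracting 1 would undercount that run.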
# take a look at a sample group
# ===========================================
group = df.groupby(['year', 'month']).get_group((1981, 1)).copy()  # copy so a column can be added safely
# create a step function
group['step_func'] = (group.prec != 0).astype(int).cumsum()
group
prec year month day step_func
1981-01-01 1.7641 1981 1 1 1
1981-01-02 0.4002 1981 1 2 2
1981-01-03 0.9787 1981 1 3 3
1981-01-04 2.2409 1981 1 4 4
1981-01-05 1.8676 1981 1 5 5
1981-01-06 0.0000 1981 1 6 5
1981-01-07 0.9501 1981 1 7 6
1981-01-08 0.0000 1981 1 8 6
1981-01-09 0.0000 1981 1 9 6
1981-01-10 0.4106 1981 1 10 7
1981-01-11 0.1440 1981 1 11 8
1981-01-12 1.4543 1981 1 12 9
1981-01-13 0.7610 1981 1 13 10
1981-01-14 0.1217 1981 1 14 11
1981-01-15 0.4439 1981 1 15 12
1981-01-16 0.3337 1981 1 16 13
1981-01-17 1.4941 1981 1 17 14
1981-01-18 0.0000 1981 1 18 14
1981-01-19 0.3131 1981 1 19 15
1981-01-20 0.0000 1981 1 20 15
1981-01-21 0.0000 1981 1 21 15
1981-01-22 0.6536 1981 1 22 16
1981-01-23 0.8644 1981 1 23 17
1981-01-24 0.0000 1981 1 24 17
1981-01-25 2.2698 1981 1 25 18
1981-01-26 0.0000 1981 1 26 18
1981-01-27 0.0458 1981 1 27 19
1981-01-28 0.0000 1981 1 28 19
1981-01-29 1.5328 1981 1 29 20
1981-01-30 1.4694 1981 1 30 21
1981-01-31 0.1549 1981 1 31 22
# value_counts, pick the max value and subtract 1
group['step_func'].value_counts().values[0] - 1
2
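Using .values[0] can be confused with integer indexing, so replace it with .iloc[0]. The version below also keeps only the zero days before counting, so there is no need to subtract 1.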
# processing
# ===========================================
def func(group):
    return (group.prec != 0).astype(int).cumsum()[group.prec == 0].value_counts().iloc[0]
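As a sanity check, here is a minimal self-contained sketch that applies the same cumsum idea to the first 30 days of January 1981 as printed in the question. The helper name `longest_dry_spell` and the guard for months with no zero days are my own additions, not part of the answer above; without such a guard, `.iloc[0]` on an empty result would raise an IndexError.

```python
import pandas as pd

# prec values for 1981-01-01 .. 1981-01-30, copied from the question
prec = [1.5, 0.0, 0.0, 0.4, 0.0, 1.0, 1.9, 0.6, 3.7, 0.0,
        0.0, 0.0, 0.0, 12.2, 1.7, 0.6, 0.9, 0.6, 0.4, 0.2,
        1.4, 3.2, 0.0, 0.2, 1.2, 0.0, 0.0, 0.0, 0.0, 0.2]
data = pd.DataFrame({'year': 1981, 'month': 1,
                     'day': list(range(1, 31)), 'prec': prec})

def longest_dry_spell(prec):
    """Length of the longest run of consecutive zeros in a Series."""
    is_dry = prec == 0
    # the running count of wet days stays constant inside a dry run,
    # so it labels every run of zeros with a single id
    run_id = (~is_dry).cumsum()
    run_lengths = run_id[is_dry].value_counts()
    # assumed behaviour: report 0 for a month without any zero days
    return int(run_lengths.max()) if len(run_lengths) else 0

result = data.groupby(['year', 'month'])['prec'].apply(longest_dry_spell)
print(result)
```

For this sample the longest runs are days 10 to 13 and days 26 to 29, so the value for (1981, 1) is 4. Selecting the 'prec' column before `apply` keeps the helper independent of the other columns.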