Python and pandas pivot table: sums between date ranges

Time: 2019-02-04 10:49:49

Tags: python-3.x pandas pivot-table

I have a pivot table created with the following code:

import numpy as np
import pandas as pd

# Keep only the columns needed for the pivot table
df = df[["Ref",      # int64
         "REGION",   # object
         "COUNTRY",  # object
         "Value_1",  # float
         "Value_2",  # float
         "Value_3",  # float
         "Type",     # object
         "Date",     # float64 (may need to convert to date)
         ]]


# Index names must match the DataFrame columns exactly: "REGION" and "COUNTRY"
table = pd.pivot_table(df, index=["REGION", "COUNTRY"],
               values=["Value_1",
                       "Value_2",
                       "Value_3"],
               columns=["Type"], aggfunc=[np.mean, np.sum, np.count_nonzero],
               fill_value=0)

What I would like to do is add three columns showing the mean, sum and non-zero count of Value_1, Value_2 and Value_3 for each of these date ranges: <= 1999, 2000-2005 and >= 2006.

Is there a good way to do this with a pandas pivot table, or should I use a different approach?

Df:


Ref REGION  COUNTRY Type    Value_2 Value_3 Value_1 Year
0   2   Yorkshire & The Humber  England Private 25.0    NaN 25.0    1987
1   7   Yorkshire & The Humber  England Voluntary/Charity   30.0    NaN 30.0    1990
2   9   Yorkshire & The Humber  England Private 17.0    2.0 21.0    1991
3   10  Yorkshire & The Humber  England Private 18.0    5.0 28.0    1992
4   14  Yorkshire & The Humber  England Private 32.0    0.0 32.0    1990
5   17  Yorkshire & The Humber  England Private 22.0    5.0 32.0    1987
6   18  Yorkshire & The Humber  England Private 19.0    3.0 25.0    1987
7   19  Yorkshire & The Humber  England Private 35.0    3.0 41.0    1990
8   23  Yorkshire & The Humber  England Voluntary/Charity   25.0    NaN 25.0    1987
9   24  Yorkshire & The Humber  England Private 31.0    2.0 35.0    1988
10  25  Yorkshire & The Humber  England Voluntary/Charity   32.0    NaN 32.0    1987
11  29  Yorkshire & The Humber  England Private 21.0    2.0 25.0    1987
12  30  Yorkshire & The Humber  England Voluntary/Charity   17.0    1.0 19.0    1987
13  31  Yorkshire & The Humber  England Private 27.0    3.0 33.0    2000
14  49  Yorkshire & The Humber  England Private 12.0    3.0 18.0    1992
15  51  Yorkshire & The Humber  England Private 19.0    4.0 27.0    1989
16  52  Yorkshire & The Humber  England Private 11.0    NaN 11.0    1988
17  57  Yorkshire & The Humber  England Private 28.0    2.0 32.0    1988
18  61  Yorkshire & The Humber  England Private 20.0    5.0 30.0    1987
19  62  Yorkshire & The Humber  England Private 36.0    2.0 40.0    1987
20  65  Yorkshire & The Humber  England Voluntary/Charity   16.0    NaN 16.0    1988

1 Answer:

Answer 0 (score: 2)

First use cut on the Year column to bin each row into a date range, then aggregate with DataFrameGroupBy.agg:

# Labels for the three date ranges; bins are right-closed, so 1999 falls in the first one
lab = ['<=1999', '2000-2005', '>=2006']
s = pd.cut(df['Year'], bins=[-np.inf, 1999, 2005, np.inf], labels=lab)
# If only a datetime Date column exists, bin on its year instead:
#s = pd.cut(df['Date'].dt.year, bins=[-np.inf, 1999, 2005, np.inf], labels=lab)

# Count the non-zero entries in each group
f = lambda x: np.count_nonzero(x)
table = (df.groupby(["REGION", "COUNTRY", s])
           .agg({'Value_1': 'mean', 'Value_2': 'sum', 'Value_3': f})
           .reset_index())
print(table)
                   REGION  COUNTRY       Year  Value_1  Value_2  Value_3
0  Yorkshire & The Humber  England     <=1999     27.2    466.0     19.0
1  Yorkshire & The Humber  England  2000-2005     33.0     27.0      1.0
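
The groupby above returns one row per date range. If the ranges are wanted as columns, as the question asks, one option is to pass the binned labels to pivot_table as the columns argument. The following is only a minimal sketch under stated assumptions: the sample rows, the Period column name and the nonzero_count helper are all hypothetical and not part of the original post, and the labels are converted to plain strings to keep the grouping simple.

```python
import numpy as np
import pandas as pd

# Hypothetical sample rows, loosely modelled on the question's data
df = pd.DataFrame({
    "REGION": ["Yorkshire & The Humber"] * 4,
    "COUNTRY": ["England"] * 4,
    "Value_1": [25.0, 21.0, 33.0, 40.0],
    "Value_2": [25.0, 17.0, 27.0, 0.0],
    "Year": [1987, 1991, 2000, 2010],
})

labels = ["<=1999", "2000-2005", ">=2006"]
# Right-closed bins: (-inf, 1999], (1999, 2005], (2005, inf]
# "Period" is an assumed column name; astype(str) turns the categories into plain strings
df["Period"] = pd.cut(
    df["Year"], bins=[-np.inf, 1999, 2005, np.inf], labels=labels
).astype(str)


def nonzero_count(x):
    """Hypothetical helper: number of non-zero entries (NaN counts as non-zero)."""
    return int(np.count_nonzero(x))


# One block of columns per aggregation, split by value column and date range
wide = pd.pivot_table(
    df,
    index=["REGION", "COUNTRY"],
    columns="Period",
    values=["Value_1", "Value_2"],
    aggfunc=["mean", "sum", nonzero_count],
    fill_value=0,
)

# Columns form a three-level index: (aggregation, value column, date range)
print(wide[("mean", "Value_1", "<=1999")])
print(wide["sum"])
```

Because np.count_nonzero treats NaN as non-zero, missing values such as those in Value_3 are included in the count. If they should be excluded, dropping them first (for example with x.dropna()) inside the helper is one way to do it.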