I have a pivot table created with the following code:
import numpy as np
import pandas as pd

df = df[["Ref",      # int64
         "REGION",   # object
         "COUNTRY",  # object
         "Value_1",  # float64
         "Value_2",  # float64
         "Value_3",  # float64
         "Type",     # object
         "Date",     # float64 (may need to convert to date)
         ]]
table = pd.pivot_table(df, index=["REGION", "COUNTRY"],
                       values=["Value_1",
                               "Value_2",
                               "Value_3"],
                       columns=["Type"], aggfunc=[np.mean, np.sum, np.count_nonzero],
                       fill_value=0)
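What I would like to do is add three columns showing the mean, sum and non-zero count of Value_1, Value_2 and Value_3 for each of these date ranges: <= 1999, 2000-2005 and >= 2006.
Is there a good way to do this with a pandas pivot table, or should I use a different approach?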
Sample df:
    Ref                  REGION  COUNTRY               Type  Value_2  Value_3  Value_1  Year
0     2  Yorkshire & The Humber  England            Private     25.0      NaN     25.0  1987
1     7  Yorkshire & The Humber  England  Voluntary/Charity     30.0      NaN     30.0  1990
2     9  Yorkshire & The Humber  England            Private     17.0      2.0     21.0  1991
3    10  Yorkshire & The Humber  England            Private     18.0      5.0     28.0  1992
4    14  Yorkshire & The Humber  England            Private     32.0      0.0     32.0  1990
5    17  Yorkshire & The Humber  England            Private     22.0      5.0     32.0  1987
6    18  Yorkshire & The Humber  England            Private     19.0      3.0     25.0  1987
7    19  Yorkshire & The Humber  England            Private     35.0      3.0     41.0  1990
8    23  Yorkshire & The Humber  England  Voluntary/Charity     25.0      NaN     25.0  1987
9    24  Yorkshire & The Humber  England            Private     31.0      2.0     35.0  1988
10   25  Yorkshire & The Humber  England  Voluntary/Charity     32.0      NaN     32.0  1987
11   29  Yorkshire & The Humber  England            Private     21.0      2.0     25.0  1987
12   30  Yorkshire & The Humber  England  Voluntary/Charity     17.0      1.0     19.0  1987
13   31  Yorkshire & The Humber  England            Private     27.0      3.0     33.0  2000
14   49  Yorkshire & The Humber  England            Private     12.0      3.0     18.0  1992
15   51  Yorkshire & The Humber  England            Private     19.0      4.0     27.0  1989
16   52  Yorkshire & The Humber  England            Private     11.0      NaN     11.0  1988
17   57  Yorkshire & The Humber  England            Private     28.0      2.0     32.0  1988
18   61  Yorkshire & The Humber  England            Private     20.0      5.0     30.0  1987
19   62  Yorkshire & The Humber  England            Private     36.0      2.0     40.0  1987
20   65  Yorkshire & The Humber  England  Voluntary/Charity     16.0      NaN     16.0  1988
Answer (score: 2)
First use cut on the Year column, then aggregate with DataFrameGroupBy.agg:
import numpy as np
import pandas as pd

lab = ['<=1999', '2000-2005', '>=2006']
s = pd.cut(df['Year'], bins=[-np.inf, 1999, 2005, np.inf], labels=lab)
# if only a datetime Date column exists:
#s = pd.cut(df['Date'].dt.year, bins=[-np.inf, 1999, 2005, np.inf], labels=lab)
f = lambda x: np.count_nonzero(x)
table = (df.groupby(["REGION", "COUNTRY", s], observed=True)
           .agg({'Value_1': 'mean', 'Value_2': 'sum', 'Value_3': f})
           .reset_index())
print(table)
                   REGION  COUNTRY       Year  Value_1  Value_2  Value_3
0  Yorkshire & The Humber  England     <=1999     27.2    466.0     19.0
1  Yorkshire & The Humber  England  2000-2005     33.0     27.0      1.0
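If the goal is the layout asked for in the question (all three statistics for every value column, with the date ranges as columns), the same pd.cut result can probably also be passed to pd.pivot_table as a column key. This is a minimal sketch, assuming a four-row subset of the sample data with its integer Year column; the Period column and the count_nonzero_skipna helper are hypothetical names introduced here, and it needs a pandas version whose pivot_table accepts observed.

```python
import numpy as np
import pandas as pd

# Assumption: four rows copied from the sample df (rows 0, 1, 2 and 13), integer Year column
df = pd.DataFrame({
    "REGION": ["Yorkshire & The Humber"] * 4,
    "COUNTRY": ["England"] * 4,
    "Type": ["Private", "Voluntary/Charity", "Private", "Private"],
    "Value_1": [25.0, 30.0, 21.0, 33.0],
    "Value_2": [25.0, 30.0, 17.0, 27.0],
    "Value_3": [np.nan, np.nan, 2.0, 3.0],
    "Year": [1987, 1990, 1991, 2000],
})


def count_nonzero_skipna(s: pd.Series) -> int:
    """Hypothetical helper: count non-zero values, ignoring NaN
    (np.count_nonzero would count NaN as non-zero)."""
    return int(s.dropna().ne(0).sum())


# Bin the years; bins are closed on the right, so 1999 -> '<=1999', 2005 -> '2000-2005'
labels = ["<=1999", "2000-2005", ">=2006"]
df["Period"] = pd.cut(df["Year"], bins=[-np.inf, 1999, 2005, np.inf], labels=labels)

# Columns are (statistic, value column, period); observed=True groups only by periods that occur
table = pd.pivot_table(
    df,
    index=["REGION", "COUNTRY"],
    columns="Period",
    values=["Value_1", "Value_2", "Value_3"],
    aggfunc=["mean", "sum", count_nonzero_skipna],
    fill_value=0,
    observed=True,
)
print(table)
```

A note on the helper: np.count_nonzero treats NaN as non-zero, so in the answer's output the 19.0 for Value_3 in <=1999 appears to include the six rows where Value_3 is NaN (13 genuinely non-zero values plus 6 NaN). The sketch drops NaN first; swap in np.count_nonzero if NaN rows should be counted.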