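I am migrating my code to Pandas 0.22 and have run into a problem with pivot tables. In version 0.20 I had the line of code below. It behaved as follows: when a cell in the pivot table was empty, the sum aggregation returned NaN.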
workload_pivot_df = pd.pivot_table(workload_df, index=["athlete_id", "date"], values=["workload"], columns=["type"], aggfunc=('sum','last'))
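However, because of a change in Pandas 0.22, sum now returns 0 when no data is found. The documentation says you can pass min_count=1 as an argument to get the original behaviour: https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.sum.html. But I cannot work out how to use it in a pivot table.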
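Answer 0 (score: 0)
You can use a lambda function wrapped in a tuple that keeps the same column name. The first value of the tuple is the new column name and the second is the aggregation function: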
tup = ('sum', lambda x: x.sum(min_count=1))
workload_pivot_df = pd.pivot_table(workload_df, index=["athlete_id", "date"],
                                   values=["workload"],
                                   columns=["type"], aggfunc=(tup, 'last'))
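To see why the lambda restores the old result, here is a minimal sketch of what min_count does on a plain Series. The variable names are only illustrative and are not part of the original code:

```python
import numpy as np
import pandas as pd

# A group that holds only missing values, like athlete "a" in the question.
all_nan = pd.Series([np.nan, np.nan, np.nan])

# Default (min_count=0): NaN values are skipped and the empty sum is 0.0.
default_sum = all_nan.sum()

# min_count=1: at least one non-NaN value is required, otherwise the result is NaN.
strict_sum = all_nan.sum(min_count=1)

# With at least one valid value, min_count=1 does not change the result.
mixed_sum = pd.Series([np.nan, 2.0, 3.0]).sum(min_count=1)

print(default_sum, strict_sum, mixed_sum)
```

The lambda in the tuple simply applies this min_count=1 call to each group of the pivot.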
Sample:
import numpy as np
import pandas as pd

workload_df = pd.DataFrame({'athlete_id': list('aaabbb'),
                            'date': pd.to_datetime(['2015-01-01'] * 3 + ['2015-01-01'] * 3),
                            'workload': [np.nan, np.nan, np.nan, 4, 2, 3],
                            'type': list('aaaabb')})
print(workload_df)
  athlete_id       date type  workload
0          a 2015-01-01    a       NaN
1          a 2015-01-01    a       NaN
2          a 2015-01-01    a       NaN
3          b 2015-01-01    a       4.0
4          b 2015-01-01    b       2.0
5          b 2015-01-01    b       3.0
workload_pivot_df = pd.pivot_table(workload_df, index=["athlete_id", "date"],
                                   values=["workload"],
                                   columns=["type"], aggfunc=('sum', 'last'))
print(workload_pivot_df)
                      workload
                          last       sum
type                         a    b    a    b
athlete_id date
a          2015-01-01      NaN  NaN  0.0  NaN  # <- the all-NaN group sums to 0.0
b          2015-01-01      4.0  3.0  4.0  5.0
tup = ('sum', lambda x: x.sum(min_count=1))
workload_pivot_df = pd.pivot_table(workload_df, index=["athlete_id", "date"],
                                   values=["workload"],
                                   columns=["type"], aggfunc=(tup, 'last'))
print(workload_pivot_df)
                      workload
                          last       sum
type                         a    b    a    b
athlete_id date
a          2015-01-01      NaN  NaN  NaN  NaN  # <- NaN again, matching the 0.20 behaviour
b          2015-01-01      4.0  3.0  4.0  5.0
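If only the sum is needed, a possible alternative is to skip pivot_table and call groupby directly, since GroupBy.sum also accepts min_count. This is a sketch under that assumption, not part of the approach above, and the name grouped_sum is only illustrative:

```python
import numpy as np
import pandas as pd

# Same sample data as in the example above.
workload_df = pd.DataFrame({
    'athlete_id': list('aaabbb'),
    'date': pd.to_datetime(['2015-01-01'] * 6),
    'workload': [np.nan, np.nan, np.nan, 4, 2, 3],
    'type': list('aaaabb'),
})

# Sum per (athlete_id, date, type); groups without any valid value stay NaN.
grouped_sum = (
    workload_df.groupby(['athlete_id', 'date', 'type'])['workload']
    .sum(min_count=1)
    .unstack('type')  # move "type" into the columns, as the pivot does
)
print(grouped_sum)
```

This gives one column per type without the extra 'sum'/'last' column level, so 'last' would have to be computed separately and joined if it is still required.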