Is there any method or function to fill in the missing columns and rows of a pivot table in Python?
import pandas as pd
import numpy as np
from io import StringIO
csvfile = StringIO("""Date;Cat;Type;Value
01-Jan;AA;S;1
02-Jan;AA;F;2
02-Jan;BB;T;3
04-Jan;BB;T;3
05-Jan;CC;T;2
05-Jan;DD;T;1
05-Jan;BB;S;4
05-Jan;AA;S;2
05-Jan;DD;S;4""")
df = pd.read_csv(csvfile, sep = ';')
# Passing the string 'sum' avoids the FutureWarning that recent pandas versions raise for np.sum
pt = pd.pivot_table(df, values='Value', index=['Cat', 'Type'], columns=['Date'], aggfunc='sum', fill_value=0)
pt
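The code above produces the result below. The 03-Jan column is missing, and for some Cat values some of the Type values (F, S, T) are missing as well: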
Cat|Type|01-Jan|02-Jan|04-Jan|05-Jan|
---+----+------+------+------+------+
AA |F   |      |     2|      |      |
   |S   |     1|      |      |     2|
BB |S   |      |      |      |     4|
   |T   |      |     3|     3|      |
CC |T   |      |      |      |     2|
DD |S   |      |      |      |     4|
   |T   |      |      |      |     1|
But the expected result is:
Cat|Type|01-Jan|02-Jan|03-Jan|04-Jan|05-Jan|
---+----+------+------+------+------+------+
AA |F   |      |     2|      |      |      |
   |S   |     1|      |      |      |     2|
   |T   |      |      |      |      |      |
BB |F   |      |      |      |      |      |
   |S   |      |      |      |      |     4|
   |T   |      |     3|      |     3|      |
CC |F   |      |      |      |      |      |
   |S   |      |      |      |      |      |
   |T   |      |      |      |      |     2|
DD |F   |      |      |      |      |      |
   |S   |      |      |      |      |     4|
   |T   |      |      |      |      |     1|
Answer 0 (score: 1)
You need reindex after pivot_table, using all combinations of the values in the Cat and Type columns:
m = pd.MultiIndex.from_product([df['Cat'].unique(),df['Type'].unique()], names=pt.index.names)
pt = pt.reindex(m)
print(pt)
Date      01-Jan  02-Jan  04-Jan  05-Jan
Cat Type
AA  S        1.0     0.0     0.0     2.0
    F        0.0     2.0     0.0     0.0
    T        NaN     NaN     NaN     NaN
BB  S        0.0     0.0     0.0     4.0
    F        NaN     NaN     NaN     NaN
    T        0.0     3.0     3.0     0.0
CC  S        NaN     NaN     NaN     NaN
    F        NaN     NaN     NaN     NaN
    T        0.0     0.0     0.0     2.0
DD  S        0.0     0.0     0.0     4.0
    F        NaN     NaN     NaN     NaN
    T        0.0     0.0     0.0     1.0
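Note that the output above still lacks the 03-Jan column, and the newly added rows hold NaN rather than 0. A possible extension, not part of the original answer, is to reindex rows and columns in one call and pass fill_value=0. This is only a sketch: the list all_dates and the name full are my own, and it assumes the complete calendar runs from 01-Jan to 05-Jan.

```python
from io import StringIO

import pandas as pd

csvfile = StringIO("""Date;Cat;Type;Value
01-Jan;AA;S;1
02-Jan;AA;F;2
02-Jan;BB;T;3
04-Jan;BB;T;3
05-Jan;CC;T;2
05-Jan;DD;T;1
05-Jan;BB;S;4
05-Jan;AA;S;2
05-Jan;DD;S;4""")
df = pd.read_csv(csvfile, sep=";")

pt = pd.pivot_table(df, values="Value", index=["Cat", "Type"],
                    columns="Date", aggfunc="sum", fill_value=0)

# Every (Cat, Type) combination, sorted so the rows come out as F, S, T.
rows = pd.MultiIndex.from_product(
    [sorted(df["Cat"].unique()), sorted(df["Type"].unique())],
    names=pt.index.names,
)

# Assumption: the full set of dates is 01-Jan .. 05-Jan (hypothetical helper list).
all_dates = pd.Index([f"{day:02d}-Jan" for day in range(1, 6)], name="Date")

# Reindex both axes at once; missing rows and columns are filled with 0.
full = pt.reindex(index=rows, columns=all_dates, fill_value=0)
print(full)
```

If the real data spans other dates, all_dates would have to be built from whatever calendar applies there.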
Answer 1 (score: 0)
Just convert df['Type'] to Categorical first:
df['Type'] = df['Type'].astype('category')
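This forces pandas to show every Type value in pivot_table, including combinations that have no data. It is also better to pass aggregation names as strings such as 'sum', which pandas maps to optimized functions. Here is a demo: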
df['Type'] = df['Type'].astype('category')
pt = pd.pivot_table(df, values='Value', index=['Cat', 'Type'],
                    columns='Date', aggfunc='sum', fill_value=0)
print(pt)
Date      01-Jan  02-Jan  04-Jan  05-Jan
Cat Type
AA  F          0       2       0       0
    S          1       0       0       2
    T          0       0       0       0
BB  F          0       0       0       0
    S          0       0       0       4
    T          0       3       3       0
CC  F          0       0       0       0
    S          0       0       0       0
    T          0       0       0       2
DD  F          0       0       0       0
    S          0       0       0       4
    T          0       0       0       1
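The demo above still has no 03-Jan column. The same idea might be stretched to cover it by making Date categorical too, with the complete list of dates as its categories. This is a sketch rather than part of the original answer: all_dates and pt_cat are hypothetical names, the 01-Jan to 05-Jan range is an assumption, and observed=False is passed explicitly because, as far as I know, the default for that parameter has been changing between pandas versions.

```python
from io import StringIO

import pandas as pd

csvfile = StringIO("""Date;Cat;Type;Value
01-Jan;AA;S;1
02-Jan;AA;F;2
02-Jan;BB;T;3
04-Jan;BB;T;3
05-Jan;CC;T;2
05-Jan;DD;T;1
05-Jan;BB;S;4
05-Jan;AA;S;2
05-Jan;DD;S;4""")
df = pd.read_csv(csvfile, sep=";")

# Assumption: the full set of dates is 01-Jan .. 05-Jan (hypothetical helper list).
all_dates = [f"{day:02d}-Jan" for day in range(1, 6)]

# Categorical columns carry their full category list, observed or not.
df["Type"] = df["Type"].astype("category")
df["Date"] = pd.Categorical(df["Date"], categories=all_dates)

# observed=False asks pandas to keep unobserved categories in the result.
pt_cat = pd.pivot_table(df, values="Value", index=["Cat", "Type"],
                        columns="Date", aggfunc="sum", fill_value=0,
                        observed=False)
print(pt_cat)
```

This relies on 'sum' returning 0 for empty groups; with an aggregation such as 'mean' the empty combinations would be NaN and may be dropped unless dropna=False is also passed.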