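I have already removed all NaN values from df using df = df.fillna(0).
After creating a pivot table with
pd.pivot_table(df, index='Source', columns='Customer Location', values='Total billed £')
I still get NaN values in the output.
Can someone explain why this happens and how to prevent it?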
Answer 0 (score: 3)
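The cause is the input data. pivot_table turns the values of one column into the index and the values of another column into the columns; each cell at their intersection holds the aggregated value.
If some combination does not exist in the input data, there is nothing to aggregate for that cell, so it is filled with missing data (NaN).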
import pandas as pd

df = pd.DataFrame({
    'Source': list('abcdef'),
    'Total billed £': [5, 3, 6, 9, 2, 4],
    'Customer Location': list('adfbbb')
})
print(df)
  Source  Total billed £ Customer Location
0      a               5                 a
1      b               3                 d
2      c               6                 f
3      d               9                 b
4      e               2                 b
5      f               4                 b
# e.g. the combination Source=a, Customer Location=b does not exist in the input, so that cell is NaN in the output
print(pd.pivot_table(df, index='Source', columns='Customer Location', values='Total billed £'))
Customer Location    a    b    d    f
Source
a                  5.0  NaN  NaN  NaN
b                  NaN  NaN  3.0  NaN
c                  NaN  NaN  NaN  6.0
d                  NaN  9.0  NaN  NaN
e                  NaN  2.0  NaN  NaN
f                  NaN  4.0  NaN  NaN
Also, here's a good read on reshaping data.
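To check the explanation above against the same sample data, here is a minimal sketch that counts the input rows for every (Source, Customer Location) pair and compares the empty pairs with the NaN cells of the pivot table. Using pd.crosstab for the counting is my own choice for illustration, not something the answer itself uses.

```python
import pandas as pd

df = pd.DataFrame({
    'Source': list('abcdef'),
    'Total billed £': [5, 3, 6, 9, 2, 4],
    'Customer Location': list('adfbbb'),
})

# Count how many input rows exist for each (Source, Customer Location) pair.
counts = pd.crosstab(df['Source'], df['Customer Location'])

# Same pivot as in the answer (default aggregation is the mean).
pivot = pd.pivot_table(df, index='Source', columns='Customer Location',
                       values='Total billed £')

# Pairs with zero input rows have nothing to aggregate.
missing_pairs = int((counts == 0).to_numpy().sum())
nan_cells = int(pivot.isna().to_numpy().sum())

print(missing_pairs, nan_cells)  # 18 18
```

The two numbers agree because every NaN cell corresponds to a pair that never occurs in the input, which is also why calling fillna(0) on df beforehand has no effect on them.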
Answer 1 (score: 2)
The reason is simple: a pair of (index, column) values is missing from your data, for example:
import pandas as pd

df = pd.DataFrame({"Source": ["foo", "bar", "bar", "bar"],
                   "Customer Location": ["one", "one", "two", "two"],
                   "Total billed £": [10, 20, 30, 40]})
print(df)
Setup
  Source Customer Location  Total billed £
0    foo               one              10
1    bar               one              20
2    bar               two              30
3    bar               two              40
You can see that there is no ('foo', 'two') pair in the data, so when you do:
result = pd.pivot_table(df, index='Source', columns='Customer Location', values='Total billed £')
print(result)
Output
Customer Location   one   two
Source
bar                20.0  35.0
foo                10.0   NaN
To fix this, provide a default value with the fill_value parameter:
result = pd.pivot_table(df, index='Source', columns='Customer Location', values='Total billed £', fill_value=0)
Output
Customer Location  one  two
Source
bar                 20   35
foo                 10    0
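As a possible alternative to fill_value, the NaN cells can also be replaced after pivoting by calling fillna(0) on the result. This is only a sketch on the same sample data; the variable name filled is mine. The point is that the fill has to be applied to the pivoted table, since the missing cells do not exist in df itself.

```python
import pandas as pd

df = pd.DataFrame({"Source": ["foo", "bar", "bar", "bar"],
                   "Customer Location": ["one", "one", "two", "two"],
                   "Total billed £": [10, 20, 30, 40]})

# Pivot first: the ('foo', 'two') cell is NaN because that pair never occurs.
result = pd.pivot_table(df, index='Source', columns='Customer Location',
                        values='Total billed £')

# Fill the gaps on the pivoted result, not on the original df.
filled = result.fillna(0)
print(filled)
```

Either way, a 0 in the filled table can mean "no rows for this pair", so choose a fill value that cannot be mistaken for a real total if that distinction matters.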