I have a dataframe containing integers, but when I pivot it, it creates floats and I can't work out why:
My dataframe (dfDis) looks like this:
Year Type Total
0 2006 A talk or presentation 34
1 2006 A magazine, newsletter or online publication 33
2 2006 A formal working group, expert panel or dialogue 2
3 2006 Scientific meeting (conference/symposium etc.) 10
4 2006 A press release, press conference or response ... 6
....
My pivot code is:
dfDisB = pd.pivot_table(dfDis, index=['Year'], columns = ['Type'],fill_value=0)
For some reason, dfDisB ends up like this (sorry about the formatting, I hope you get the gist):
Total
Type A broadcast e.g. TV/radio/film/podcast (other than news/press) A formal working group, expert panel or dialogue A magazine, newsletter or online publication A press release, press conference or response to a media enquiry/interview A talk or presentation Engagement focused website, blog or social media channel Participation in an activity, workshop or similar Participation in an open day or visit at my research institution Scientific meeting (conference/symposium etc.)
Year
2006 1.000000 1.571429 6.125000 2.000000 3.235294 1.000000 4.222222 1.000000 5.500000
2007 0.000000 3.666667 24.500000 11.500000 32.250000 1.000000 5.250000 2.500000 28.000000
2008 0.000000 2.500000 21.333333 13.000000 38.230769 1.000000 7.909091 1.000000 37.000000
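I'm confused, because earlier in the report I pivoted some other data without any problem.
Any suggestions? I exported dfDis to CSV to check that it contains no floats, and it doesn't: it holds only integers.
Thanks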
Answer 0 (score: 4)
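To understand this behaviour, note two things.
First, the default aggregation method of pd.pivot_table is 'mean'.
Second, if any aggregated value is a float [including NaN], all of the series' values are converted to float.
Below is a minimal example.
Conversion to float triggered: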
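import pandas as pd

# Group (A=1, B='a') holds the C values 1, 3 and 4, whose mean 2.666... is not a whole number.
df = pd.DataFrame({'A': [1, 2, 1, 2, 1, 1, 2, 1],
                   'B': ['a', 'b', 'a', 'c', 'b', 'c', 'a', 'a'],
                   'C': [1, 2, 3, 4, 5, 6, 7, 4]})
df = pd.pivot_table(df, index='A', columns=['B'], values='C', aggfunc='mean')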
print(df)
B a b c
A
1 2.666667 5.0 6.0
2 7.000000 2.0 4.0
Conversion to float not triggered:
# Same data except that the last C value is 5 instead of 4, so every group mean is a whole number.
df = pd.DataFrame({'A': [1, 2, 1, 2, 1, 1, 2, 1],
                   'B': ['a', 'b', 'a', 'c', 'b', 'c', 'a', 'a'],
                   'C': [1, 2, 3, 4, 5, 6, 7, 5]})
df = pd.pivot_table(df, index='A', columns=['B'], values='C', aggfunc='mean')
print(df)
B a b c
A
1 3 5 6
2 7 2 4
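If the intent is to total the counts rather than average them, one option is to pass aggfunc='sum' explicitly. The following is only a sketch: the df_dis frame and its values are made up to mirror the question's Year/Type/Total columns, and whether a sum or a mean is the right aggregation depends on what the duplicate rows actually represent.

```python
import pandas as pd

# Hypothetical sample mirroring the question's columns; the values are invented.
df_dis = pd.DataFrame({
    "Year": [2006, 2006, 2006, 2007, 2007],
    "Type": ["A talk or presentation", "A talk or presentation",
             "Scientific meeting", "A talk or presentation",
             "Scientific meeting"],
    "Total": [34, 1, 10, 5, 7],
})

# Default aggfunc='mean': the two 2006 talk rows average to (34 + 1) / 2 = 17.5.
by_mean = pd.pivot_table(df_dis, index="Year", columns="Type",
                         values="Total", fill_value=0)

# aggfunc='sum': duplicate rows are added together, so every cell is a whole number.
by_sum = pd.pivot_table(df_dis, index="Year", columns="Type",
                        values="Total", aggfunc="sum", fill_value=0)

print(by_mean)
print(by_sum)
```

Passing values="Total" as a plain string also drops the extra "Total" level from the column header.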
Answer 1 (score: 2)
The default aggregation function used by pivot_table() is mean.
Most likely this is what produces the float values.
Demo:
In [49]: df
Out[49]:
Year Type Total
0 2006 A talk or presentation 34
1 2006 A talk or presentation 1 # <--- NOTE !!!
2 2006 A magazine, newsletter or online publication 33
3 2006 A formal working group, expert panel or dialogue 2
4 2006 Scientific meeting (conference/symposium etc.) 10
5 2006 A press release, press conference or response 6
In [50]: df.pivot_table(index=['Year'], columns = ['Type'],fill_value=0)
Out[50]:
Total \
Type A formal working group, expert panel or dialogue A magazine, newsletter or online publication
Year
2006 2 33
Type A press release, press conference or response A talk or presentation Scientific meeting (conference/symposium etc.)
Year
2006 6 17.5 10
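The 17.5 above is the mean of the two "A talk or presentation" rows, (34 + 1) / 2. A quick way to check whether your own data has such repeated (Year, Type) pairs is sketched below. The frame is a cut-down, made-up version of the demo data, and the names dupes and counts are only illustrative.

```python
import pandas as pd

# Cut-down, hypothetical version of the demo frame above.
df = pd.DataFrame({
    "Year": [2006, 2006, 2006, 2006],
    "Type": ["A talk or presentation", "A talk or presentation",
             "A magazine, newsletter or online publication",
             "Scientific meeting (conference/symposium etc.)"],
    "Total": [34, 1, 33, 10],
})

# keep=False marks every row of a repeated (Year, Type) pair, not only the later ones.
dupes = df[df.duplicated(subset=["Year", "Type"], keep=False)]
print(dupes)

# Rows per pair; any count above 1 means the pivot has to aggregate several values.
counts = df.groupby(["Year", "Type"]).size()
print(counts[counts > 1])
```

If both results come back empty, each cell of the pivot holds a single original value, and the floats would have to come from somewhere else.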
Answer 2 (score: 1)
I had a similar problem, and adding fill_value=0 fixed it for me.
df = pd.pivot_table(df, index='A', columns=['B'], values='C', fill_value=0)
Adding aggfunc='sum' made no difference with my data.