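Python Pandas - pivot table outputs unexpected floats

Date: 2018-04-30 12:17:15

Tags: python pandas dataframe pivot-table

I have a dataframe containing integers, but when I pivot it, it creates floats and I can't work out why:

My dataframe (dfDis) looks like this: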

    Year    Type                                                Total
0   2006    A talk or presentation                                 34
1   2006    A magazine, newsletter or online publication           33
2   2006    A formal working group, expert panel or dialogue        2
3   2006    Scientific meeting (conference/symposium etc.)         10
4   2006    A press release, press conference or response ...       6
....

My pivot code is:

dfDisB = pd.pivot_table(dfDis, index=['Year'], columns = ['Type'],fill_value=0)

For some reason, dfDisB ends up like this (sorry about the formatting, I hope you get the gist):

    Total
Type    A broadcast e.g. TV/radio/film/podcast (other than news/press)  A formal working group, expert panel or dialogue    A magazine, newsletter or online publication    A press release, press conference or response to a media enquiry/interview  A talk or presentation  Engagement focused website, blog or social media channel    Participation in an activity, workshop or similar   Participation in an open day or visit at my research institution    Scientific meeting (conference/symposium etc.)
Year                                    
2006    1.000000    1.571429    6.125000    2.000000    3.235294    1.000000    4.222222    1.000000    5.500000
2007    0.000000    3.666667    24.500000   11.500000   32.250000   1.000000    5.250000    2.500000    28.000000
2008    0.000000    2.500000    21.333333   13.000000   38.230769   1.000000    7.909091    1.000000    37.000000

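I'm confused because I pivoted some other data earlier in the report without any problem.

Any suggestions? I have exported dfDis to CSV to check that it contains no floats, and it doesn't; it is all integers.

Thanks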

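3 answers:

Answer 0 (score: 4)

To understand this behavior, note:

  1. The default aggregation method of pd.pivot_table is 'mean'.
  2. The mean of 3 integers is generally not an integer.
  3. If any value in a column of the pivot table is a float (including NaN), all values in that series are converted to float.
  4. A minimal example follows.

    Conversion to float triggered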

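    import pandas as pd

    # Group (A=1, B='a') holds C = 1, 3, 4, whose mean is not a whole number.
    df = pd.DataFrame({'A': [1, 2, 1, 2, 1, 1, 2, 1],
                       'B': ['a', 'b', 'a', 'c', 'b', 'c', 'a', 'a'],
                       'C': [1, 2, 3, 4, 5, 6, 7, 4]})
    
    # aggfunc='mean' is also the default.
    df = pd.pivot_table(df, index='A', columns=['B'], values='C', aggfunc='mean')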
    
    print(df)
    
    B         a    b    c
    A                    
    1  2.666667  5.0  6.0
    2  7.000000  2.0  4.0
    

    Conversion to float not triggered

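    # Same data except the last C value (5 instead of 4):
    # group (A=1, B='a') now holds 1, 3, 5 and averages to exactly 3.
    df = pd.DataFrame({'A': [1, 2, 1, 2, 1, 1, 2, 1],
                       'B': ['a', 'b', 'a', 'c', 'b', 'c', 'a', 'a'],
                       'C': [1, 2, 3, 4, 5, 6, 7, 5]})
    
    df = pd.pivot_table(df, index='A', columns=['B'], values='C', aggfunc='mean')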
    
    print(df)
    
    B  a  b  c
    A         
    1  3  5  6
    2  7  2  4
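
If the goal is to total the counts per cell rather than average them, one option is to pass aggfunc='sum' explicitly. The sketch below reuses the toy data from the first example and assumes that summing is what is wanted; the name totals is only illustrative.

```python
import pandas as pd

# Same toy data as the "conversion triggered" example above.
df = pd.DataFrame({'A': [1, 2, 1, 2, 1, 1, 2, 1],
                   'B': ['a', 'b', 'a', 'c', 'b', 'c', 'a', 'a'],
                   'C': [1, 2, 3, 4, 5, 6, 7, 4]})

# Summing integers gives integers, so no cell becomes fractional.
# fill_value=0 would replace missing (A, B) combinations; there are none here.
totals = pd.pivot_table(df, index='A', columns='B', values='C',
                        aggfunc='sum', fill_value=0)

print(totals)
```

Here every (A, B) combination occurs at least once, so no NaN appears and the integer dtype survives.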
    

Answer 1 (score: 2)

The default aggregation function used by pivot_table() is mean.

Most likely this is what produces the float values.

Demo:

In [49]: df
Out[49]:
   Year                                              Type  Total
0  2006                            A talk or presentation     34
1  2006                            A talk or presentation      1  # <--- NOTE !!!
2  2006      A magazine, newsletter or online publication     33
3  2006  A formal working group, expert panel or dialogue      2
4  2006    Scientific meeting (conference/symposium etc.)     10
5  2006     A press release, press conference or response      6

In [50]: df.pivot_table(index=['Year'], columns = ['Type'],fill_value=0)
Out[50]:
                                                Total                                               \
Type A formal working group, expert panel or dialogue A magazine, newsletter or online publication
Year
2006                                                2                                           33


Type A press release, press conference or response A talk or presentation Scientific meeting (conference/symposium etc.)
Year
2006                                             6                   17.5                                             10
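
To confirm that repeated (Year, Type) pairs are the cause, it may help to count rows per pair before pivoting. This is a rough sketch on a shortened, made-up copy of the demo frame; the names counts and repeated are illustrative.

```python
import pandas as pd

# Shortened copy of the demo data: 'A talk or presentation' appears twice for 2006.
df = pd.DataFrame({
    'Year': [2006, 2006, 2006, 2006],
    'Type': ['A talk or presentation',
             'A talk or presentation',
             'A magazine, newsletter or online publication',
             'Scientific meeting (conference/symposium etc.)'],
    'Total': [34, 1, 33, 10],
})

# Rows per (Year, Type) pair; any pair with more than one row
# gets averaged by the default aggfunc='mean'.
counts = df.groupby(['Year', 'Type']).size()
repeated = counts[counts > 1]

print(repeated)
```

An empty result would suggest that averaging is not the source of the floats and that missing cells (NaN) are worth checking instead.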

Answer 2 (score: 1)

Adding fill_value=0 resolved a similar problem for me.

df = pd.pivot_table(df, index='A', columns=['B'], values='C', fill_value=0)

aggfunc='sum' made no difference for my data.
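
A likely explanation is that the floats in this case came from missing cells rather than from averaging: a combination that never occurs becomes NaN, and NaN cannot be stored in a plain integer column. The sketch below uses made-up data to show the effect. Whether fill_value=0 alone returns an integer dtype may depend on the pandas version, so the final astype(int) is an assumption added to make the result explicit.

```python
import pandas as pd

# Hypothetical data: the combination (A=2, B='b') never occurs.
df = pd.DataFrame({'A': [1, 1, 2],
                   'B': ['a', 'b', 'a'],
                   'C': [10, 20, 30]})

# Without fill_value the missing cell is NaN, so column 'b' must be float,
# even though the aggregation is a sum of integers.
with_nan = pd.pivot_table(df, index='A', columns='B', values='C', aggfunc='sum')

# With fill_value=0 the missing cell becomes 0.
filled = pd.pivot_table(df, index='A', columns='B', values='C',
                        aggfunc='sum', fill_value=0)

# Explicit cast, safe here because no NaN remains.
filled_int = filled.astype(int)

print(with_nan)
print(filled_int)
```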