Reshaping a pandas DataFrame with duplicate values

Time: 2020-06-18 20:02:50

Tags: python pandas plotly plotly-python

I am using Plotly's go.Table() together with pandas to build a table that summarizes some data. My data looks like this:

import pandas as pd

test_df = pd.DataFrame({'Manufacturer':['BMW', 'Chrysler', 'Chrysler', 'Chrysler', 'Brokertec', 'DWAS', 'Ford', 'Buick'],
                          'Metric':['Indicator', 'Indicator', 'Indicator', 'Indicator', 'Indicator', 'Indicator', 'Indicator', 'Indicator'],
                          'Dimension':['Short', 'Short', 'Short', 'Long', 'Short', 'Short', 'Long', 'Long'],
                          'User': ['USA', 'USA', 'USA', 'USA', 'USA', 'New USA', 'USA', 'Los USA'],
                          'Value':[50, 3, 3, 2, 5, 7, 10, 5]
                   })

The output I want looks like this (Dimension broken down by Manufacturer):

Manufacturer        Short        Long
Chrysler            6            2
Buick               5            5
Mercedes            7            0
Ford                0            10

To get there I need to reshape the pandas DataFrame a bit, and this is where I'm stuck. My code is:

table_columns = ['Manufacturer', 'Long', 'Short']   # unstack() names the columns after the Dimension values

manufacturers = ['Chrysler', 'Buick', 'Mercedes', 'Ford']

# Reshape: one row per Manufacturer, one column per Dimension
df_new = (test_df[test_df['Manufacturer'].isin(manufacturers)]
          .set_index(['Manufacturer', 'Dimension'])
          ['Value'].unstack()
          .reset_index()[table_columns]
          )

Then I build the table with Plotly's go.Table():

import plotly.graph_objects as go

direction_table = go.Figure(go.Table(
    header=dict(
        values=table_columns,
        font=dict(size=12),
        line_color='darkslategray',
        fill_color='lightskyblue',
        align='center'
    ),
    cells=dict(
        values=[df_new[col] for col in table_columns],   # one list of cell values per column
        line_color='darkslategray',
        fill_color='lightcyan',
        align='center'
    )
))

direction_table.show()

The error I get is:

ValueError: Index contains duplicate entries, cannot reshape
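This error means that at least one (Manufacturer, Dimension) pair appears in more than one row, so unstack() has no single value to place in that cell. A minimal sketch for finding the offending rows with DataFrame.duplicated on the question's test_df; the names dup_mask and dups are only illustrative.

```python
import pandas as pd

# Same sample data as in the question
test_df = pd.DataFrame({
    'Manufacturer': ['BMW', 'Chrysler', 'Chrysler', 'Chrysler', 'Brokertec', 'DWAS', 'Ford', 'Buick'],
    'Metric': ['Indicator'] * 8,
    'Dimension': ['Short', 'Short', 'Short', 'Long', 'Short', 'Short', 'Long', 'Long'],
    'User': ['USA', 'USA', 'USA', 'USA', 'USA', 'New USA', 'USA', 'Los USA'],
    'Value': [50, 3, 3, 2, 5, 7, 10, 5],
})

# keep=False flags every row whose (Manufacturer, Dimension) pair occurs more than once
dup_mask = test_df.duplicated(subset=['Manufacturer', 'Dimension'], keep=False)
dups = test_df.loc[dup_mask, ['Manufacturer', 'Dimension', 'Value']]
print(dups)
```

Here the only repeated pair is (Chrysler, Short), at rows 1 and 2.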

What is the best way to fix this?

Thanks!

1 answer:

Answer 0 (score: 2)

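Because the pair (Chrysler, Short) occurs twice in your data, set_index + unstack cannot reshape it. Use pivot_table with aggfunc='sum' instead, which sums the duplicate entries before reshaping: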

table_columns = ['Manufacturer', 'Long', 'Short']

manufacturers = ['Chrysler', 'Buick', 'Mercedes', 'Ford']

df_new = (test_df[test_df['Manufacturer'].isin(manufacturers)]
               .pivot_table(index='Manufacturer', columns='Dimension', 
                            values='Value', aggfunc='sum', fill_value=0)
               .reset_index()
               .rename_axis(columns=None)[table_columns]
        )
print(df_new)
  Manufacturer  Long  Short
0        Buick     5      0
1     Chrysler     2      6
2         Ford    10      0

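Note that this is not exactly your expected output, but I don't think your input data can produce it: Mercedes does not appear in test_df at all, and Buick only has a Long row.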
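If the goal is to get the same rows and row order as the expected table, one option is to reindex the pivoted result on the manufacturers list, which adds Mercedes as an all-zero row. A sketch under that assumption; df_expected is a hypothetical name. It does not invent data, so Buick's Short and Mercedes' values remain 0.

```python
import pandas as pd

# Same sample data as in the question
test_df = pd.DataFrame({
    'Manufacturer': ['BMW', 'Chrysler', 'Chrysler', 'Chrysler', 'Brokertec', 'DWAS', 'Ford', 'Buick'],
    'Metric': ['Indicator'] * 8,
    'Dimension': ['Short', 'Short', 'Short', 'Long', 'Short', 'Short', 'Long', 'Long'],
    'User': ['USA', 'USA', 'USA', 'USA', 'USA', 'New USA', 'USA', 'Los USA'],
    'Value': [50, 3, 3, 2, 5, 7, 10, 5],
})

manufacturers = ['Chrysler', 'Buick', 'Mercedes', 'Ford']

df_expected = (test_df[test_df['Manufacturer'].isin(manufacturers)]
               .pivot_table(index='Manufacturer', columns='Dimension',
                            values='Value', aggfunc='sum', fill_value=0)
               # Add manufacturers absent from the data (Mercedes) as all-zero rows
               # and put the rows in the order of the manufacturers list
               .reindex(pd.Index(manufacturers, name='Manufacturer'), fill_value=0)
               .reset_index()
               .rename_axis(columns=None)[['Manufacturer', 'Short', 'Long']])
print(df_expected)
```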

Or get the same result with groupby().sum() followed by unstack():

(test_df[test_df['Manufacturer'].isin(manufacturers)]
        .groupby(['Manufacturer', 'Dimension'])
        ['Value'].sum()
        .unstack(fill_value=0)
        .reset_index()
        .rename_axis(columns=None)[table_columns]
)
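A third variant, sketched here as an alternative rather than something from the answer itself: pd.crosstab also accepts values= and aggfunc=, and sums the duplicate pairs the same way. crosstab has no fill_value argument, so missing combinations come back as NaN and are filled afterwards; subset and ct are illustrative names.

```python
import pandas as pd

# Same sample data as in the question
test_df = pd.DataFrame({
    'Manufacturer': ['BMW', 'Chrysler', 'Chrysler', 'Chrysler', 'Brokertec', 'DWAS', 'Ford', 'Buick'],
    'Metric': ['Indicator'] * 8,
    'Dimension': ['Short', 'Short', 'Short', 'Long', 'Short', 'Short', 'Long', 'Long'],
    'User': ['USA', 'USA', 'USA', 'USA', 'USA', 'New USA', 'USA', 'Los USA'],
    'Value': [50, 3, 3, 2, 5, 7, 10, 5],
})

manufacturers = ['Chrysler', 'Buick', 'Mercedes', 'Ford']
subset = test_df[test_df['Manufacturer'].isin(manufacturers)]

# values= plus aggfunc='sum' adds up repeated (Manufacturer, Dimension) pairs
ct = (pd.crosstab(subset['Manufacturer'], subset['Dimension'],
                  values=subset['Value'], aggfunc='sum')
      .fillna(0)            # combinations with no rows are NaN in crosstab
      .astype(int)
      .reset_index()
      .rename_axis(columns=None)[['Manufacturer', 'Long', 'Short']])
print(ct)
```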