ValueError: cannot reindex from a duplicate axis - no duplicate axis values

Time: 2017-07-24 13:14:43

Tags: python pandas pandas-groupby

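I am grouping a DataFrame by year (one level of the MultiIndex on the columns), applying a function that pads each group out to 11 columns (adding as many empty columns as needed), and then returning the padded df. However, this raises an error.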

finalFormat = (penultimateFormatNot11Columns.groupby( level = 'Year', 
                                                      axis  = 1 )
                                            .apply( padDFToXColumns )
              )




raise ValueError("cannot reindex from a duplicate axis")

Inside the applied padding function, the returned paddedDF has no duplicated labels on either axis:

>>> paddedDF.index.duplicated().any()
False
>>> paddedDF.columns.duplicated().any()
False
>>> 
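One thing worth keeping in mind when reading the check above: on a MultiIndex, `duplicated()` compares complete label tuples, so it can report False even though a single level (such as the year) repeats many times. The sketch below illustrates that difference on a small made-up frame; the data and variable names are hypothetical and are not taken from the question, and it does not claim to identify the cause of the error.

```python
import pandas as pd

# Hypothetical two-level column index: every full (year, gender) tuple is unique.
cols = pd.MultiIndex.from_product(
    [[1996, 1997], ["1.Female", "2.Male"]],
    names=["SurveyYear", "Gender"],
)
frame = pd.DataFrame([[1.0, 2.0, 3.0, 4.0]], index=["1.White"], columns=cols)

# Checks whole tuples, so nothing is flagged as a duplicate.
full_label_dupes = frame.columns.duplicated().any()

# Checks one level only: each year appears once per gender, so values repeat.
year_level_dupes = frame.columns.get_level_values("SurveyYear").duplicated().any()

print(full_label_dupes, year_level_dupes)
```

So a False result from `columns.duplicated().any()` only rules out repeated full labels, not repeated values within one level.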

Any ideas why this error occurs?

Padding function

import numpy as np
import pandas as pd

def padDFToXColumns( df, TOT_COLUMNS = 11 ):
    """
    Pad out the number of columns in df to TOT_COLUMNS (add TOT_COLUMNS - len(df.columns) empty columns)
    """

    numColsInDF = len(df.columns)
    if numColsInDF > TOT_COLUMNS:
        print("ERROR: Number Of Columns (%s) Exceeds Max Columns (%s)" % (numColsInDF, TOT_COLUMNS))
        return

    ### Add Empty Columns ###
    numColsToAdd = TOT_COLUMNS - numColsInDF
    columnsToAdd = [ 'EmptyColumn' + str(num) for num in range(numColsInDF + 1, TOT_COLUMNS + 1) ]
    emptyColumns = pd.DataFrame( columns = columnsToAdd, index = np.arange(len(df.index)) )

    paddedDF = df.join(emptyColumns)
    #paddedDF.reset_index( drop = True, inplace = True )

    return paddedDF
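As a point of comparison, here is a minimal sketch of the same padding idea written with `reindex` instead of `join`, so the added columns do not depend on matching a separately built index. It assumes a frame with flat (single-level) column labels, which is simpler than the MultiIndex columns in the question; the name `pad_to_n_columns` and the sample data are hypothetical.

```python
import pandas as pd

def pad_to_n_columns(df, total_columns=11):
    """Return df with empty (NaN) columns appended up to total_columns.

    Assumes flat, single-level column labels (an assumption of this sketch).
    """
    num_cols = len(df.columns)
    if num_cols > total_columns:
        raise ValueError(
            "Number of columns (%s) exceeds max columns (%s)" % (num_cols, total_columns)
        )
    # Same naming scheme as the original function: EmptyColumn<position>.
    extra = ["EmptyColumn" + str(num) for num in range(num_cols + 1, total_columns + 1)]
    # reindex keeps the existing rows and fills the new columns with NaN.
    return df.reindex(columns=list(df.columns) + extra)

# Hypothetical two-column frame indexed by race labels.
small = pd.DataFrame({"a": [1.0, 2.0], "b": [3.0, 4.0]}, index=["1.White", "2.Black"])
padded = pad_to_n_columns(small)
print(padded.shape)
```

Raising an exception in place of printing and returning None is a choice made for this sketch, so that a caller cannot silently receive None for an oversized group.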

DataFrame

>>> mydata.head()

     SurveyYear  Age        Race    Gender  WeightAdjusted
0        1996   39     1.White  1.Female         1039.13
1        1996    9     1.White    2.Male          995.13
2        1996    8     1.White    2.Male          775.66
3        1996   39     1.White    2.Male          404.28
4        1996   33  3.Hispanic  1.Female          404.28

>>> groupbyKeys = ['SurveyYear', 'Age', 'Race', 'Gender']
>>> cellPopulations = mydata.groupby(groupbyKeys).agg( {'WeightAdjusted':'sum'})
>>> cellPopulations.head(20)
                                    WeightAdjusted
SurveyYear Age Race       Gender                  
1996       0   1.White    1.Female      1204859.60
                          2.Male        1227666.34
               2.Black    1.Female       307495.16
                          2.Male         263571.07
               3.Hispanic 1.Female       320359.68
                          2.Male         392902.80
               4.Asian    1.Female        78615.49
                          2.Male          82341.54
               5.Other    1.Female        16134.33
                          2.Male          19365.76
           1   1.White    1.Female      1195134.70
                          2.Male        1195659.14
               2.Black    1.Female       328376.10
                          2.Male         383293.79
               3.Hispanic 1.Female       322862.58
                          2.Male         404322.04
               4.Asian    1.Female        79499.56
                          2.Male          73783.69
               5.Other    1.Female        20647.55
                          2.Male          24222.52
>>> unstackKey  = ['SurveyYear', 'Age', 'Gender']



>>> penultimateFormatNot11Columns = cellPopulations.unstack(unstackKey)
>>> penultimateFormatNot11Columns

           WeightAdjusted                                                                                                       ...                                                                                                          
SurveyYear           1996                                                                                                       ...          1997                                                                                            
Age                    0                     1                     2                     3                     4                ...            76                  77                  78                  79                   80           
Gender           1.Female     2.Male   1.Female     2.Male   1.Female     2.Male   1.Female     2.Male   1.Female     2.Male    ...      1.Female    2.Male  1.Female    2.Male  1.Female    2.Male  1.Female    2.Male   1.Female     2.Male
Race                                                                                                                            ...                                                                                                          
1.White        1204859.60 1227666.34 1195134.70 1195659.14 1197386.21 1288700.89 1251324.65 1307458.14 1236790.33 1374989.75    ...     764103.31 506844.04 702775.64 425705.16 666705.33 423419.49 577674.82 366109.58 3898404.40 2283771.11
2.Black         307495.16  263571.07  328376.10  383293.79  291976.23  326400.85  310870.61  323344.13  301025.43  323199.08    ...      68272.99  43254.98  50082.98  34347.45  50788.70  36772.29  31393.21  20720.47  366569.11  180108.23
3.Hispanic      320359.68  392902.80  322862.58  404322.04  344564.20  340702.86  303325.95  321065.53  382663.64  311911.38    ...      39084.04  17362.56  27507.45  18803.48  17619.95  24060.91  35665.78  23802.81  174972.00  105530.84
4.Asian          78615.49   82341.54   79499.56   73783.69   96289.08   88222.32   96411.97   92029.56   77070.10   90370.15    ...      30196.58  27745.90  18419.49  15406.79   7272.27  17891.33  18116.50   3606.67   57684.54   42662.74
5.Other          16134.33   19365.76   20647.55   24222.52   17469.53   27237.94   11220.90    6996.58   23640.43   14917.77    ...       4441.26       nan   1487.90   2845.89    522.43   2453.52    303.66   2982.57   18870.12    6232.88

1 Answer:

Answer 0 (score: 0)

It seems to me that all you need is pivot_table.

To do that, you need to call df.reset_index(inplace=True) after the groupby(), and then:

df.pivot_table(values='WeightAdjusted', index='Race', columns=['SurveyYear', 'Age', 'Gender'])
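For illustration, the steps in this answer can be sketched end to end on a small made-up dataset. The rows below are hypothetical and only mimic the column names shown in the question. Note that `pivot_table` aggregates with the mean by default; because the preceding groupby leaves exactly one row per cell, the values pass through unchanged.

```python
import pandas as pd

# Hypothetical sample shaped like mydata in the question.
mydata = pd.DataFrame({
    "SurveyYear": [1996, 1996, 1996, 1996, 1997, 1997],
    "Age": [0, 0, 0, 1, 0, 0],
    "Race": ["1.White", "1.White", "2.Black", "1.White", "1.White", "2.Black"],
    "Gender": ["1.Female", "1.Female", "2.Male", "2.Male", "1.Female", "1.Female"],
    "WeightAdjusted": [10.0, 5.0, 7.0, 3.0, 4.0, 6.0],
})

groupby_keys = ["SurveyYear", "Age", "Race", "Gender"]
cell_populations = mydata.groupby(groupby_keys).agg({"WeightAdjusted": "sum"})

# Turn the grouped index levels back into ordinary columns.
df = cell_populations.reset_index()

# One row per Race, one column per (SurveyYear, Age, Gender) combination.
wide = df.pivot_table(
    values="WeightAdjusted",
    index="Race",
    columns=["SurveyYear", "Age", "Gender"],
)
print(wide)
```

Combinations that never occur in the data get no column at all, and cells missing for a given race are left as NaN.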