Put all DataFrame rows that share a value in one column into a single row?

Asked: 2019-03-17 03:20:32

Tags: python pandas

I have a DataFrame that looks like this:

                    #   Year    Player          PTSN    AVGN    
ThisYear                            
2018Aaron Donald    1   2018    Aaron Donald    280.60  17.538  
2018J.J. Watt       2   2018    J.J. Watt       259.80  16.238  
2018Danielle Hunter 3   2018    Danielle Hunter 237.60  14.850  
2017Aaron Donald    8   2017    Aaron Donald    181.0   12.929  
2017Danielle Hunter 20  2017    Danielle Hunter 133.2   8.325
2016Danielle Hunter 2   2016    Danielle Hunter 204.6   12.788

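What I want to do is rearrange the columns and rows so that I can run a regression comparing each player's previous year against their following year (if you know a better way to do what I'm asking, please let me know).

The end result I'm looking for would be something like this: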

Player          PTSN     AVGN      PTSNN1      AVGNN1
Aaron Donald    280.60   17.538    181.0       12.929

How can I do this? Or, what would be a better way to get the result I want?

1 answer:

Answer 0 (score: 2)

New answer: set up "current" and "previous" columns for the correlation

import pandas as pd

# Same setup as in the old answer
df = pd.DataFrame({'#': [1, 2, 3, 8, 20, 2],
 'AVGN': [17.538, 16.238, 14.85, 12.929, 8.325, 12.788],
 'PTSN': [280.6, 259.8, 237.6, 181.0, 133.2, 204.6],
 'Player': ['Aaron Donald',
            'J.J. Watt',
            'Danielle Hunter',
            'Aaron Donald',
            'Danielle Hunter',
            'Danielle Hunter'],
 'Year': [2018, 2018, 2018, 2017, 2017, 2016]})

# Keep (Player, Year) as a row MultiIndex; do not unstack this time
res = df.set_index(['Player', 'Year'])[['AVGN', 'PTSN']]

# Build a MultiIndex of all players by all years
idx = pd.MultiIndex.from_product([df['Player'].unique(), 
                                  df['Year'].unique()],
                                 names=['Player', 'Year'])

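# Introduce a row of NaN values for any combination of
# player and year not in the original DataFrame, so that
# shifting by one row always means "the previous year"
res = res.reindex(idx).sort_index()

# Within each player, shift the values down by one row (one year)
res[['AVGN_prev', 'PTSN_prev']] = res.groupby('Player')[['AVGN', 'PTSN']].shift()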

res
                        AVGN   PTSN  AVGN_prev  PTSN_prev
Player          Year                                     
Aaron Donald    2016     NaN    NaN        NaN        NaN
                2017  12.929  181.0        NaN        NaN
                2018  17.538  280.6     12.929      181.0
Danielle Hunter 2016  12.788  204.6        NaN        NaN
                2017   8.325  133.2     12.788      204.6
                2018  14.850  237.6      8.325      133.2
J.J. Watt       2016     NaN    NaN        NaN        NaN
                2017     NaN    NaN        NaN        NaN
                2018  16.238  259.8        NaN        NaN
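
As a follow-up to the regression mentioned in the question, here is a minimal sketch of how the paired columns could be used. It rebuilds the same table, keeps only the rows where both the current and previous year exist, and fits a straight line with numpy.polyfit. The choice of numpy.polyfit and the names pairs, slope and intercept are assumptions for illustration, not part of the original answer, and three data points are far too few for a meaningful fit.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Player': ['Aaron Donald', 'J.J. Watt', 'Danielle Hunter',
               'Aaron Donald', 'Danielle Hunter', 'Danielle Hunter'],
    'Year': [2018, 2018, 2018, 2017, 2017, 2016],
    'PTSN': [280.6, 259.8, 237.6, 181.0, 133.2, 204.6],
    'AVGN': [17.538, 16.238, 14.85, 12.929, 8.325, 12.788],
})

# One row per (player, year), with NaN rows for missing seasons
res = df.set_index(['Player', 'Year'])[['AVGN', 'PTSN']]
idx = pd.MultiIndex.from_product([df['Player'].unique(), df['Year'].unique()],
                                 names=['Player', 'Year'])
res = res.reindex(idx).sort_index()

# Previous-year values, computed per player and joined as *_prev columns
prev = res.groupby(level='Player').shift().add_suffix('_prev')
res = res.join(prev)

# Keep only seasons that have both a current and a previous year
pairs = res.dropna()

# Illustrative least-squares line: PTSN ~ slope * PTSN_prev + intercept
slope, intercept = np.polyfit(pairs['PTSN_prev'].to_numpy(),
                              pairs['PTSN'].to_numpy(), 1)
print(pairs)
print(slope, intercept)
```

With this data, pairs holds the three seasons that have a previous year on record: Aaron Donald 2018 and Danielle Hunter 2017 and 2018.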

Old answer

Set the index to ['Player', 'Year'], then unstack the inner level into columns:

import pandas as pd

# Simplified version of your example DataFrame
df = pd.DataFrame({'#': [1, 2, 3, 8, 20, 2],
 'AVGN': [17.538, 16.238, 14.85, 12.929, 8.325, 12.788],
 'PTSN': [280.6, 259.8, 237.6, 181.0, 133.2, 204.6],
 'Player': ['Aaron Donald',
            'J.J. Watt',
            'Danielle Hunter',
            'Aaron Donald',
            'Danielle Hunter',
            'Danielle Hunter'],
 'Year': [2018, 2018, 2018, 2017, 2017, 2016]})

res = df.set_index(['Player', 'Year'])[['AVGN', 'PTSN']].unstack()

res
                   AVGN                   PTSN              
Year               2016    2017    2018   2016   2017   2018
Player                                                      
Aaron Donald        NaN  12.929  17.538    NaN  181.0  280.6
Danielle Hunter  12.788   8.325  14.850  204.6  133.2  237.6
J.J. Watt           NaN     NaN  16.238    NaN    NaN  259.8

At this point the columns are a MultiIndex. To flatten the columns:

# Convert integer years to strings
oldcols = res.columns
res.columns = oldcols.set_levels([oldcols.levels[0],
                                  oldcols.levels[1].astype(str)])

# Flatten columns
res.columns = ['_'.join(col) for col in res.columns.values]

res
                 AVGN_2016  AVGN_2017  AVGN_2018  PTSN_2016  PTSN_2017  PTSN_2018
Player                                                                           
Aaron Donald           NaN     12.929     17.538        NaN      181.0      280.6
Danielle Hunter     12.788      8.325     14.850      204.6      133.2      237.6
J.J. Watt              NaN        NaN     16.238        NaN        NaN      259.8
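
To get from this wide table to the exact layout asked for in the question (PTSN, AVGN, PTSNN1, AVGNN1), one possible sketch is shown below. It uses DataFrame.pivot as an alternative to set_index/unstack and picks 2018 and 2017 as the "current" and "previous" years. The variables this_year, last_year, wide and out are hypothetical names introduced here, and dropping players without both seasons is an assumption about what is wanted.

```python
import pandas as pd

df = pd.DataFrame({
    'Player': ['Aaron Donald', 'J.J. Watt', 'Danielle Hunter',
               'Aaron Donald', 'Danielle Hunter', 'Danielle Hunter'],
    'Year': [2018, 2018, 2018, 2017, 2017, 2016],
    'PTSN': [280.6, 259.8, 237.6, 181.0, 133.2, 204.6],
    'AVGN': [17.538, 16.238, 14.85, 12.929, 8.325, 12.788],
})

# One row per player, one column per (stat, year) pair
wide = df.pivot(index='Player', columns='Year', values=['PTSN', 'AVGN'])

# Flatten the column MultiIndex into names such as 'PTSN_2018'
wide.columns = [f'{stat}_{year}' for stat, year in wide.columns]

# Assumed choice of seasons to compare
this_year, last_year = 2018, 2017

# Select current-year and previous-year stats, then rename as in the question
out = wide[[f'PTSN_{this_year}', f'AVGN_{this_year}',
            f'PTSN_{last_year}', f'AVGN_{last_year}']].copy()
out.columns = ['PTSN', 'AVGN', 'PTSNN1', 'AVGNN1']

# Drop players who are missing either season (J.J. Watt here)
out = out.dropna()
print(out)
```

The Aaron Donald row then matches the example in the question: 280.6, 17.538, 181.0, 12.929.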