How to assign column names to a newly created pandas DataFrame

Time: 2018-01-25 23:08:01

Tags: python pandas dataframe

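I'm trying to assign column names to a newly created df so that I can reference its columns by name. First, I create a new df called sums by applying a sum over a range of columns: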

sums = data.iloc[:, 62:75].apply(np.sum)

sums.head(5) gives:

SCH_ENR_HI_F        66134
SCH_ENR_AM_M        3771
SCH_ENR_AM_F        3588
SCH_ENR_AS_M        13388
SCH_ENR_AS_F        12845
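To see what apply(np.sum) actually produces, here is a minimal sketch on a small hypothetical DataFrame standing in for data (the column names are borrowed from the output above; the values and the 0:3 slice are made up for illustration):

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the question's `data`; values are made up.
data = pd.DataFrame({
    "SCH_ENR_HI_F": [100, 250],
    "SCH_ENR_AM_M": [10, 20],
    "SCH_ENR_AM_F": [5, 15],
})

# Each column is reduced to one number, so the result is a Series
# whose index holds the original column names (not a DataFrame).
sums = data.iloc[:, 0:3].apply(np.sum)
print(type(sums).__name__)    # Series
print(sums.index.tolist())    # ['SCH_ENR_HI_F', 'SCH_ENR_AM_M', 'SCH_ENR_AM_F']

# Plain .sum() builds the same values without going through apply.
assert sums.tolist() == data.iloc[:, 0:3].sum().tolist()
```

In other words, the two-column layout printed above is a Series (index on the left, values on the right), not a two-column DataFrame.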

I want to add the column headers 'student_type' and 'enrollment', so I tried:

sums.columns = ['student_type', 'enrollment']

That didn't work. I don't get an error on that line, but later, when I reference the column, I get KeyError: 'enrollment'.
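The silent failure can be reproduced with a hypothetical two-entry Series shaped like sums (a sketch; the two values are taken from the output above). A Series has no columns attribute, so the assignment just attaches an ordinary Python attribute, while label lookups still go through the index:

```python
import pandas as pd

# Hypothetical Series shaped like the question's `sums` (index = old column names).
sums = pd.Series({"SCH_ENR_HI_F": 66134, "SCH_ENR_AM_M": 3771})

# No error here: a Series has no .columns, so pandas just stores a plain
# Python attribute on the object. No column is created or renamed.
sums.columns = ["student_type", "enrollment"]
print(sums.columns)           # ['student_type', 'enrollment'] (just a list)

# Indexing still uses the Series index, which holds the old column names.
try:
    sums["enrollment"]
except KeyError as err:
    print("KeyError:", err)   # KeyError: 'enrollment'
```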

What is the best-practice way to accomplish this?

1 Answer:

Answer 0 (score: 0)

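In the question, sums is a Series, not a DataFrame: summing each column collapses it to a single value, so the result is indexed by the original column names. Because a Series has no columns, sums.columns = [...] only attaches an unused attribute, and sums['enrollment'] still looks up the index, hence the KeyError. Convert the Series to a DataFrame with reset_index() and then rename its two columns.

Demo: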

In [98]: df = pd.DataFrame(np.random.randn(10, 10), columns=list('abcdefghij'))

In [99]: df
Out[99]:
          a         b         c         d         e         f         g         h         i         j
0  0.385203  1.187572 -1.727850  0.623870 -1.042432  0.016608  0.968118  0.551275  0.419904 -1.411984
1 -1.572881  0.187265 -1.578968  0.405994 -0.502633  0.595827 -0.405670  0.491843 -0.145028 -2.097630
2  0.302688 -0.616390 -0.296095  0.702851 -1.269653  1.030805 -1.830220  2.192292 -0.161340  0.750929
3 -0.684007 -1.159139  1.844801 -1.289543  0.469358  0.153529  1.086689  0.246760  2.087439  0.083689
4  0.127821  0.377964  0.633427 -1.003018  0.251742 -0.912455  1.166675  0.327728  1.755409  2.071918
5  0.580320  1.086474  1.251722 -1.456155 -0.458268 -1.155363  1.199957 -2.016104 -0.265787  1.381885
6  0.438060 -1.687241 -1.529382 -0.670691 -1.443586  0.395569 -0.877185  0.227902  0.395737  0.461797
7 -0.566059  0.309534  2.008027  0.397227  0.937474  1.348306  1.403535  1.567550  1.356093  0.231540
8 -2.199514  0.088451  0.628223  0.625264  0.663697 -1.215756 -1.421302  0.729683 -1.241268 -0.367049
9 -1.405923  0.211969 -0.289390  0.946114  1.185240 -0.057775  0.488948  0.774187 -0.030490 -0.649153

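In [100]: sums = (df.iloc[:, 2:7]
                    .sum()            # Series indexed by column label
                    .reset_index()    # DataFrame with columns 'index' and 0
                    # inplace=False was needed in 2018-era pandas but was removed in pandas 2.0
                    .set_axis(['student_type', 'enrollment'], axis=1))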

In [101]: sums
Out[101]:
  student_type  enrollment
0            c    0.944514
1            d   -0.718088
2            e   -1.209061
3            f    0.199295
4            g    1.779544
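An alternative to set_axis, sketched here on the same kind of random demo frame, is to name the index and the values separately with rename_axis and reset_index(name=...), so there are no default 'index' and 0 columns to rename afterwards (random data, so only the labels are shown):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randn(10, 10), columns=list("abcdefghij"))

sums = (
    df.iloc[:, 2:7]
    .sum()                           # Series: index = column labels c..g
    .rename_axis("student_type")     # name the index; it becomes a column below
    .reset_index(name="enrollment")  # name the column holding the sums
)
print(sums.columns.tolist())          # ['student_type', 'enrollment']
print(sums["student_type"].tolist())  # ['c', 'd', 'e', 'f', 'g']
```

Both chains yield the same two columns; this one simply avoids depending on set_axis, whose inplace argument changed across pandas versions.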