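I'm trying to assign column names to a newly created df so that I can refer to its columns by name. First, I create a new df called sums by applying a sum over a set of columns: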
sums = data.iloc[:, 62:75].apply(np.sum)
sums.head(5)
The result is:
SCH_ENR_HI_F 66134
SCH_ENR_AM_M 3771
SCH_ENR_AM_F 3588
SCH_ENR_AS_M 13388
SCH_ENR_AS_F 12845
I want to add the column headers 'student_type' and 'enrollment', so I tried:
sums.columns = ['student_type', 'enrollment']
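That doesn't work. I don't get an error on that line, but later, when I reference the column, I get KeyError: 'enrollment'.
What is the best-practice way to accomplish what I want?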
Answer 0 (score: 0)
Demo:
In [98]: df = pd.DataFrame(np.random.randn(10, 10), columns=list('abcdefghij'))
In [99]: df
Out[99]:
a b c d e f g h i j
0 0.385203 1.187572 -1.727850 0.623870 -1.042432 0.016608 0.968118 0.551275 0.419904 -1.411984
1 -1.572881 0.187265 -1.578968 0.405994 -0.502633 0.595827 -0.405670 0.491843 -0.145028 -2.097630
2 0.302688 -0.616390 -0.296095 0.702851 -1.269653 1.030805 -1.830220 2.192292 -0.161340 0.750929
3 -0.684007 -1.159139 1.844801 -1.289543 0.469358 0.153529 1.086689 0.246760 2.087439 0.083689
4 0.127821 0.377964 0.633427 -1.003018 0.251742 -0.912455 1.166675 0.327728 1.755409 2.071918
5 0.580320 1.086474 1.251722 -1.456155 -0.458268 -1.155363 1.199957 -2.016104 -0.265787 1.381885
6 0.438060 -1.687241 -1.529382 -0.670691 -1.443586 0.395569 -0.877185 0.227902 0.395737 0.461797
7 -0.566059 0.309534 2.008027 0.397227 0.937474 1.348306 1.403535 1.567550 1.356093 0.231540
8 -2.199514 0.088451 0.628223 0.625264 0.663697 -1.215756 -1.421302 0.729683 -1.241268 -0.367049
9 -1.405923 0.211969 -0.289390 0.946114 1.185240 -0.057775 0.488948 0.774187 -0.030490 -0.649153
In [100]: sums = (df.iloc[:, 2:7]
     ...:           .sum()
     ...:           .reset_index()
     ...:           .set_axis(['student_type', 'enrollment'], axis=1))
In [101]: sums
Out[101]:
student_type enrollment
0 c 0.944514
1 d -0.718088
2 e -1.209061
3 f 0.199295
4 g 1.779544
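Applied to the shape of data in the question, the same idea can be sketched as below. Summing column-wise returns a Series whose index holds the old column names, so there are no columns to rename until it is turned back into a DataFrame. This is only a minimal sketch: the `data` frame and its values are made up for illustration (only the column names are borrowed from the question), and `tidy` is a hypothetical name.

```python
import pandas as pd

# Toy stand-in for the question's `data`: column names borrowed from the
# question, values made up for illustration.
data = pd.DataFrame({
    "SCH_ENR_AM_M": [1, 2],
    "SCH_ENR_AM_F": [3, 4],
    "SCH_ENR_AS_M": [5, 6],
})

# A column-wise sum returns a Series; the old column names become its index.
sums = data.sum()
print(isinstance(sums, pd.Series))  # True

# Name the index, then move it into a regular column; `name=` labels the
# column that holds the summed values.
tidy = sums.rename_axis("student_type").reset_index(name="enrollment")

print(list(tidy.columns))           # ['student_type', 'enrollment']
print(tidy["enrollment"].tolist())  # [3, 7, 11]
```

Compared with `reset_index()` followed by `set_axis(...)`, this variant names each column at the step where it is created, which some readers may find easier to follow. Either approach should give a two-column DataFrame that can be indexed with `tidy['enrollment']`.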