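I'm looking for a more programmatic way to create multiple new columns as a function of existing columns in a Pandas DataFrame.
I have 14 columns, Level_2 through Level_15. I want to iteratively create 14 new columns that sum columns 2-15, then 3-15, then 4-15, and so on.
Right now my code looks like this: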
cols['2_sum'] = cols.Level_2 + cols.Level_3 + cols.Level_4 + cols.Level_5 + cols.Level_6 + cols.Level_7 + cols.Level_8 + cols.Level_9 + cols.Level_10 + cols.Level_11 + cols.Level_12 + cols.Level_13 + cols.Level_14 + cols.Level_15
cols['3_sum'] = cols.Level_3 + cols.Level_4 + cols.Level_5 + cols.Level_6 + cols.Level_7 + cols.Level_8 + cols.Level_9 + cols.Level_10 + cols.Level_11 + cols.Level_12 + cols.Level_13 + cols.Level_14 + cols.Level_15
cols['4_sum'] = cols.Level_4 + cols.Level_5 + cols.Level_6 + cols.Level_7 + cols.Level_8 + cols.Level_9 + cols.Level_10 + cols.Level_11 + cols.Level_12 + cols.Level_13 + cols.Level_14 + cols.Level_15
Is there a more pandas-idiomatic or Pythonic way to do this?
Thanks!
Answer 0 (score: 3)
Here is an example.
Sample data:
In [147]: df = pd.DataFrame(np.random.rand(3, 15),
...: columns=['ID'] + ['Level_{}'.format(x) for x in range(2, 16)])
...:
In [148]: df
Out[148]:
ID Level_2 Level_3 Level_4 Level_5 Level_6 Level_7 Level_8 Level_9 Level_10 Level_11 \
0 0.851407 0.957810 0.204217 0.848265 0.168324 0.010265 0.191499 0.787552 0.648678 0.424462 0.038888
1 0.354270 0.442843 0.631624 0.081120 0.357300 0.211621 0.177321 0.316312 0.836935 0.445603 0.267165
2 0.998240 0.341875 0.590768 0.475935 0.071915 0.720590 0.041327 0.926167 0.671880 0.516845 0.450720
Level_12 Level_13 Level_14 Level_15
0 0.465109 0.508491 0.282262 0.848373
1 0.205415 0.399493 0.537186 0.774417
2 0.131734 0.554596 0.253658 0.104193
Solution:
In [149]: for n in range(15, 1, -1):
...: df['{}_sum'.format(15-n+2)] = df.filter(regex=r'Level_\d+').iloc[:, :n].sum(1)
...:
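Result (note: iloc[:, :n] keeps the first n Level columns, so each k_sum below is a running sum that starts at Level_2, e.g. 15_sum = Level_2 + Level_3, and 2_sum and 3_sum both add up all 14 Level columns; these are not the Level_k through Level_15 sums asked for in the question):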
In [150]: df
Out[150]:
ID Level_2 Level_3 Level_4 Level_5 Level_6 Level_7 Level_8 Level_9 Level_10 ... \
0 0.851407 0.957810 0.204217 0.848265 0.168324 0.010265 0.191499 0.787552 0.648678 0.424462 ...
1 0.354270 0.442843 0.631624 0.081120 0.357300 0.211621 0.177321 0.316312 0.836935 0.445603 ...
2 0.998240 0.341875 0.590768 0.475935 0.071915 0.720590 0.041327 0.926167 0.671880 0.516845 ...
6_sum 7_sum 8_sum 9_sum 10_sum 11_sum 12_sum 13_sum 14_sum 15_sum
0 4.745067 4.279958 4.241070 3.816608 3.167931 2.380379 2.188880 2.178615 2.010292 1.162027
1 3.973259 3.767844 3.500679 3.055076 2.218140 1.901828 1.724508 1.512887 1.155587 1.074468
2 4.939755 4.808021 4.357301 3.840456 3.168576 2.242409 2.201082 1.480492 1.408577 0.932643
[3 rows x 29 columns]
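A corrected sketch of the same filter/iloc idea, assuming the Level columns appear in order Level_2 through Level_15 as in the sample data: slicing from position i to the end (iloc[:, i:]) gives the Level_k through Level_15 sum the question asks for. The seeded random data is illustrative only.

```python
import numpy as np
import pandas as pd

# Illustrative data shaped like the example above: an ID column plus Level_2..Level_15
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.random((3, 15)),
                  columns=['ID'] + ['Level_{}'.format(x) for x in range(2, 16)])

# Select only the Level_* columns once, before any *_sum columns are added
levels = df.filter(regex=r'^Level_\d+$')

for i, col in enumerate(levels.columns):
    k = col.split('_')[1]  # 'Level_7' -> '7'
    # Sum from this Level column through the last one (Level_15)
    df['{}_sum'.format(k)] = levels.iloc[:, i:].sum(axis=1)

print(df[['2_sum', '3_sum', '15_sum']])
```

Taking the Level columns once with filter means the newly added *_sum columns can never leak into later slices.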
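Answer 1 (score: 1)
You can build the list of column names once, sum those columns for 2_sum, and then get each following sum by subtracting one Level column from the previous sum: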
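level_cols = list(cols)  # names of Level_2..Level_15 (assumes cols holds only these columns)
cols['2_sum'] = cols[level_cols].sum(axis=1)
# Each next sum drops the leftmost Level column from the previous sum
cols['3_sum'] = cols['2_sum'] - cols['Level_2']
cols['4_sum'] = cols['3_sum'] - cols['Level_3']
# continue the same pattern through 15_sum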
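The running-subtraction idea above can also be done in one vectorized step: reverse the Level columns, take a cumulative sum along each row, and reverse back, so each position holds the sum from that column to the last. A sketch, assuming a DataFrame cols that contains only Level_2 through Level_15 (the random values are illustrative):

```python
import numpy as np
import pandas as pd

# Illustrative frame containing only the Level columns, as assumed above
cols = pd.DataFrame(np.random.default_rng(1).random((2, 14)),
                    columns=['Level_{}'.format(i) for i in range(2, 16)])

level_cols = list(cols)  # column names, captured before adding sums

# Reverse the column order, cumulative-sum along each row, then reverse back:
# the entry under Level_k becomes Level_k + ... + Level_15
suffix = cols[level_cols].iloc[:, ::-1].cumsum(axis=1).iloc[:, ::-1]

# Rename Level_k -> k_sum and attach all 14 new columns at once
suffix.columns = ['{}_sum'.format(c.split('_')[1]) for c in level_cols]
cols = pd.concat([cols, suffix], axis=1)

print(cols.filter(like='_sum'))
```

Unlike repeated subtraction, each sum here is accumulated directly, so rounding error does not build up from one column to the next.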
Answer 2 (score: 1)
Hope this helps:
ColsListLevel = ['Level_{}'.format(i) for i in range(2, 16)]  # Level_2 ... Level_15
ColsListName = ['{}_sum'.format(i) for i in range(3, 16)]     # 3_sum ... 15_sum
sumCols = cols[ColsListLevel].sum(axis=1)
cols['2_sum'] = sumCols
for name, level in zip(ColsListName, ColsListLevel):
    # subtract cumulatively: 3_sum = 2_sum - Level_2, 4_sum = 3_sum - Level_3, ...
    sumCols = sumCols - cols[level]
    cols[name] = sumCols
Answer 3 (score: 0)
import pandas as pd
import numpy as np
np.random.seed(1)
cols = pd.DataFrame(np.random.rand(2, 14),
                    columns=['Level_'+str(i) for i in range(2, 16)])
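Now the DataFrame looks like this (the 2_sum column here, the total of Level_2 through Level_15, was computed as in the question):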
Level_2 Level_3 Level_4 Level_5 Level_6 Level_7 Level_8 Level_9 Level_10 Level_11 Level_12 Level_13 Level_14 Level_15 2_sum
0 0.199666 0.285152 0.598139 0.602477 0.004284 0.874587 0.263949 0.527301 0.306443 0.282778 0.181330 0.280506 0.456637 0.998124 5.861371
1 0.279320 0.508074 0.435350 0.816866 0.691988 0.179261 0.134478 0.949185 0.867022 0.410112 0.139481 0.537539 0.042163 0.366138 6.356977
Then:
for i in range(2, 16):  # 2_sum through 15_sum
    # .loc label slices include both endpoints, so this sums Level_i..Level_15
    cols[str(i)+'_sum'] = cols.loc[:, 'Level_'+str(i):'Level_15'].sum(axis=1)
cols
I think this is what you're looking for.
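The same .loc label slices can also be collected in a dict comprehension and attached with a single assign call instead of inserting columns one at a time. A sketch, assuming cols holds Level_2 through Level_15 in order (label slicing relies on that order; the random values are illustrative):

```python
import numpy as np
import pandas as pd

# Illustrative data: only Level_2..Level_15, in order
cols = pd.DataFrame(np.random.default_rng(2).random((2, 14)),
                    columns=['Level_' + str(i) for i in range(2, 16)])

# One entry per i = 2..15: row-wise sum of Level_i through Level_15 (inclusive label slice)
new_cols = {
    str(i) + '_sum': cols.loc[:, 'Level_' + str(i):'Level_15'].sum(axis=1)
    for i in range(2, 16)
}

# assign returns a new DataFrame with all 14 sum columns appended in dict order
cols = cols.assign(**new_cols)
print(cols.filter(like='_sum'))
```

Because every slice is computed from the original frame before assign runs, the order in which the new columns are added does not matter.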