I have the following data for each ID:
id ---- Base AE Val LT RO+ Prem AM TN T3 AR
05 0 34.34 9.42 70.68 0 0 0 0 0 0 0
108 0 43.77 0 28 0 0 0 0 0 0 0
205 0 77.64 0 32.2 0 0 0 0 0 0 0
320 0 66.24 0 59.628 0 0 0 0 0 0 0
313 0 21.66 0 21.442 0 0 0 0 0 0 0
324 0 72.37 0 701.12 0 0 0 0 0 0 0
505 0 76.057 0 43.87 0 0 0 0 0 0 0
Now I want to sum every column except a few that I specify into an Others column, and add a row-wise Total, as shown below:
id Base Val Others Total
05 34.34 70.68 9.42 114.44
108 43.77 28 0 71.77
205 77.64 32.2 0 109.84
320 66.24 59.628 0 125.868
313 21.66 21.442 0 43.102
324 72.37 701.12 0 773.49
505 76.057 43.87 0 119.927
So, if my list of columns to keep is:
cols_to_keep = ['Base','Val']
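then the columns that are not in this list should be summed into the Others column, and all values in each row should be summed into Total. id is the index of the records.
I can keep the columns declared in the list, but how do I sum the remaining columns (those not in the list) into the Others column? Can someone help me with this? The data is in a pandas DataFrame.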
Answer 0 (score: 2)
Drop the columns you do not want to sum, then sum what is left:
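df['Others'] = df.drop(cols_to_keep, axis=1).sum(axis=1)
# Sum only the kept columns plus Others; df.sum(axis=1) at this point would count the other columns twice.
df['Total'] = df[cols_to_keep + ['Others']].sum(axis=1)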
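A minimal, self-contained sketch of this drop-based approach on the first three rows of the question's data. Building the frame by hand and including only a subset of the zero-valued columns are assumptions made for brevity. Total is computed from the kept columns plus Others, because summing the whole frame after Others has been added would count the dropped columns twice.

```python
import pandas as pd

# Sample rows from the question; constructing the frame by hand is an assumption.
df = pd.DataFrame(
    {
        "Base": [34.34, 43.77, 77.64],
        "AE": [9.42, 0.0, 0.0],
        "Val": [70.68, 28.0, 32.2],
        "LT": [0.0, 0.0, 0.0],
    },
    index=pd.Index(["05", "108", "205"], name="id"),
)
cols_to_keep = ["Base", "Val"]

# Everything outside cols_to_keep is collapsed into Others.
df["Others"] = df.drop(cols_to_keep, axis=1).sum(axis=1)
# Total is built from the kept columns plus Others only.
df["Total"] = df[cols_to_keep + ["Others"]].sum(axis=1)

# Keep just the columns the question asks for.
result = df[cols_to_keep + ["Others", "Total"]]
print(result)
```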
Answer 1 (score: 2)
Use assign, and select the columns to sum with Index.difference:
cols_to_keep = ['Base','Val']
c = df.columns.difference(cols_to_keep)
df = df[cols_to_keep].assign(Others=df[c].sum(axis=1), Total=df.sum(axis=1))
print(df)
Base Val Others Total
id
5 34.340 70.680 9.42 114.440
108 43.770 28.000 0.00 71.770
205 77.640 32.200 0.00 109.840
320 66.240 59.628 0.00 125.868
313 21.660 21.442 0.00 43.102
324 72.370 701.120 0.00 773.490
505 76.057 43.870 0.00 119.927
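If this is needed in more than one place, the same idea can be wrapped in a small function. The sketch below is only an illustration: the name keep_and_summarize and the missing-column check are hypothetical additions, not part of the answer. Both arguments to assign are evaluated on the original frame before the new columns exist, so Total does not double count Others.

```python
import pandas as pd


def keep_and_summarize(df, cols_to_keep):
    """Hypothetical helper: keep some columns, sum the rest into Others, add Total."""
    missing = [c for c in cols_to_keep if c not in df.columns]
    if missing:
        raise KeyError(f"columns not found: {missing}")
    # Labels present in df.columns but not in cols_to_keep.
    other_cols = df.columns.difference(cols_to_keep)
    return df[cols_to_keep].assign(
        Others=df[other_cols].sum(axis=1),
        Total=df.sum(axis=1),  # computed on the original frame
    )


# Two rows from the question; only some of the zero-valued columns are included.
df = pd.DataFrame(
    {"Base": [34.34, 66.24], "AE": [9.42, 0.0], "Val": [70.68, 59.628], "LT": [0.0, 0.0]},
    index=pd.Index([5, 320], name="id"),
)
out = keep_and_summarize(df, ["Base", "Val"])
print(out)
```

Note that the function assumes every column other than the index is numeric; a text column would need to be excluded before summing.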
Answer 2 (score: 1)
In [47]: !cat b.txt | tr -s ' ' > data.txt
    ...: df = pd.read_csv("data.txt", sep=" ", dtype={'id': str})
    ...: # Sum every column except id, Base and Val instead of hard-coding AE
    ...: df['Others'] = df.drop(['id', 'Base', 'Val'], axis=1).sum(axis=1)
    ...: df['Total'] = df['Base'] + df['Others'] + df['Val']
    ...:
    ...: cols_to_keep = ['id', 'Base', 'Val', 'Others', 'Total']
    ...: c = df.columns.difference(cols_to_keep)
    ...: newDf = df.drop(c, axis=1)
    ...:
In [48]: newDf
Out[48]:
id Base Val Others Total
0 05 34.340 70.680 9.42 114.440
1 108 43.770 28.000 0.00 71.770
2 205 77.640 32.200 0.00 109.840
3 320 66.240 59.628 0.00 125.868
4 313 21.660 21.442 0.00 43.102
5 324 72.370 701.120 0.00 773.490
6 505 76.057 43.870 0.00 119.927
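The shell step that squeezes repeated spaces may not be needed: read_csv accepts a regular expression as the separator, so sep=r"\s+" treats any run of whitespace as one delimiter. The sketch below reads two of the question's rows from an in-memory string, which is an assumption standing in for the real file, and uses dtype to keep the leading zero in the id 05.

```python
import io

import pandas as pd

# Header and first two rows from the question, with uneven spacing on purpose.
# The in-memory string stands in for the real input file (an assumption).
raw = """id ---- Base AE Val LT RO+ Prem AM TN T3 AR
05  0 34.34 9.42  70.68 0 0 0 0 0 0 0
108 0 43.77 0     28    0 0 0 0 0 0 0
"""

# sep=r"\s+" splits on any run of whitespace;
# dtype={"id": str} keeps "05" from being parsed as the integer 5.
df = pd.read_csv(io.StringIO(raw), sep=r"\s+", dtype={"id": str}).set_index("id")
print(df[["Base", "AE", "Val"]])
```

With id set as the index, either of the summing approaches in the other answers can be applied to df directly.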