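I have a DataFrame that was created from a pivot table. The code that reproduces it, and its printed form, are given further down.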
I am iterating over the top level of the MultiIndex columns to create sum columns for each company:
FSUM = FN + FP
SUM = FN + FP + TP
import pandas as pd

# Keys are (company, metric) tuples, so the columns become a two-level MultiIndex.
d = {
    ('company1', 'False Negative'): {'April- 2012': 112.0, 'April- 2013': 370.0, 'April- 2014': 499.0,
                                     'August- 2012': 431.0, 'August- 2013': 496.0, 'August- 2014': 221.0},
    ('company1', 'False Positive'): {'April- 2012': 0.0, 'April- 2013': 544.0, 'April- 2014': 50.0,
                                     'August- 2012': 0.0, 'August- 2013': 0.0, 'August- 2014': 426.0},
    ('company1', 'True Positive'): {'April- 2012': 0.0, 'April- 2013': 140.0, 'April- 2014': 24.0,
                                    'August- 2012': 0.0, 'August- 2013': 0.0, 'August- 2014': 77.0},
    ('company2', 'False Negative'): {'April- 2012': 112.0, 'April- 2013': 370.0, 'April- 2014': 499.0,
                                     'August- 2012': 431.0, 'August- 2013': 496.0, 'August- 2014': 221.0},
    ('company2', 'False Positive'): {'April- 2012': 0.0, 'April- 2013': 544.0, 'April- 2014': 50.0,
                                     'August- 2012': 0.0, 'August- 2013': 0.0, 'August- 2014': 426.0},
    ('company2', 'True Positive'): {'April- 2012': 0.0, 'April- 2013': 140.0, 'April- 2014': 24.0,
                                    'August- 2012': 0.0, 'August- 2013': 0.0, 'August- 2014': 77.0},
}
df = pd.DataFrame(d)
              company1          company2
                FN    FP    TP    FN    FP    TP
April- 2012    112     0     0   112     0     0
April- 2013    370   544   140   370   544   140
April- 2014    499    50    24   499    50    24
August- 2012   431     0     0   431     0     0
August- 2013   496     0     0   496     0     0
August- 2014   221   426    77   221   426    77
I do not know the company names in advance, so I need a loop.
Answer 0 (score: 2)
Regrouping things with a few .stack and .unstack calls makes this somewhat easier:
In [96]: df = df.unstack().unstack(1)

In [97]: df
Out[97]:
                       False Negative  False Positive  True Positive
company1 April- 2012            112.0             0.0            0.0
         April- 2013            370.0           544.0          140.0
         April- 2014            499.0            50.0           24.0
         August- 2012           431.0             0.0            0.0
         August- 2013           496.0             0.0            0.0
         August- 2014           221.0           426.0           77.0
company2 April- 2012            112.0             0.0            0.0
         April- 2013            370.0           544.0          140.0
         April- 2014            499.0            50.0           24.0
         August- 2012           431.0             0.0            0.0
         August- 2013           496.0             0.0            0.0
         August- 2014           221.0           426.0           77.0
In [98]: df['SUM'] = df.sum(axis=1)
In [99]: df['FSUM'] = df['False Negative'] + df['False Positive']
In [100]: df = df.stack().unstack([0,2])
In [101]: df
Out[101]:
                   company1                                            \
             False Negative False Positive True Positive     SUM   FSUM
April- 2012           112.0            0.0           0.0   112.0  112.0
April- 2013           370.0          544.0         140.0  1054.0  914.0
April- 2014           499.0           50.0          24.0   573.0  549.0
August- 2012          431.0            0.0           0.0   431.0  431.0
August- 2013          496.0            0.0           0.0   496.0  496.0
August- 2014          221.0          426.0          77.0   724.0  647.0

                   company2
             False Negative False Positive True Positive     SUM   FSUM
April- 2012           112.0            0.0           0.0   112.0  112.0
April- 2013           370.0          544.0         140.0  1054.0  914.0
April- 2014           499.0           50.0          24.0   573.0  549.0
August- 2012          431.0            0.0           0.0   431.0  431.0
August- 2013          496.0            0.0           0.0   496.0  496.0
August- 2014          221.0          426.0          77.0   724.0  647.0
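
For anyone who wants to run the reshape round trip outside an IPython session, here is a minimal self-contained sketch. It uses only the first two months of the question's data, and the names long_df, wide_df and metrics are my own, not part of the original answer. It also sums an explicit list of metric columns for SUM, which is an assumption on my part so that the result does not depend on the order in which the helper columns are added.

```python
import pandas as pd

# First two months of the question's data, both companies.
d = {
    ('company1', 'False Negative'): {'April- 2012': 112.0, 'April- 2013': 370.0},
    ('company1', 'False Positive'): {'April- 2012': 0.0, 'April- 2013': 544.0},
    ('company1', 'True Positive'): {'April- 2012': 0.0, 'April- 2013': 140.0},
    ('company2', 'False Negative'): {'April- 2012': 112.0, 'April- 2013': 370.0},
    ('company2', 'False Positive'): {'April- 2012': 0.0, 'April- 2013': 544.0},
    ('company2', 'True Positive'): {'April- 2012': 0.0, 'April- 2013': 140.0},
}
df = pd.DataFrame(d)

# Push both column levels into the index, then bring the metric level back
# out as columns: one row per (company, month), one column per metric.
long_df = df.unstack().unstack(1)

# Hypothetical helper list: the three original metric columns.
metrics = ['False Negative', 'False Positive', 'True Positive']
long_df['FSUM'] = long_df['False Negative'] + long_df['False Positive']
long_df['SUM'] = long_df[metrics].sum(axis=1)

# Reverse the reshape so the metric names sit under each company again.
wide_df = long_df.stack().unstack([0, 2])
print(wide_df)
```

No company name appears in the reshaping code, so this should work for any number of companies as long as every company has the same three metric columns.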
Answer 1 (score: 1)
One approach is to use sum with the level argument, then pd.concat, and finally sort_index:
pd.concat([df,
           df.loc(axis=1)[:, ['False Negative', 'False Positive']].sum(level=0, axis=1)
             .assign(indx2='FSUM').set_index('indx2', append=True).unstack(),
           df.sum(level=0, axis=1)
             .assign(indx2='SUM').set_index('indx2', append=True).unstack()],
          axis=1).sort_index(axis=1)
Output:
             company1                                                  \
                FSUM False Negative False Positive     SUM True Positive
April- 2012    112.0          112.0            0.0   112.0           0.0
April- 2013    914.0          370.0          544.0  1054.0         140.0
April- 2014    549.0          499.0           50.0   573.0          24.0
August- 2012   431.0          431.0            0.0   431.0           0.0
August- 2013   496.0          496.0            0.0   496.0           0.0
August- 2014   647.0          221.0          426.0   724.0          77.0

             company2
                FSUM False Negative False Positive     SUM True Positive
April- 2012    112.0          112.0            0.0   112.0           0.0
April- 2013    914.0          370.0          544.0  1054.0         140.0
April- 2014    549.0          499.0           50.0   573.0          24.0
August- 2012   431.0          431.0            0.0   431.0           0.0
August- 2013   496.0          496.0            0.0   496.0           0.0
August- 2014   647.0          221.0          426.0   724.0          77.0
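
A note for newer pandas: as far as I am aware, the level argument of DataFrame.sum was deprecated in pandas 1.3 and removed in 2.0, so the snippet above may raise a TypeError on current releases. A version-independent alternative is the explicit loop the question originally asked for. The sketch below is only one way to write it: the function name add_sums is hypothetical, and the sample uses the first two months of the question's data.

```python
import pandas as pd


def add_sums(df):
    """Return a copy of df with FSUM and SUM columns added under each company.

    Assumes two-level columns (company, metric) and that every company has
    'False Negative', 'False Positive' and 'True Positive' columns.
    """
    out = df.copy()
    # The company names are read from the top column level, not hard-coded.
    for company in df.columns.get_level_values(0).unique():
        block = df[company]  # the original metric columns of this company
        out[(company, 'FSUM')] = block['False Negative'] + block['False Positive']
        out[(company, 'SUM')] = block.sum(axis=1)
    # Sort so the new columns are grouped under their company.
    return out.sort_index(axis=1)


# First two months of the question's data, both companies.
sample = pd.DataFrame({
    ('company1', 'False Negative'): {'April- 2012': 112.0, 'April- 2013': 370.0},
    ('company1', 'False Positive'): {'April- 2012': 0.0, 'April- 2013': 544.0},
    ('company1', 'True Positive'): {'April- 2012': 0.0, 'April- 2013': 140.0},
    ('company2', 'False Negative'): {'April- 2012': 112.0, 'April- 2013': 370.0},
    ('company2', 'False Positive'): {'April- 2012': 0.0, 'April- 2013': 544.0},
    ('company2', 'True Positive'): {'April- 2012': 0.0, 'April- 2013': 140.0},
})
result = add_sums(sample)
print(result)
```

Because the sums are read from the untouched input frame and written to a copy, the SUM column cannot pick up FSUM by accident, and the column order after sort_index matches the output shown above (FSUM, False Negative, False Positive, SUM, True Positive).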