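I'm trying to sum data across several columns of a DataFrame by pivoting it with pivot_table and an aggfunc. My DataFrame holds emissions data for various regions. I don't want to sum every row, so I selected only the rows I want to add up. But the output has two columns for every year:
The data is numeric regional data covering several years, and what I want to do is add up the data for some regions to get the data for a larger region. The years are laid out as columns.
The data looks like this: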
import pandas as pd
inp = [{'Scenario':'Baseline', 'Region':'CHINA', 'Variable':'Methane', 'Unit':'MtCO2eq', '1990':5,'1995':10,'2000':15},
{'Scenario':'Baseline', 'Region':'INDIA', 'Variable':'Methane', 'Unit':'MtCO2eq', '1990':6,'1995':11,'2000':16},
{'Scenario':'Baseline', 'Region':'INDONESIA', 'Variable':'Methane', 'Unit':'MtCO2eq', '1990':7,'1995':12,'2000':17},
{'Scenario':'Baseline', 'Region':'KOREA', 'Variable':'Methane', 'Unit':'MtCO2eq', '1990':8,'1995':13,'2000':18},
{'Scenario':'Baseline', 'Region':'JAPAN', 'Variable':'Methane', 'Unit':'MtCO2eq', '1990':9,'1995':14,'2000':19},
{'Scenario':'Baseline', 'Region':'THAILAND', 'Variable':'Methane', 'Unit':'MtCO2eq', '1990':10,'1995':15,'2000':20},
{'Scenario':'Baseline', 'Region':'RUSSIA', 'Variable':'Methane', 'Unit':'MtCO2eq', '1990':11,'1995':16,'2000':21}]
dt = pd.DataFrame(inp)
dt
1990 1995 2000 Region Scenario Unit Variable
0 5 10 15 CHINA Baseline MtCO2eq Methane
1 6 11 16 INDIA Baseline MtCO2eq Methane
2 7 12 17 INDONESIA Baseline MtCO2eq Methane
3 8 13 18 KOREA Baseline MtCO2eq Methane
4 9 14 19 JAPAN Baseline MtCO2eq Methane
5 10 15 20 THAILAND Baseline MtCO2eq Methane
6 11 16 21 RUSSIA Baseline MtCO2eq Methane
I run this code:
dt_test = dt.pivot_table(values=['1990', '1995', '2000'],
                         index=['Scenario', 'Variable', 'Unit'],
                         columns=[(dt['Region'] == 'CHINA') |
                                  (dt['Region'] == 'INDIA') |
                                  (dt['Region'] == 'INDONESIA') |
                                  (dt['Region'] == 'KOREA')],
                         aggfunc='sum')
and get this output:
1990 1995 2000
Region False True False True False True
Scenario Variable Unit
Baseline Methane MtCO2eq 46 10 76 15 106 20
If anyone could help me fix this, or suggest another way to get the totals I want, that would be amazing.
Answer (score: 0)
Use xs:
print(dt_test.xs(True, axis=1, level=1))
1990 1995 2000
Scenario Variable Unit
Baseline Methane MtCO2eq 26 46 66
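For reference, here is a self-contained sketch of the same idea, assuming a recent pandas version. The boolean mask passed as the column key splits the rows into a False group and a True group, which is why each year appears twice; xs then keeps only the True group. Building the mask with isin and passing an explicit values list (so the text columns are not aggregated along with the years) are choices made for this sketch, not part of the original code:

```python
import pandas as pd

# Same sample data as in the question.
dt = pd.DataFrame({
    'Scenario': ['Baseline'] * 7,
    'Region': ['CHINA', 'INDIA', 'INDONESIA', 'KOREA', 'JAPAN', 'THAILAND', 'RUSSIA'],
    'Variable': ['Methane'] * 7,
    'Unit': ['MtCO2eq'] * 7,
    '1990': [5, 6, 7, 8, 9, 10, 11],
    '1995': [10, 11, 12, 13, 14, 15, 16],
    '2000': [15, 16, 17, 18, 19, 20, 21],
})

years = ['1990', '1995', '2000']
# Boolean Series (named 'Region'): True for the regions to be summed.
selected = dt['Region'].isin(['CHINA', 'INDIA', 'INDONESIA', 'KOREA'])

# Using the mask as a column key creates one False and one True column per year.
pivoted = dt.pivot_table(values=years,
                         index=['Scenario', 'Variable', 'Unit'],
                         columns=[selected],
                         aggfunc='sum')

# Level 1 of the column MultiIndex is the mask level; keep only its True part.
totals = pivoted.xs(True, axis=1, level=1)
print(totals)
```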
But it is better to filter first, using isin and boolean indexing:
df = dt[dt['Region'].isin(['CHINA', 'INDIA', 'INDONESIA', 'KOREA'])]
print(df)
1990 1995 2000 Region Scenario Unit Variable
0 5 10 15 CHINA Baseline MtCO2eq Methane
1 6 11 16 INDIA Baseline MtCO2eq Methane
2 7 12 17 INDONESIA Baseline MtCO2eq Methane
3 8 13 18 KOREA Baseline MtCO2eq Methane
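A quick check that a single isin call builds the same boolean mask as the chain of == comparisons joined with | in the question. The standalone Series below is a hypothetical stand-in for the Region column:

```python
import pandas as pd

# Stand-in for the 'Region' column of the question's DataFrame.
regions = pd.Series(['CHINA', 'INDIA', 'INDONESIA', 'KOREA', 'JAPAN', 'THAILAND', 'RUSSIA'])
wanted = ['CHINA', 'INDIA', 'INDONESIA', 'KOREA']

# The chained comparison from the question...
chained = ((regions == 'CHINA') | (regions == 'INDIA') |
           (regions == 'INDONESIA') | (regions == 'KOREA'))
# ...and a single isin call produce the same boolean mask.
via_isin = regions.isin(wanted)

print(via_isin.equals(chained))
print(regions[via_isin].tolist())
```

The isin form is also easier to maintain: adding a region means editing one list instead of appending another comparison.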
Then aggregate the year columns with groupby and sum:
dt_test = df.groupby(['Scenario', 'Variable', 'Unit'])[['1990', '1995', '2000']].sum()
print(dt_test)
1990 1995 2000
Scenario Variable Unit
Baseline Methane MtCO2eq 26 46 66
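If several larger regions are needed at once, one possible generalization (not part of the original answer) is to map each country to a group label and then group by that label. The mapping REGION_GROUPS and the labels GROUP_A and GROUP_B below are hypothetical placeholders:

```python
import pandas as pd

dt = pd.DataFrame({
    'Scenario': ['Baseline'] * 7,
    'Region': ['CHINA', 'INDIA', 'INDONESIA', 'KOREA', 'JAPAN', 'THAILAND', 'RUSSIA'],
    'Variable': ['Methane'] * 7,
    'Unit': ['MtCO2eq'] * 7,
    '1990': [5, 6, 7, 8, 9, 10, 11],
    '1995': [10, 11, 12, 13, 14, 15, 16],
    '2000': [15, 16, 17, 18, 19, 20, 21],
})
years = ['1990', '1995', '2000']

# Hypothetical aggregation scheme: the larger region each country rolls up into.
REGION_GROUPS = {
    'CHINA': 'GROUP_A', 'INDIA': 'GROUP_A', 'INDONESIA': 'GROUP_A', 'KOREA': 'GROUP_A',
    'JAPAN': 'GROUP_B', 'THAILAND': 'GROUP_B', 'RUSSIA': 'GROUP_B',
}

# Label every row with its larger region, then sum the year columns per label.
labelled = dt.assign(Group=dt['Region'].map(REGION_GROUPS))
totals = labelled.groupby(['Scenario', 'Variable', 'Unit', 'Group'])[years].sum()
print(totals)
```

Countries missing from the mapping get NaN in Group, and groupby drops NaN keys by default, so such rows are left out of every total.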