我有一个带有多列标题的数据框(df):
enter image description here
yearQ YearC YearS Type1 Type2
index City State Year1 Year2 Year3 Year4 Year5 Year6
0 New York NY 355 189 115 234 178 422
1 Los Angeles CA 100 207 298 230 214 166
2 Chicago IL 1360 300 211 121 355 435
3 Philadelphia PA 270 156 455 232 532 355
4 Phoenix AZ 270 234 112 432 344 116
我想计算每种类型的平均数。最终格式应如下所示:
City State Type1 Type2
New York NY avg of(355+189+115) avg of (234+178+422)
.......
任何人都可以给我一个提示吗? 非常感谢。 凯丝
答案 0 :(得分:2)
我认为您可以在汇总groupby
的列中使用Multindex
的第一级sum
:
print (df.index)
MultiIndex(levels=[[0, 1, 2, 3, 4],
['Chicago', 'Los Angeles', 'New York', 'Philadelphia', 'Phoenix'],
['AZ', 'CA', 'IL', 'NY', 'PA']],
labels=[[0, 1, 2, 3, 4], [2, 1, 0, 3, 4], [3, 1, 2, 4, 0]])
print (df.columns)
MultiIndex(levels=[['Type1', 'Type2'],
['Year1', 'Year2', 'Year3', 'Year4', 'Year5', 'Year6']],
labels=[[0, 0, 0, 1, 1, 1], [0, 1, 2, 3, 4, 5]],
names=['YearQ', 'index'])
df = df.groupby(axis=1, level=0).sum()
print (df)
YearQ Type1 Type2
0 New York NY 659 834
1 Los Angeles CA 605 610
2 Chicago IL 1871 911
3 Philadelphia PA 881 1119
4 Phoenix AZ 616 892
但也许以前需要set_index
:
print (df.index)
Int64Index([0, 1, 2, 3, 4], dtype='int64')
print (df.columns)
MultiIndex(levels=[['Type1', 'Type2', 'YearC', 'YearS'],
['City', 'State', 'Year1', 'Year2', 'Year3', 'Year4', 'Year5', 'Year6']],
labels=[[2, 3, 0, 0, 0, 1, 1, 1], [0, 1, 2, 3, 4, 5, 6, 7]],
names=['YearQ', 'index'])
df = df.set_index([('YearC','City'), ('YearS','State')])
df = df.groupby(axis=1, level=0).sum()
print (df)
YearQ Type1 Type2
(YearC, City) (YearS, State)
New York NY 659 834
Los Angeles CA 605 610
Chicago IL 1871 911
Philadelphia PA 881 1119
Phoenix AZ 616 892