I have a pandas dataframe that looks like this.
set language group version metric_1 metric_2 metric_3
X English 1 A 100 20 5
X French 2 A 90 10 10
X English 1 B 80 30 15
X French 2 B 70 20 20
Y English 1 A 200 20 30
Y French 2 A 180 30 20
Y English 1 B 160 10 10
Y French 2 B 140 20 5
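I want to summarize the metrics over every combination of the experiment attributes: set, language, group, and version. The summary dataframe would then look like this.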
set  language  group  version  metric_1  metric_2  metric_3
X                              800       140       80
Y                              1000      140       80
     English                   1200      200       80
     French                    600       80        80
               1               1050      120       60
               2               750       160       100
                      A        850       140       80
                      B        950       140       80
X    English                   500       100       40
X    French                    300       40        40
Y    English                   700       100       40
Y    French                    300       40        40
X              1               350       60        30
X              2               450       80        50
Y              1               700       60        30
Y              2               300       80        50
X                     A        350       70        40
X                     B        450       70        40
Y                     A        500       70        40
Y                     B        500       70        40
     English   1               ...
     English   2               ...
     French    1               ...
     French    2               ...
     English          A        ...
     English          B        ...
     French           A        ...
     French           B        ...
               1      A        ...
               1      B        ...
               2      A        ...
               2      B        ...
X    English   1               ...
X    English   2               ...
X    French    1               ...
X    French    2               ...
Y    English   1               ...
Y    English   2               ...
Y    French    1               ...
Y    French    2               ...
X    English          A        ...
X    English          B        ...
X    French           A        ...
X    French           B        ...
Y    English          A        ...
Y    English          B        ...
Y    French           A        ...
Y    French           B        ...
X              1      A        ...
X              1      B        ...
X              2      A        ...
X              2      B        ...
Y              1      A        ...
Y              1      B        ...
Y              2      A        ...
Y              2      B        ...
     English   1      A        ...
     English   1      B        ...
     English   2      A        ...
     English   2      B        ...
     French    1      A        ...
     French    1      B        ...
     French    2      A        ...
     French    2      B        ...
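I know I can brute-force this by running groupby on each combination and concatenating all the results into a single dataframe. The number of attributes may grow, though, so I am looking for a more scalable solution. I have been reading about the functions that itertools provides, but I am not sure how they would apply here.
Any ideas/pointers would be appreciated. Thanks!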
Answer 0 (score: 0)
Indeed, the combinations function from itertools can help you create all the combinations. Let's assume your data is in a dataframe named df.
import pandas as pd
from itertools import combinations

# create two lists: one with the columns you want to sum, one with the others
list_metric = [col for col in df.columns if 'metric' in col]
list_non_metric = [col for col in df.columns if 'metric' not in col]
# pre-aggregate once on all the non-metric columns
df_grouped = df.groupby(list_non_metric, as_index=False)[list_metric].sum()
# use concat and a list comprehension to build every combination
df_output = (pd.concat([df_grouped.groupby(list(combi), as_index=False)[list_metric].sum()
                        for j in range(1, len(list_non_metric)+1)
                        for combi in combinations(list_non_metric, j)])
               .fillna(''))
# reorder the columns to match the input data (if necessary)
df_output = df_output[df.columns]
If you want to see how combinations works, try printing this line:
[combi for combi in combinations(list_non_metric,2)]
Then the second loop, for j in range(1, len(list_non_metric)+1), creates the combinations of every length from list_non_metric.
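To make the two nested loops concrete, here is a minimal sketch that uses only the standard library. It assumes the four attribute names from the question; the name all_combis is purely illustrative and does not appear in the answer.

```python
from itertools import combinations

# Attribute (non-metric) column names, as in the question's dataframe.
list_non_metric = ["set", "language", "group", "version"]

# Outer loop: the combination size j (1 up to the number of attributes).
# Inner loop: every combination of exactly that size.
all_combis = [
    combi
    for j in range(1, len(list_non_metric) + 1)
    for combi in combinations(list_non_metric, j)
]

for combi in all_combis:
    print(combi)

# 4 singles + 6 pairs + 4 triples + 1 quadruple = 15 groupings in total.
print(len(all_combis))
```

Each tuple is one set of columns to pass to groupby, so with n attributes the concat ends up stacking 2**n - 1 grouped frames.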
Answer 1 (score: 0)
Here is one way to do it. I assume you only provided part of your data, since the totals don't add up:
In []:
import itertools as it
import pandas as pd

cols = df.columns.tolist()
index = ['set', 'language', 'group', 'version']
df = df.set_index(index)
pd.concat([df.groupby(level=list(x)).sum().reset_index()
           for n in range(1, len(index)+1)
           for x in it.combinations(range(len(index)), n)],
          sort=True)[cols].fillna('')
Out[]:
   set  language  group  version  metric_1  metric_2  metric_3
0  X                              340       80        50
1  Y                              680       80        65
0       English                   540       80        60
1       French                    480       80        55
0                 1               540       80        60
1                 2               480       80        55
0                        A        570       80        65
1                        B        450       80        50
0  X    English                   180       50        20
1  X    French                    160       30        30
2  Y    English                   360       30        40
3  Y    French                    320       50        25
...
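For anyone who wants to reproduce this end to end, the following is a minimal, self-contained sketch built on the eight sample rows from the question. It assumes pandas is installed. The function name summarize_all_combinations and the variables keys, metrics and summary are illustrative and come from neither answer; the logic is the same groupby-per-combination idea used above.

```python
from itertools import combinations

import pandas as pd

# Sample rows copied from the question.
df = pd.DataFrame({
    "set":      ["X", "X", "X", "X", "Y", "Y", "Y", "Y"],
    "language": ["English", "French"] * 4,
    "group":    [1, 2] * 4,
    "version":  ["A", "A", "B", "B"] * 2,
    "metric_1": [100, 90, 80, 70, 200, 180, 160, 140],
    "metric_2": [20, 10, 30, 20, 20, 30, 10, 20],
    "metric_3": [5, 10, 15, 20, 30, 20, 10, 5],
})


def summarize_all_combinations(frame, keys, metrics):
    """Sum `metrics` for every non-empty combination of the `keys` columns."""
    pieces = [
        frame.groupby(list(combo), as_index=False)[metrics].sum()
        for size in range(1, len(keys) + 1)
        for combo in combinations(keys, size)
    ]
    # Keys that are not part of a given grouping are missing from that piece,
    # so concat fills them with NaN; blank them out as both answers do.
    stacked = pd.concat(pieces, ignore_index=True, sort=False).fillna("")
    # Restore the original column order.
    return stacked[keys + metrics]


keys = ["set", "language", "group", "version"]
metrics = ["metric_1", "metric_2", "metric_3"]
summary = summarize_all_combinations(df, keys, metrics)
print(summary.head(8))
```

On these rows the set-only line for X sums to 340, 80, 50, which agrees with the output shown in this answer rather than with the totals in the question.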