I'm trying to recursively group multiple items from multiple columns. I was wondering if anyone could help.
Below is an example.
import pandas as pd
import itertools
# I would have more than two groups
data = {'group1': ['a'] * 3 + ['b'] * 3,
'group2': list(range(1,4)) + list(range(1,4)),
'num': [1, 2, 3, 10, 15, 20]}
df = pd.DataFrame(data)
print(df)
desired_df = {'group1': ['a'] * 9,
'group2': ['b'] * 9,
'num_group1': list(range(1,4))*3,
'num_group2': list(itertools.chain.from_iterable(itertools.repeat(x, 3) for x in list(range(1,4)))),
'desired_column': [11, 12, 13, 16, 17, 18, 21, 22, 23]
}
# desired_column is the sum of 'num' for each pairing of a group 'a' row with a group 'b' row in df
desired = pd.DataFrame(desired_df)
print(desired)
# I have tried this... which obviously doesn't work
data1 = df.merge(df.drop(columns=['num']), left_on=['group1'], right_on=['group1'])
data1.groupby(['group2_x', 'group2_y'])['num'].sum()
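The attempt above merges on 'group1', so each row is paired only with rows from its own group, and the groupby then sums within a group. The desired frame instead pairs every row of group 'a' with every row of group 'b', which is a cross join. A minimal sketch for exactly the two groups in the example, assuming pandas 1.2 or later (for how='cross'); the column names simply follow desired_df above:

```python
import pandas as pd

df = pd.DataFrame({'group1': ['a'] * 3 + ['b'] * 3,
                   'group2': [1, 2, 3] * 2,
                   'num': [1, 2, 3, 10, 15, 20]})

# Split the frame into the 'a' rows and the 'b' rows (assumes these two labels).
left = df[df['group1'] == 'a']
right = df[df['group1'] == 'b']

# Cross join: every 'a' row paired with every 'b' row (3 x 3 = 9 rows).
pairs = left.merge(right, how='cross', suffixes=('_1', '_2'))

out = pd.DataFrame({
    'group1': pairs['group1_1'],
    'group2': pairs['group1_2'],
    'num_group1': pairs['group2_1'],
    'num_group2': pairs['group2_2'],
    'desired_column': pairs['num_1'] + pairs['num_2'],
})
# Match the row order of desired_df: num_group1 varies fastest.
out = out.sort_values(['num_group2', 'num_group1']).reset_index(drop=True)
print(out)
```

This only covers two groups; the answers below show ways to extend the pairing to more groups.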
I'm sure I'm missing something simple... any suggestions?
Answer 0 (score: 1)
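You can first group by 'group1', then compute every combination (the Cartesian product) of the relevant columns across all groups, and finally use those items to index the original dataframe:
import itertools as it
import pandas as pd
group_names = ['group1', 'group2']
# One array of (group1, group2) rows per group1 value
groups = (x[1][group_names].values for x in df.groupby('group1'))
# Every combination that takes exactly one row from each group
combined = [tuple(tuple(y) for y in x) for x in it.product(*groups)]
# Note: inplace=True modifies df itself
df.set_index(group_names, inplace=True)
result = pd.Series([sum(df.loc[x, 'num'] for x in item) for item in combined],
                   index=pd.MultiIndex.from_tuples(combined))
print(result)
This yields the following result (the index corresponds to the combinations):
(a, 1)  (b, 1)    11
        (b, 2)    16
        (b, 3)    21
(a, 2)  (b, 1)    12
        (b, 2)    17
        (b, 3)    22
(a, 3)  (b, 1)    13
        (b, 2)    18
        (b, 3)    23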
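Because set_index(..., inplace=True) replaces the columns of df with an index, running the snippet a second time fails. A sketch of the same product-over-groups idea wrapped in a function that leaves df untouched; the name combo_sums and its parameters are hypothetical, and the index is sorted only to keep MultiIndex lookups efficient:

```python
import itertools as it
import pandas as pd

def combo_sums(df, group_names, value='num', by='group1'):
    """Sum `value` over every combination taking one row from each `by` group."""
    # One array of key rows per group, e.g. [['a', 1], ['a', 2], ['a', 3]].
    groups = (g[group_names].values for _, g in df.groupby(by))
    combined = [tuple(tuple(row) for row in combo) for combo in it.product(*groups)]
    # A separate lookup Series keyed by the group columns; df is not modified.
    lookup = df.set_index(group_names)[value].sort_index()
    sums = [sum(lookup.loc[key] for key in combo) for combo in combined]
    return pd.Series(sums, index=pd.MultiIndex.from_tuples(combined))

df = pd.DataFrame({'group1': ['a'] * 3 + ['b'] * 3,
                   'group2': [1, 2, 3] * 2,
                   'num': [1, 2, 3, 10, 15, 20]})
result = combo_sums(df, ['group1', 'group2'])
print(result)
```

Like the original, this assumes each key combination identifies a single row; duplicate keys would make lookup.loc return a Series instead of a scalar.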
This also works for more than two groups, for example:
group1 group2 num
0 a 1 1
1 a 2 2
2 a 3 3
3 b 1 10
4 b 2 15
5 b 3 20
6 c 1 100
7 c 2 200
8 c 3 300
(a, 1)  (b, 1)  (c, 1)    111
                (c, 2)    211
                (c, 3)    311
        (b, 2)  (c, 1)    116
                (c, 2)    216
                (c, 3)    316
        (b, 3)  (c, 1)    121
                (c, 2)    221
                (c, 3)    321
(a, 2)  (b, 1)  (c, 1)    112
                (c, 2)    212
                (c, 3)    312
        (b, 2)  (c, 1)    117
                (c, 2)    217
                (c, 3)    317
        (b, 3)  (c, 1)    122
                (c, 2)    222
                (c, 3)    322
(a, 3)  (b, 1)  (c, 1)    113
                (c, 2)    213
                (c, 3)    313
        (b, 2)  (c, 1)    118
                (c, 2)    218
                (c, 3)    318
        (b, 3)  (c, 1)    123
                (c, 2)    223
                (c, 3)    323
It also works for more than two columns, for example with group_names = ['group1', 'group2', 'group3']:
group1 group2 group3 num
0 a 0 q 1
1 a 1 r 2
2 a 0 s 3
3 a 1 t 4
4 b 0 q 10
5 b 1 r 15
6 b 0 s 20
7 b 1 t 25
(a, 0, q)  (b, 0, q)    11
           (b, 1, r)    16
           (b, 0, s)    21
           (b, 1, t)    26
(a, 1, r)  (b, 0, q)    12
           (b, 1, r)    17
           (b, 0, s)    22
           (b, 1, t)    27
(a, 0, s)  (b, 0, q)    13
           (b, 1, r)    18
           (b, 0, s)    23
           (b, 1, t)    28
(a, 1, t)  (b, 0, q)    14
           (b, 1, r)    19
           (b, 0, s)    24
           (b, 1, t)    29
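If a flat frame like desired_df is preferred over a MultiIndex of tuples, the same product over groups can be expanded into one column per group and key. A sketch under that assumption; the function name combo_frame and the column naming scheme (key column name plus group label, e.g. group2_a) are hypothetical choices for illustration:

```python
import itertools as it
import pandas as pd

def combo_frame(df, by, keys, value):
    """One row per combination taking one row from each `by` group."""
    names, parts = [], []
    for name, g in df.groupby(by):
        names.append(name)
        # Each entry: (tuple of key values, value) for one row of this group.
        parts.append(list(zip(g[keys].itertuples(index=False, name=None), g[value])))
    rows = []
    for combo in it.product(*parts):
        # Flatten the key tuples group by group, then append the summed value.
        flat_keys = [k for key_tuple, _ in combo for k in key_tuple]
        rows.append(flat_keys + [sum(v for _, v in combo)])
    columns = [f'{k}_{n}' for n in names for k in keys] + [f'{value}_sum']
    return pd.DataFrame(rows, columns=columns)

df = pd.DataFrame({'group1': ['a'] * 4 + ['b'] * 4,
                   'group2': [0, 1, 0, 1] * 2,
                   'group3': list('qrst') * 2,
                   'num': [1, 2, 3, 4, 10, 15, 20, 25]})
out = combo_frame(df, by='group1', keys=['group2', 'group3'], value='num')
print(out)
```

The row order follows itertools.product, so it matches the MultiIndex output of the answer above.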
Answer 1 (score: 1)
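You can use a cross join through a dummy 'key' column (this assumes exactly two groups):
import pandas as pd
# Split df by group1 and give every row the same dummy key
x, y = [g.assign(key=1) for _, g in df.groupby('group1')]
# Merging on the constant key pairs every row of x with every row of y
s = x.merge(y, on='key')
s['X'] = s.num_x + s.num_y
s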
group1_x group2_x num_x key group1_y group2_y num_y X
0 a 1 1 1 b 1 10 11
1 a 1 1 1 b 2 15 16
2 a 1 1 1 b 3 20 21
3 a 2 2 1 b 1 10 12
4 a 2 2 1 b 2 15 17
5 a 2 2 1 b 3 20 22
6 a 3 3 1 b 1 10 13
7 a 3 3 1 b 2 15 18
8 a 3 3 1 b 3 20 23
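Since pandas 1.2, merge accepts how='cross', which produces the same pairing without the helper 'key' column. A minimal sketch assuming that pandas version and, like the answer above, exactly two groups:

```python
import pandas as pd

df = pd.DataFrame({'group1': ['a'] * 3 + ['b'] * 3,
                   'group2': [1, 2, 3] * 2,
                   'num': [1, 2, 3, 10, 15, 20]})

# Unpacking into x, y only works when there are exactly two group1 values.
x, y = [g for _, g in df.groupby('group1')]
# Cross join: every row of x paired with every row of y; no dummy key needed.
s = x.merge(y, how='cross')
s['X'] = s['num_x'] + s['num_y']
print(s)
```

For more groups, the unpacking into x, y no longer applies, so the per-group frames would have to be merged one after another instead.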
Answer 2 (score: 1)
You can try using combinations from itertools:
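import pandas as pd
from itertools import combinations
# Every pair of rows, each pair flattened into one 6-element row
df2 = pd.DataFrame([a + b for a, b in combinations(df.values.tolist(), 2)])
# Keep only pairs whose rows belong to different group1 values
df3 = df2[df2[0].ne(df2[3])].reset_index(drop=True)
# Column 5 (num of the second row) becomes the sum of both num values
df3[5] = df3[5] + df3[2]
print(df3)
# Select, reorder and rename the columns to match the desired layout
df4 = df3[[0, 1, 3, 4, 5]].reindex(columns=[0, 3, 1, 4, 5]).rename(columns={3: 'group2', 0: 'group1', 1: 'num_group1', 4: 'num_group2', 5: 'desired_column'})
df_desired = df4.sort_values('desired_column').reset_index(drop=True)
print(df_desired)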
Output:
0 1 2 3 4 5
0 a 1 1 b 1 11
1 a 1 1 b 2 16
2 a 1 1 b 3 21
3 a 2 2 b 1 12
4 a 2 2 b 2 17
5 a 2 2 b 3 22
6 a 3 3 b 1 13
7 a 3 3 b 2 18
8 a 3 3 b 3 23
group1 group2 num_group1 num_group2 desired_column
0 a b 1 1 11
1 a b 2 1 12
2 a b 3 1 13
3 a b 1 2 16
4 a b 2 2 17
5 a b 3 2 18
6 a b 1 3 21
7 a b 2 3 22
8 a b 3 3 23
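combinations(..., 2) also generates same-group pairs (a with a, b with b), which the ne filter then discards: for two groups of n rows each, n(n-1) of the n(2n-1) generated pairs are thrown away (6 of 15 here). A sketch of the same idea using itertools.product over the two groups instead, so only cross-group pairs are built; like the original, it assumes exactly two groups:

```python
from itertools import product
import pandas as pd

df = pd.DataFrame({'group1': ['a'] * 3 + ['b'] * 3,
                   'group2': [1, 2, 3] * 2,
                   'num': [1, 2, 3, 10, 15, 20]})

# Rows of each group as plain lists; unpacking assumes exactly two groups.
rows_a, rows_b = (g.values.tolist() for _, g in df.groupby('group1'))
# product pairs only rows from different groups, so no filtering step is needed.
df3 = pd.DataFrame([ra + rb for ra, rb in product(rows_a, rows_b)])
df3[5] = df3[5] + df3[2]
print(df3)
```

The resulting df3 has the same layout as in the answer, so the renaming and sorting steps can be applied unchanged.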