Splitting a pandas groupby object into chunks

Time: 2018-10-15 14:50:58

Tags: pandas pandas-groupby

I have a pandas DataFrame that I am grouping by the columns ['client', 'product', 'data'].

grouped_data = raw_data.groupby(['client', 'product', 'data'])
print(len(grouped_data))
# 10000

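I want to split the resulting groupby object into two chunks, one containing roughly 80% of the groups and the other containing the rest.

I have been banging my head against the screen for a while now...

2 answers:

Answer 0: (score: 2)

Using np.split: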

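import numpy as np

# One hashable key per row, built from the three grouping columns
df['key'] = df[['client', 'product', 'data']].apply(tuple, axis=1)

# One key per group; cut after the first 80% of them (8000 of the 10000 groups in the question)
keys = df['key'].unique()
g1, g2 = np.split(keys, [int(len(keys) * 0.8)])

# Select the rows belonging to each set of keys
df1 = df[df['key'].isin(g1)]
df2 = df[df['key'].isin(g2)]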
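As a variation on the same idea, here is a minimal sketch that skips the helper key column and uses the integer group numbers from `ngroup()` instead. It is not part of the answer above; the toy data is made up for illustration, and only the column names come from the question.

```python
import pandas as pd

# Hypothetical toy data; only the column names are taken from the question
raw_data = pd.DataFrame({
    'client':  ['c1', 'c1', 'c2', 'c2', 'c3', 'c4', 'c4'],
    'product': ['pA', 'pA', 'pA', 'pB', 'pA', 'pB', 'pB'],
    'data':    ['d1', 'd1', 'd2', 'd1', 'd1', 'd3', 'd3'],
})

grouped = raw_data.groupby(['client', 'product', 'data'])

# ngroup() labels every row with the integer id (0 .. ngroups - 1) of its group
group_id = grouped.ngroup()

# Groups with an id below the cutoff form the first chunk, the rest the second
cutoff = int(grouped.ngroups * 0.8)
first = raw_data[group_id < cutoff]
second = raw_data[group_id >= cutoff]

print(grouped.ngroups, cutoff)
print(first)
print(second)
```

Each chunk is an ordinary DataFrame here, so it can be grouped again by the same columns if a groupby object is needed for further processing.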

Answer 1: (score: 0)

You can do the following:

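import numpy as np

grouped = df.groupby('Client')

# Position of the first group that goes into the second chunk.
# Because of the "- 1", the 5 groups of the example below give bound = 3,
# so chunk1 holds 3 groups; drop the "- 1" to get 4 of 5 groups (80%).
bound = int(np.ceil(len(grouped) * 0.8)) - 1

# Materialize the (key, sub-DataFrame) pairs once, then slice the list
groups = list(grouped)
chunk1 = [g[1] for g in groups[:bound]]
chunk2 = [g[1] for g in groups[bound:]]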

For the following example DataFrame:

     Client   Product   Data
0   Client1  ProductA  Data1
1   Client2  ProductA  Data3
2   Client3  ProductB  Data1
3   Client4  ProductA  Data2
4   Client5  ProductB  Data1
5   Client2  ProductA  Data1
6   Client3  ProductA  Data3
7   Client2  ProductB  Data1
8   Client3  ProductB  Data1
9   Client5  ProductA  Data2
10  Client1  ProductA  Data1
11  Client1  ProductB  Data1
12  Client4  ProductA  Data2
13  Client3  ProductB  Data2
14  Client2  ProductB  Data3

chunk1 will contain:

     Client   Product   Data
0   Client1  ProductA  Data1
10  Client1  ProductA  Data1
11  Client1  ProductB  Data1

     Client   Product   Data
1   Client2  ProductA  Data3
5   Client2  ProductA  Data1
7   Client2  ProductB  Data1
14  Client2  ProductB  Data3

     Client   Product   Data
2   Client3  ProductB  Data1
6   Client3  ProductA  Data3
8   Client3  ProductB  Data1
13  Client3  ProductB  Data2

chunk2 will contain:

     Client   Product   Data
3   Client4  ProductA  Data2
12  Client4  ProductA  Data2

    Client   Product   Data
4  Client5  ProductB  Data1
9  Client5  ProductA  Data2
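
To check this answer end to end, the sketch below rebuilds the 15-row example DataFrame and applies the same bound. The `pd.concat` step is an addition not in the answer: it assumes you would rather have each chunk as a single DataFrame than as a list of per-group frames. The names `part1` and `part2` are hypothetical.

```python
import numpy as np
import pandas as pd

# The 15-row example DataFrame from the answer above
df = pd.DataFrame({
    'Client':  ['Client%d' % i for i in [1, 2, 3, 4, 5, 2, 3, 2, 3, 5, 1, 1, 4, 3, 2]],
    'Product': ['Product' + p for p in 'AABABAABBAABABB'],
    'Data':    ['Data%d' % i for i in [1, 3, 1, 2, 1, 1, 3, 1, 1, 2, 1, 1, 2, 2, 3]],
})

# One sub-DataFrame per client, in sorted key order
groups = [frame for _, frame in df.groupby('Client')]

# Same bound as in the answer: 3 for the 5 groups here
bound = int(np.ceil(len(groups) * 0.8)) - 1

# pd.concat stitches each list of per-group frames back into one DataFrame
part1 = pd.concat(groups[:bound])   # rows of Client1, Client2, Client3
part2 = pd.concat(groups[bound:])   # rows of Client4, Client5

print(part1)
print(part2)
```

With only 5 groups the split comes out 3 to 2 rather than 4 to 1; removing the `- 1` from the bound would put Client4 into the first part as well.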