Pandas: divide rows into 3 groups by condition

Time: 2018-07-05 04:09:33

Tags: python python-3.x pandas group-by

I have a df like this.

import pandas as pd
import numpy as np
user = pd.DataFrame({'User':['101','101','101','102','102','101','101','102','102','102','102','102'],'Country':['India','Japan','India','Brazil','Japan','UK','Austria','Japan','Singapore','UK','UK','UK']
                    ,'Count':[85,78,70,5,6,8,60,30,5,6,5,4]})

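I want to sort by the Count column, then assign the top 30% of rows to group 3, the next 30% of rows to group 2, and the remaining 30% of rows to group 1. How can I do this? Below is my expected output (the first 4 columns); see my calculation for how I split the rows 30%, 30%, 40%.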

[Image: expected output]

1 answer:

Answer 0: (score: 1)

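You first need to sort the rows with sort_values, then use groupby with a custom function that calls numpy.split and returns the length of each piece as a row of a new DataFrame:

The idea comes from MaxU's excellent answer, thank you.


For top 30-30-30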

user = user.sort_values(['User','Count'], ascending=[True, False])

def f(x):
    # split into 4 groups, because .3 + .3 + .3 != 1; the last piece (d) is not counted
    a, b, c, d = np.split(x, [int(.3*len(x)), int(.6*len(x)), int(.9*len(x))])
    return pd.Series([len(a), len(b), len(c)], index=['30','30','30'])

df = user.groupby('User').apply(f)
df['sum'] = df.sum(axis=1)
print (df)
      30  30  30  sum
User                 
101    1   2   1    4
102    2   2   2    6

30-30-40

user = user.sort_values(['User','Count'], ascending=[True, False])

def f(x):
    # split into 3 groups, because .3 + .3 + .4 == 1
    a, b, c = np.split(x, [int(.3*len(x)), int(.6*len(x))])
    return pd.Series([len(a), len(b), len(c)], index=['30','30','40'])

df = user.groupby('User').apply(f)
df['sum'] = df.sum(axis=1)
print (df)

      30  30  40  sum
User                 
101    1   2   2    5
102    2   2   3    7
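
To see where the counts above come from, here is a minimal sketch of the cut-point arithmetic only. It assumes the same truncation as int(.3*len(x)) and int(.6*len(x)); the helper name split_sizes and its cuts parameter are made up for illustration and are not part of pandas or NumPy.

```python
def split_sizes(n, cuts=(0.3, 0.6)):
    """Sizes of the pieces produced by cutting n rows at int(c * n).

    Hypothetical helper for illustration; it mirrors the cut points
    passed to np.split in the 30-30-40 example.
    """
    # int() truncates toward zero, so the last piece absorbs any remainder.
    points = [int(c * n) for c in cuts]
    edges = [0] + points + [n]
    return [hi - lo for lo, hi in zip(edges, edges[1:])]

print(split_sizes(5))   # user 101 has 5 rows -> [1, 2, 2]
print(split_sizes(7))   # user 102 has 7 rows -> [2, 2, 3]
print(split_sizes(10))  # -> [3, 3, 4]
```

Because the cut points are truncated, small groups will not hit 30% exactly: with 5 rows the first piece holds only 1 row (20%).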

EDIT:

The groups can be created with a list comprehension:

def f(x):
    a, b, c = np.split(x.index, [int(.3*len(x)), int(.6*len(x))])
    L = [a,b,c]
    return [i for i, y in zip(range(len(L),0,-1) ,L) for j in y]

user['Groups'] = user.groupby('User')['User'].transform(f)
print (user)
   User    Country  Count  Groups
0   101      India     85       3
1   101      Japan     78       2
2   101      India     70       2
6   101    Austria     60       1
5   101         UK      8       1
7   102      Japan     30       3
4   102      Japan      6       3
9   102         UK      6       2
3   102     Brazil      5       2
8   102  Singapore      5       1
10  102         UK      5       1
11  102         UK      4       1
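
As a possible alternative (a sketch, not part of the original answer), the same labels can be computed without calling np.split on pandas objects, by comparing each row's position within its user against the truncated 30% and 60% cut points. The variable names pos, size, cut1 and cut2 are my own; the cut points are assumed to be the same int(.3*n) and int(.6*n) as above.

```python
import numpy as np
import pandas as pd

user = pd.DataFrame({
    'User': ['101', '101', '101', '102', '102', '101',
             '101', '102', '102', '102', '102', '102'],
    'Country': ['India', 'Japan', 'India', 'Brazil', 'Japan', 'UK',
                'Austria', 'Japan', 'Singapore', 'UK', 'UK', 'UK'],
    'Count': [85, 78, 70, 5, 6, 8, 60, 30, 5, 6, 5, 4],
})

# Highest counts first within each user.
user = user.sort_values(['User', 'Count'], ascending=[True, False])

g = user.groupby('User')
pos = g.cumcount().to_numpy()                    # 0-based position within each user
size = g['Count'].transform('size').to_numpy()   # number of rows of that user

# Truncated cut points, matching int(.3*len(x)) and int(.6*len(x)).
cut1 = (0.3 * size).astype(int)
cut2 = (0.6 * size).astype(int)

# Top 30% -> 3, next 30% -> 2, remaining rows -> 1.
user['Groups'] = np.where(pos < cut1, 3, np.where(pos < cut2, 2, 1))
print(user)
```

Rows with equal Count values may appear in a different order than in the output above, but the number of rows assigned to each group per user should be the same.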