Question

I have a DataFrame df like this:

     user_id  movie_id  rating
32236   1        1        5
23171   1        2        3
83307   1        3        4
62631   1        4        3
47638   1        5        3
26184   2        1        4
1333    5        1        4
172     5        2        3
54487   6        1        4
52430   7        4        5
18504   10       1        4
4617    10       4        4

I want to split df randomly into groups by user_id, so that every group contains the same number of user_ids (or, if they cannot be divided evenly, at least a similar number), while keeping all rows of a given user_id together in one group.

For example, with 2 user_ids in each group:

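     user_id  movie_id  rating
32236   1        1        5
23171   1        2        3
83307   1        3        4
62631   1        4        3
47638   1        5        3
52430   7        4        5

     user_id  movie_id  rating
26184   2        1        4
18504   10       1        4
4617    10       4        4

     user_id  movie_id  rating
1333    5        1        4
172     5        2        3
54487   6        1        4

I have written a method group(df, n):

import numpy as np

def group(df, n):
    shuffled = df.sample(frac=1)
    result = np.array_split(shuffled, n)
    dict = {}
    for i, part in enumerate(result):
        dict['df_'+str(i+1)] = part
    return dict

But it does not work when the same user_id appears in several rows, as it does in df: the rows are shuffled and split individually, so one user's rows can end up in different groups. Also, I can only set how many groups to create, not how many user_ids each group should contain.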

How can I split df into groups by user_id, with a chosen number of user_ids in each group as described above?
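The failure described above can be reproduced with a short sketch. It rebuilds the sample df and follows the same steps as group(df, n): shuffle the rows, then cut them into n chunks. With 12 rows and n = 3, every chunk holds exactly 4 rows, so user_id 1, which has 5 rows, must end up in at least two chunks no matter how the rows are shuffled. The random_state value is an arbitrary choice, and splitting row positions instead of the DataFrame itself is a small deviation from the question's code.

```python
import numpy as np
import pandas as pd

# Sample data from the question (row labels kept as shown there).
df = pd.DataFrame(
    {"user_id":  [1, 1, 1, 1, 1, 2, 5, 5, 6, 7, 10, 10],
     "movie_id": [1, 2, 3, 4, 5, 1, 1, 2, 1, 4, 1, 4],
     "rating":   [5, 3, 4, 3, 3, 4, 4, 3, 4, 5, 4, 4]},
    index=[32236, 23171, 83307, 62631, 47638, 26184,
           1333, 172, 54487, 52430, 18504, 4617],
)

n = 3
# Same idea as group(df, n): shuffle the rows, then cut them into n chunks.
# Row positions are split (rather than the DataFrame itself) so the sketch
# does not depend on np.array_split accepting a DataFrame.
shuffled = df.sample(frac=1, random_state=0)  # seed chosen arbitrarily
chunks = [shuffled.iloc[pos] for pos in np.array_split(np.arange(len(shuffled)), n)]

# Number of chunks in which each user_id appears.
spread = {int(uid): sum(int(uid) in set(chunk["user_id"]) for chunk in chunks)
          for uid in df["user_id"].unique()}
print(spread)
```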

Answer 1

First, get the unique user IDs and shuffle them:

import numpy as np
uniques = np.random.permutation(df['user_id'].unique())
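np.random.permutation draws from NumPy's global random state, so the grouping changes on every run. If reproducible groups are wanted, a seeded Generator can be swapped in, as in this sketch (the user_id values come from the question; the seed 42 is an arbitrary assumption):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"user_id": [1, 1, 1, 1, 1, 2, 5, 5, 6, 7, 10, 10]})

# A seeded Generator yields the same shuffle on every run
# (the seed 42 is an arbitrary choice).
rng = np.random.default_rng(42)
uniques = rng.permutation(df["user_id"].unique())
print(uniques)  # each user_id appears exactly once, in shuffled order
```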

Split the unique IDs (here assuming 2 IDs per split):

splits = np.array_split(uniques, len(uniques) // 2)
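The second argument of np.array_split is the number of sections, not the section size: len(uniques) // 2 yields groups of about two IDs, and with an odd count one group gets three. If a fixed number of IDs per group matters more than equal sizes, plain slicing is an alternative. This sketch compares both on seven hypothetical shuffled IDs; k is an illustrative name for the desired group size, and max(1, ...) avoids asking np.array_split for zero sections, which raises a ValueError.

```python
import numpy as np

uniques = np.array([7, 2, 10, 1, 6, 5, 3])  # 7 hypothetical shuffled IDs
k = 2  # desired number of user_ids per group

# Option A: fix the number of groups; sizes differ by at most 1.
by_count = np.array_split(uniques, max(1, len(uniques) // k))

# Option B: fix the group size; only the last group may be smaller.
by_size = [uniques[i:i + k] for i in range(0, len(uniques), k)]

print([len(s) for s in by_count])  # [3, 2, 2]
print([len(s) for s in by_size])   # [2, 2, 2, 1]
```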

Define a function that assigns each row to a group based on its index value (the user_id):

def grouper(value):
    # Position of the split that contains this user_id.
    return np.argmax([value in split for split in splits])
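grouper scans every split for each row, which is fine for small data. For a larger df, one option (a swapped-in technique, not part of this answer) is to build a user_id to group-number dictionary once and group by Series.map. The names group_of and groups are illustrative:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {"user_id":  [1, 1, 1, 1, 1, 2, 5, 5, 6, 7, 10, 10],
     "movie_id": [1, 2, 3, 4, 5, 1, 1, 2, 1, 4, 1, 4],
     "rating":   [5, 3, 4, 3, 3, 4, 4, 3, 4, 5, 4, 4]})

uniques = np.random.permutation(df["user_id"].unique())
splits = np.array_split(uniques, len(uniques) // 2)

# Build the user_id -> group number lookup once, instead of scanning
# every split for every row as grouper() does.
group_of = {uid: g for g, split in enumerate(splits) for uid in split}

# Group by the mapped labels; user_id stays a normal column and the
# original row labels are preserved.
groups = {g: part for g, part in df.groupby(df["user_id"].map(group_of))}
for g, part in groups.items():
    print(g)
    print(part)
```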

Then use it:

for key, group in df.set_index('user_id').groupby(grouper):
    print(key)
    print(group)
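Each item of a groupby is a (key, DataFrame) pair. To get named DataFrames like the df_1, df_2 keys produced by the question's group function, the pairs can be collected into a dictionary. Because set_index('user_id') replaces the original row labels, reset_index() brings user_id back as a column. The dictionary name frames is illustrative:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {"user_id":  [1, 1, 1, 1, 1, 2, 5, 5, 6, 7, 10, 10],
     "movie_id": [1, 2, 3, 4, 5, 1, 1, 2, 1, 4, 1, 4],
     "rating":   [5, 3, 4, 3, 3, 4, 4, 3, 4, 5, 4, 4]})

uniques = np.random.permutation(df['user_id'].unique())
splits = np.array_split(uniques, len(uniques) // 2)

def grouper(value):
    # Position of the split that contains this user_id.
    return np.argmax([value in split for split in splits])

# Each groupby item is a (key, DataFrame) pair; store them as df_1, df_2, ...
frames = {
    f"df_{key + 1}": part.reset_index()
    for key, part in df.set_index('user_id').groupby(grouper)
}
for name, part in frames.items():
    print(name)
    print(part)
```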

Answer 2

As far as I understand, you can use:

import numpy as np

uniques = df.user_id.unique()
d = {'df' + str(e): df[df.user_id.isin(i)]
     for e, i in enumerate(np.array_split(np.random.permutation(uniques),
                                          len(uniques) // 2))}
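The dictionary comprehension above can be wrapped into a reusable function whose group size is a parameter, which addresses the question's wish to set how many user_ids go into each group. split_by_user, ids_per_group and seed are hypothetical names, and the seeded Generator is an optional addition for reproducibility:

```python
import numpy as np
import pandas as pd

def split_by_user(df, ids_per_group, seed=None):
    """Hypothetical helper: randomly split df into groups of about
    ids_per_group distinct user_ids, keeping each user's rows together."""
    rng = np.random.default_rng(seed)
    uniques = rng.permutation(df["user_id"].unique())
    # Number of sections for np.array_split; at least 1 so it never fails.
    n_groups = max(1, len(uniques) // ids_per_group)
    return {f"df{e}": df[df["user_id"].isin(ids)]
            for e, ids in enumerate(np.array_split(uniques, n_groups))}

df = pd.DataFrame(
    {"user_id":  [1, 1, 1, 1, 1, 2, 5, 5, 6, 7, 10, 10],
     "movie_id": [1, 2, 3, 4, 5, 1, 1, 2, 1, 4, 1, 4],
     "rating":   [5, 3, 4, 3, 3, 4, 4, 3, 4, 5, 4, 4]})

d = split_by_user(df, ids_per_group=2, seed=0)
for name, part in d.items():
    print(name, sorted(part["user_id"].unique()))
```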

Output:

{'df0':        user_id  movie_id  rating
 26184        2         1       4
 18504       10         1       4
 4617        10         4       4, 'df1':        user_id  movie_id  rating
 32236        1         1       5
 23171        1         2       3
 83307        1         3       4
 62631        1         4       3
 47638        1         5       3
 52430        7         4       5, 'df2':        user_id  movie_id  rating
 1333         5         1       4
 172          5         2       3
 54487        6         1       4}

You can then access each group by its key in this dictionary:

print(d['df1'])

       user_id  movie_id  rating
32236        1         1       5
23171        1         2       3
83307        1         3       4
62631        1         4       3
47638        1         5       3
52430        7         4       5

How to group rows with the same value?
