Question

My DataFrame:

  ordercode             quantity
PMC21-AA1U1FBWBJA           1
PMP23-GR1M1FB3CJ            1
PMC11-AA1U1FJWWJA           1
PMC11-AA1U1FBWWJA+I7        2
PMC11-AA1U1FJWWJA           3
PMC11-AA1L1FJWWJA           3

My desired output:

  Group    ordercode                quantity
    0      PMC21-AA1U1FBWBJA           1
           PMP23-GR1M1FB3CJ            1
           PMC11-AA1U1FJWWJA           1
           PMC11-AA1U1FBWWJA+I7        1
    1      PMC11-AA1U1FBWWJA+I7        1
           PMC11-AA1U1FJWWJA           3
    2      PMC11-AA1L1FJWWJA           3

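So my expected result is based on the quantity column: the total quantity in each group may be at most 4.

In group 0 and group 1 the totals are (1 + 1 + 1 + 1 = 4) and (1 + 3 = 4), i.e. each group is filled up to the maximum quantity of 4. In group 2 there is nothing left to add, so that group consists of the remainder (here 3). In groups 0 and 1 you can also see that the quantity of PMC11-AA1U1FBWWJA+I7 is split between the two groups.

I got a little help from a forum and wrote the following code: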

df = pd.DataFrame(np.concatenate(df.apply(lambda x: [x[0]] * x[1], 1).as_matrix()), 
              columns=['ordercode'])
df['quantity'] = 1
df['group'] = sorted(range(0, len(df)/3, 1) * 4)[0:len(df)]
df.groupby(['group', 'ordercode']).sum()

But I get an error:

TypeError: 'float' object cannot be interpreted as an integer

If I use int, as in

df['group'] = sorted(range(0, int(len(df)/3), 1) * 4)[0:len(df)]

I get a TypeError again. Can anyone tell me why?

Answer 1

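Assuming you are using Python 3, use the double slash // for integer division and convert the range to a list. In Python 3, / always returns a float, which range() rejects. A range object also cannot be repeated with * until it is converted to a list, which is why the int() version still raises a TypeError.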

df['group'] = sorted(list(range(0, len(df) // 3, 1)) * 4)[0:len(df)]

Your second attempt would then look like this:

df['group'] = sorted(list(range(0, int(len(df) / 3), 1)) * 4)[0:len(df)]
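To see both failures in isolation, here is a minimal sketch without pandas. The variable n is a stand-in for len(df) after the expansion step (11 for the sample data), and the names first_error, second_error, and labels are only illustrative.

```python
# Python 3: "/" always returns a float, and range() only accepts integers.
n = 11  # stand-in for len(df): 1 + 1 + 1 + 2 + 3 + 3 single-unit rows

try:
    range(0, n / 3, 1)
except TypeError as exc:
    first_error = str(exc)   # the 'float' object ... message from the question

# int() satisfies range(), but a range object cannot be repeated with "*".
try:
    range(0, int(n / 3), 1) * 4
except TypeError as exc:
    second_error = str(exc)  # complains about 'range' and 'int'

# Converting the range to a list first makes the repetition work.
labels = sorted(list(range(0, n // 3, 1)) * 4)[0:n]
print(first_error)
print(second_error)
print(labels)
```

With n = 11 the label list comes out as four 0s, four 1s, and three 2s, which matches the three groups in the desired output.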

So the complete code would run like this. I copied your sample data to the clipboard before running it.

import pandas as pd
import numpy as np
df = pd.read_clipboard()
df = pd.DataFrame(np.concatenate(df.apply(lambda x: [x[0]] * x[1], 1).as_matrix()), 
              columns=['ordercode'])
df['quantity'] = 1
df['group'] = sorted(list(range(0, len(df) // 3, 1)) * 4)[0:len(df)]
df = df.groupby(['group', 'ordercode']).sum()
print(df)

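The .as_matrix() call you use generates a warning, but it works. Note that it was deprecated in pandas 0.23 and removed in pandas 1.0, so on newer versions use .to_numpy() or .values instead.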
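On recent pandas versions, where .as_matrix() is no longer available, the same idea can be sketched without it. This is one possible variant rather than the code from the answer. It builds the sample data inline instead of reading the clipboard, repeats rows with Index.repeat, and labels groups with integer division by the cap. The names units, max_per_group, and result are illustrative assumptions.

```python
import numpy as np
import pandas as pd

# Sample data from the question, built inline instead of via read_clipboard().
df = pd.DataFrame({
    "ordercode": [
        "PMC21-AA1U1FBWBJA", "PMP23-GR1M1FB3CJ", "PMC11-AA1U1FJWWJA",
        "PMC11-AA1U1FBWWJA+I7", "PMC11-AA1U1FJWWJA", "PMC11-AA1L1FJWWJA",
    ],
    "quantity": [1, 1, 1, 2, 3, 3],
})

max_per_group = 4  # assumption: the cap of 4 stated in the question

# Repeat each row `quantity` times so that every row stands for one unit.
units = df.loc[df.index.repeat(df["quantity"]), ["ordercode"]].reset_index(drop=True)
units["quantity"] = 1

# Rows 0-3 fall into group 0, rows 4-7 into group 1, and so on.
units["group"] = np.arange(len(units)) // max_per_group

# sort=False keeps the order codes in their original order within each group.
result = units.groupby(["group", "ordercode"], sort=False)["quantity"].sum()
print(result)
```

Labelling with np.arange(len(units)) // max_per_group avoids having to guess how many group labels to generate. The len(df) // 3 expression happens to yield enough labels for 11 rows, but it would come up short for some other row counts (5 rows, for example, would get only 4 labels).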
