Question

I have a df with 2 columns:

goods_id         int64
properties_id    int64
dtype: object

df
   goods_id  properties_id
0      3588              1
1      3588              2
2      3588              3
3      3588              4
4      3588              5
5      3588              6
6      3589              1
7      3589              2
8      3589              3

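I need to combine the properties_id rows into a list of ints for each group. In other words, the desired output for each goods_id is 3588 [1,2,3,4,5,6], 3589 [1,2,3], and so on. To get it, I wrote my own combine function based on concatenation via ','.join. The result is not what I expected, and I can't understand why it behaves this way.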

def combine(x):
    return ','.join(x)

df.groupby('goods_id').apply(combine)

goods_id
3588    goods_id,properties_id # desired output [1,2,3,4,5,6]
3589    goods_id,properties_id # desired output [1,2,3]

Using df.groupby('goods_id')['properties_id'].apply(combine) gives me TypeError: sequence item 0: expected str instance, int found

Answer 1

In one line:

df.groupby('goods_id').agg(lambda col: col.tolist()).reset_index()

This gives the following dataframe:

   goods_id       properties_id
0      3588  [1, 2, 3, 4, 5, 6]
1      3589           [1, 2, 3]
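For anyone who wants to reproduce this, here is a minimal sketch that rebuilds the question's data and also shows why the original attempt misbehaved. The variable names `joined_labels`, `result` and `lists` are mine, not from the question, and `apply(list)` on the selected column is an alternative spelling rather than the one-liner above. Iterating over a DataFrame yields its column labels, which is why ','.join returned the column names, and str.join refuses non-string items, which explains the TypeError.

```python
import pandas as pd

# Rebuild the sample data from the question
df = pd.DataFrame({
    "goods_id": [3588] * 6 + [3589] * 3,
    "properties_id": [1, 2, 3, 4, 5, 6, 1, 2, 3],
})

# Iterating over a DataFrame yields column labels, not row values,
# so join() concatenates the column names
joined_labels = ",".join(df)
print(joined_labels)

# Joining the integer column fails because join() only accepts strings
try:
    ",".join(df["properties_id"])
    join_failed = False
except TypeError as exc:
    join_failed = True
    print("TypeError:", exc)

# The one-liner from this answer
result = df.groupby("goods_id").agg(lambda col: col.tolist()).reset_index()
print(result)

# Alternative: select the column first, then turn each group into a list
lists = df.groupby("goods_id")["properties_id"].apply(list)
print(lists.to_dict())
```

If a comma-separated string is actually wanted instead of a list, converting the values first, for example ','.join(map(str, x)), avoids the TypeError.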

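If the dataframe has more columns, they will be aggregated into lists as well. If that is the case and you only want properties_id to become a list, just specify this column in .agg():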

df.groupby('goods_id').agg({'properties_id': lambda col: col.tolist()}).reset_index()
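To illustrate the difference between the two calls, here is a small sketch with one extra column. The `price` column and its values are hypothetical, added only for the demonstration, as are the names `all_cols` and `only_props`.

```python
import pandas as pd

df = pd.DataFrame({
    "goods_id": [3588, 3588, 3589],
    "properties_id": [1, 2, 1],
    "price": [9.5, 9.5, 4.0],  # hypothetical extra column, not in the question
})

# A bare lambda is applied to every non-grouping column
all_cols = df.groupby("goods_id").agg(lambda col: col.tolist()).reset_index()
print(list(all_cols.columns))

# A dict restricts the aggregation to the listed column only
only_props = (
    df.groupby("goods_id")
    .agg({"properties_id": lambda col: col.tolist()})
    .reset_index()
)
print(list(only_props.columns))
print(only_props)
```

With the dict form, columns that are not listed are simply dropped from the result.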
