Question

I have a pandas DataFrame of the following form:

import pandas as pd
p = pd.DataFrame({"int" : [1,     1,     1,     1,     2,      2],
                  "cod" : [[1,1], [2,2], [1,2], [3,9], [2,2], [2,2]]})

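I want to group by int, which gives me a bunch of lists per group. I then want to flatten those lists, so that I end up with a DataFrame of the following form: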

p = pd.DataFrame({"int" :  [1,                2],
                  "cod" : [[1,1,2,2,1,2,3,9], [2,2,2,2]]})

This is what I have so far:

p.groupby("int", as_index=False)["cod"]

After grouping by int, I am stuck on how to flatten the lists.

Answer 1

Use sum:

df = p.groupby("int", as_index=False)["cod"].sum()
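As a rough illustration of why sum can flatten here: adding two Python lists with + concatenates them, so summing a group of lists joins them end to end. The plain-Python sketch below shows only that list behavior. That pandas applies the same addition to each group of an object column is my assumption, not something stated above, and the variable names are made up for the example.

```python
# "+" on two lists concatenates them, so summing lists joins them in order.
lists = [[1, 1], [2, 2], [1, 2], [3, 9]]

# The second argument is the start value. It must be a list here,
# because the default start value 0 cannot be added to a list.
flat = sum(lists, [])

print(flat)  # [1, 1, 2, 2, 1, 2, 3, 9]
```

Each + builds a brand-new list, so this pattern copies earlier elements again and again. That is fine for small groups but can get slow when there are many long lists.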

Or a list comprehension:

df = p.groupby("int")["cod"].apply(lambda x: [z for y in x for z in y]).reset_index()

Or np.concatenate (requires NumPy):

import numpy as np

df = p.groupby("int")["cod"].apply(lambda x: np.concatenate(x.values).tolist()).reset_index()

For performance, if the lists are large, this should be the fastest:

from itertools import chain

df = p.groupby("int")["cod"].apply(lambda x: list(chain.from_iterable(x))).reset_index()

See more information about flattening lists.

print (df)
   int                       cod
0    1  [1, 1, 2, 2, 1, 2, 3, 9]
1    2              [2, 2, 2, 2]
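For convenience, here is a self-contained sketch that puts the question's data and the chain.from_iterable approach together and checks the result against the desired output. The helper name flatten is hypothetical, introduced only to make the lambda readable.

```python
from itertools import chain

import pandas as pd

p = pd.DataFrame({"int": [1, 1, 1, 1, 2, 2],
                  "cod": [[1, 1], [2, 2], [1, 2], [3, 9], [2, 2], [2, 2]]})


def flatten(lists):
    """Join an iterable of lists into one flat list (hypothetical helper)."""
    return list(chain.from_iterable(lists))


# Group by "int", flatten each group's lists, then turn "int" back into a column.
df = p.groupby("int")["cod"].apply(flatten).reset_index()

# Compare against the result the question asks for.
print(df["int"].tolist() == [1, 2])
print(df["cod"].tolist() == [[1, 1, 2, 2, 1, 2, 3, 9], [2, 2, 2, 2]])
```

Comparing through tolist() avoids relying on how pandas compares cells that hold list objects.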
