Question

Suppose we have a DataFrame named df:
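The question does not show the contents of df. The outputs printed in the answer suggest a frame along the following lines, and the sketches below assume it. These values are an inference, not the original data.

```python
import pandas as pd

# Assumed sample data: reconstructed from the outputs shown in the answer.
# The question itself did not include the frame.
df = pd.DataFrame({
    "A": [1, 1, 1, 2, 2, 1, 1],
    "B": ["a", "b", "c", "d", "e", "f", "g"],
})
print(df)
```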

I want to use groupby to create the following:

1: [a,b,c]
2: [d,e]
1: [f,g]

Currently, if I use

{k: list(v) for k,v in df.groupby("A")["B"]}

I get

1: [a,b,c,f,g]
2: [d,e]
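Here is a minimal sketch of why this happens, assuming the sample frame inferred from the answer's output. groupby collects every row that shares a key, regardless of where the rows sit, so both runs of 1 end up in the same group.

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 1, 1, 2, 2, 1, 1], "B": list("abcdefg")})  # assumed data

# groupby("A") puts every row with A == 1 into one group, whether or not
# those rows are adjacent, so the two runs of 1s are merged.
merged = {k: list(v) for k, v in df.groupby("A")["B"]}
print(merged)
```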

I want the data split into groups of consecutive equal values.
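The desired result repeats the key 1, so it cannot be a Python dict. A list of (key, values) pairs can hold it. The sketch below uses the standard library's itertools.groupby, which groups only adjacent items with equal keys. This is a swapped-in alternative to pandas groupby, not the answer's method, and it assumes the same inferred sample data.

```python
from itertools import groupby

import pandas as pd

df = pd.DataFrame({"A": [1, 1, 1, 2, 2, 1, 1], "B": list("abcdefg")})  # assumed data

# itertools.groupby starts a new group every time the key changes,
# which is exactly "consecutive similar values".
pairs = [
    (key, [b for _, b in rows])
    for key, rows in groupby(zip(df["A"].tolist(), df["B"].tolist()), key=lambda row: row[0])
]
print(pairs)  # [(1, ['a', 'b', 'c']), (2, ['d', 'e']), (1, ['f', 'g'])]
```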

Answer 1

Group by a Series built with cumsum over a comparison of column A with its shift()ed copy. A new group starts wherever the value in A changes:

print (df["A"].ne(df["A"].shift()).cumsum())
0    1
1    1
2    1
3    2
4    2
5    3
6    3
Name: A, dtype: int32
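The run-id expression can be split into its three steps to show what each one contributes. The sketch assumes the inferred sample data, and the intermediate names prev, changed and run_id are only illustrative.

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 1, 1, 2, 2, 1, 1], "B": list("abcdefg")})  # assumed data

prev = df["A"].shift()      # previous row's value (NaN for the first row)
changed = df["A"].ne(prev)  # True where a new run starts (x != NaN is True)
run_id = changed.cumsum()   # running count of run starts = run number

print(pd.DataFrame({"A": df["A"], "shifted": prev, "changed": changed, "run_id": run_id}))
```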

out = df["B"].groupby(df["A"].ne(df["A"].shift()).cumsum()).apply(list).reset_index()
print(out)
   A          B
0  1  [a, b, c]
1  2     [d, e]
2  3     [f, g]
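The result above numbers the runs 1, 2, 3 and drops the original values of A. The question asked for output keyed 1, 2, 1. One option is to aggregate each run with named aggregation, taking the first A value and the list of B values. This assumes pandas 0.25 or later and the same inferred data.

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 1, 1, 2, 2, 1, 1], "B": list("abcdefg")})  # assumed data

run_id = df["A"].ne(df["A"].shift()).cumsum().rename("run")

# One row per run: the run's A value plus the list of B values in that run.
runs = df.groupby(run_id).agg(A=("A", "first"), B=("B", list)).reset_index(drop=True)
print(runs)
```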

To get a dict:

d = {k: list(v) for k,v in df['B'].groupby(df["A"].ne(df["A"].shift()).cumsum())}
print (d)
{1: ['a', 'b', 'c'], 2: ['d', 'e'], 3: ['f', 'g']}

d  = df["B"].groupby(df["A"].ne(df["A"].shift()).cumsum()).apply(list).to_dict()
print (d)
{1: ['a', 'b', 'c'], 2: ['d', 'e'], 3: ['f', 'g']}
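Both dicts above are keyed by run number. Keying by the original A value alone would collapse repeated runs. A tuple of (A value, run number) keeps both the original key and the run. The sketch below assumes the inferred sample data.

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 1, 1, 2, 2, 1, 1], "B": list("abcdefg")})  # assumed data
run_id = df["A"].ne(df["A"].shift()).cumsum().rename("run")

# Key each run by (original A value, run number) so repeated A values
# stay distinct while the original key is still visible.
d = {(a, r): list(v) for (a, r), v in df.groupby([df["A"], run_id])["B"]}
print(d)
```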

EDIT 1: to use the original values of A as keys and collect each consecutive run as a separate list:

s = df["B"].groupby([df['A'], df["A"].ne(df["A"].shift()).cumsum()]).apply(list)
d = s.groupby(level=0).apply(lambda x: x.tolist() if len(x) > 1 else x.iat[0]).to_dict()
print(d)
{1: [['a', 'b', 'c'], ['f', 'g']], 2: ['d', 'e']}
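EDIT 1 unwraps keys that have a single run. The value for 2 is therefore a flat list, while the value for 1 is a list of lists. If a uniform shape is easier to consume, the variant below always stores a list of runs. It uses the same inferred data, and building the dict in a plain loop is only one possible choice.

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 1, 1, 2, 2, 1, 1], "B": list("abcdefg")})  # assumed data
run_id = df["A"].ne(df["A"].shift()).cumsum().rename("run")

nested = {}
# sort=False keeps the runs in the order they first appear in df.
for (a, _), values in df.groupby([df["A"], run_id], sort=False)["B"]:
    nested.setdefault(int(a), []).append(values.tolist())
print(nested)
```

Every value is then a list of runs, so calling code does not need to check whether it received a flat list or a nested one.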

How to use groupby on consecutive similar values in a pandas DataFrame?
