Question

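I have the following pandas DataFrame:

I want to store the values in another DataFrame so that each run of consecutive identical values forms one labeled group, like this:

Column A holds the value of the group, and column B holds the number of times it occurs in that run.

This is what I have done so far: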

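import pandas as pd

df = pd.DataFrame({'a':[0,0,1,2,2,2,3,2,2,1]})
df2 = pd.DataFrame()
# Label each run of consecutive equal values, then count the rows in each run.
# Note: DataFrame.append was removed in pandas 2.0, so this loop needs an older pandas.
for i,g in df.groupby([(df.a != df.a.shift()).cumsum()]):
    vc = g.a.value_counts()
    df2 = df2.append({'A':vc.index[0], 'B': vc.iloc[0]}, ignore_index=True).astype(int)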

It works, but it is a bit messy.

Is there a shorter or better way to do this?

Answer 1

I would try:

df['blocks'] = df['a'].ne(df['a'].shift()).cumsum()
(df.groupby(['a','blocks'],
           as_index=False,
           sort=False)
   .count()
   .drop('blocks', axis=1)
)

Output:

Answer 2

In pandas >= 0.25.0, use GroupBy.agg with named aggregation:

new_df = (
    df.groupby(df['a'].ne(df['a'].shift()).cumsum(), as_index=False)
      .agg(A=('a', 'first'), B=('a', 'count'))
)
print(new_df)

   A  B
0  0  2
1  1  1
2  2  3
3  3  1
4  2  2
5  1  1

For pandas < 0.25.0:

new_df = (
    df.groupby(df['a'].ne(df['a'].shift()).cumsum(), as_index=False)
      .a
      .agg({'A': 'first', 'B': 'count'})
)
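
If it is not obvious why `ne(shift()).cumsum()` identifies the runs, the following minimal sketch breaks the one-liner into steps. It assumes the same `df` as in the question; the names `starts`, `run_id`, and `runs` are only illustrative, and the named-aggregation call assumes pandas 0.25 or later.

```python
import pandas as pd

df = pd.DataFrame({'a': [0, 0, 1, 2, 2, 2, 3, 2, 2, 1]})

# True wherever a value differs from the one directly above it.
# The first row is compared against NaN, so it always starts a run.
starts = df['a'].ne(df['a'].shift())

# The running sum of those flags gives every run its own integer label.
run_id = starts.cumsum().rename('run')
print(run_id.tolist())  # [1, 1, 2, 3, 3, 3, 4, 5, 5, 6]

# Group on the labels: the first value names the run,
# and the row count is the length of the run.
runs = (
    df.groupby(run_id)
      .agg(A=('a', 'first'), B=('a', 'count'))
      .reset_index(drop=True)
)
print(runs)
```

Grouping on the run label instead of on the value itself is what keeps the two separate runs of 2 (and of 1) apart rather than merging them into one group.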

Grouping identical consecutive values in a pandas DataFrame
