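Count new occurrences in pandas

Time: 2019-03-09 10:12:18

Tags: python-3.x pandas

How can I assign a unique number to each unique combination of column values, so that the number increases by one for every new combination?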

Sample Input

import pandas as pd
import numpy as np
df=pd.DataFrame({'A':['A','A','A','B','B','B','B'],
                'B':['a','a','b','a','a','a','a'],
                })

df

    A   B
0   A   a 
1   A   a 
2   A   b 
3   B   a 
4   B   a 
5   B   a
6   B   a

Desired output

A new column "C" that numbers the groups formed by the values in columns "A" and "B", as follows:

    A   B  C
0   A   a  1
1   A   a  1
2   A   b  2
3   B   a  3
4   B   a  3
5   B   a  3
6   B   a  3

2 answers:

Answer 0 (score: 0)

Use ngroup

df['C'] = df.groupby(['A','B']).ngroup()+1

Output

   A  B  C
0  A  a  1
1  A  a  1
2  A  b  2
3  B  a  3
4  B  a  3
5  B  a  3
6  B  a  3
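To make the numbering easier to inspect, here is a minimal self-contained sketch on the question's sample data. It assumes the default groupby behaviour, where ngroup() follows the order in which the groups are iterated; `numbers` and `group_numbers` are hypothetical names used only for this illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'A': ['A', 'A', 'A', 'B', 'B', 'B', 'B'],
    'B': ['a', 'a', 'b', 'a', 'a', 'a', 'a'],
})

grouped = df.groupby(['A', 'B'])

# ngroup() returns a 0-based group number for every row; add 1 to start at 1.
numbers = grouped.ngroup() + 1

# Map each (A, B) key to its number, following groupby iteration order.
group_numbers = {key: i for i, (key, _) in enumerate(grouped, start=1)}

print(numbers.tolist())
print(group_numbers)
```

The mapping shows which combination received which number, which helps when checking the result against the desired output.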

Answer 1 (score: 0)

df.loc[df.drop_duplicates().index, 'C'] = 1
df['C'] = df['C'].fillna(0).cumsum().astype(int)
print(df)

Output:

   A  B  C
0  A  a  1
1  A  a  1
2  A  b  2
3  B  a  3
4  B  a  3
5  B  a  3
6  B  a  3

Note: this also works for two or more consecutive duplicate groups:

df = pd.DataFrame({'A': ['A','A','A','B','B','B','B','C','C','B'],
                   'B': ['a','a','b','a','a','a','a','b','b','a']})
df.loc[df.drop_duplicates().index, 'C'] = 1
df['C'] = df['C'].fillna(0).cumsum().astype(int)
df

Output:

   A  B  C
0  A  a  1
1  A  a  1
2  A  b  2
3  B  a  3
4  B  a  3
5  B  a  3
6  B  a  3
7  C  b  4
8  C  b  4
9  B  a  4

Note 2: the above is also a case where the ngroup method does not produce increasing 'C' values; compare the value in the last row (3) with the one in the row before it (4):

df = pd.DataFrame({'A': ['A','A','A','B','B','B','B','C','C','B'],
                   'B': ['a','a','b','a','a','a','a','b','b','a']})
df['C'] = df.groupby(['A','B']).ngroup()+1
df

Output:

   A  B  C
0  A  a  1
1  A  a  1
2  A  b  2
3  B  a  3
4  B  a  3
5  B  a  3
6  B  a  3
7  C  b  4
8  C  b  4
9  B  a  3

Update

Similar to ngroup, but counting from the first occurrence.
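One way to number groups by first appearance is to pass sort=False to groupby before calling ngroup(). This is a sketch under that assumption and not necessarily the answerer's original code. The data below is hypothetical, chosen so that sorted key order and order of appearance differ.

```python
import pandas as pd

# Hypothetical data: ('B', 'a') appears first but sorts after ('A', 'b').
df = pd.DataFrame({
    'A': ['B', 'B', 'A', 'A', 'B'],
    'B': ['a', 'a', 'b', 'b', 'a'],
})

# Default (sort=True): groups are numbered in sorted key order.
by_sorted_key = (df.groupby(['A', 'B']).ngroup() + 1).tolist()

# sort=False: groups are numbered in order of first appearance.
by_first_seen = (df.groupby(['A', 'B'], sort=False).ngroup() + 1).tolist()

print(by_sorted_key)   # [2, 2, 1, 1, 2]
print(by_first_seen)   # [1, 1, 2, 2, 1]
```

In both variants a combination that reappears later keeps the number it was first given; only the order in which numbers are handed out changes.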