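How can I assign a unique number to each unique combination of column values, so that each new combination gets the previous number plus one?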
Sample Input
import pandas as pd
import numpy as np
df=pd.DataFrame({'A':['A','A','A','B','B','B','B'],
'B':['a','a','b','a','a','a','a'],
})
df
A B
0 A a
1 A a
2 A b
3 B a
4 B a
5 B a
6 B a
Desired output
A new column "C" that numbers the groups formed by the values in columns "A" and "B", as follows:
A B C
0 A a 1
1 A a 1
2 A b 2
3 B a 3
4 B a 3
5 B a 3
6 B a 3
Answer 0 (score: 0)
Use ngroup, which numbers the groups starting at 0 (hence the +1):
df['C'] = df.groupby(['A','B']).ngroup()+1
Output:
A B C
0 A a 1
1 A a 1
2 A b 2
3 B a 3
4 B a 3
5 B a 3
6 B a 3
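One detail worth knowing about the ngroup approach: with the default `sort=True`, groups are numbered in sorted key order, not in the order they first appear. The sample data happens to be sorted already, so the two coincide there. The sketch below uses a small hypothetical frame (`demo`, with column names `sorted_id` and `first_seen_id` invented for illustration) where the rows are not in sorted order, and shows that passing `sort=False` to `groupby` numbers the groups by first appearance instead.

```python
import pandas as pd

# Hypothetical data: the (B, a) rows come before the (A, ...) rows.
demo = pd.DataFrame({'A': ['B', 'B', 'A', 'A'],
                     'B': ['a', 'a', 'a', 'b']})

# Default sort=True: groups are numbered in sorted key order.
demo['sorted_id'] = demo.groupby(['A', 'B']).ngroup() + 1

# sort=False: groups are numbered in order of first appearance.
demo['first_seen_id'] = demo.groupby(['A', 'B'], sort=False).ngroup() + 1

print(demo)
```

If the numbering has to follow the row order of the frame, `sort=False` is probably the variant to reach for.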
Answer 1 (score: 0)
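Mark the first occurrence of each combination with drop_duplicates, then take a cumulative sum. Note: this also works when a group repeats over two or more consecutive rows: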
df.loc[df.drop_duplicates().index, 'C'] = 1
df['C'] = df['C'].fillna(0).cumsum().astype(int)
print(df)
Output:
A B C
0 A a 1
1 A a 1
2 A b 2
3 B a 3
4 B a 3
5 B a 3
6 B a 3
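The same first-occurrence-plus-cumulative-sum idea can be sketched in a single expression with `DataFrame.duplicated`. This is an equivalent formulation rather than the answer's own code; the variable name `is_first` is made up here, and the `subset` argument assumes that only columns A and B define a combination (useful if the frame already carries other columns, such as an earlier "C").

```python
import pandas as pd

df = pd.DataFrame({'A': ['A', 'A', 'A', 'B', 'B', 'B', 'B'],
                   'B': ['a', 'a', 'b', 'a', 'a', 'a', 'a']})

# True on the first row of each (A, B) combination, False on repeats.
is_first = ~df.duplicated(subset=['A', 'B'])

# Running count of the first occurrences seen so far.
df['C'] = is_first.cumsum()

print(df)
```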
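Note 2: the result differs from ngroup when a combination reappears after a different one. In the data below, the last row (B, a) is not a first occurrence, so it carries the running total 4 instead of the 3 given to the earlier (B, a) rows:
df = pd.DataFrame({'A': ['A','A','A','B','B','B','B','C','C','B'],
                   'B': ['a','a','b','a','a','a','a','b','b','a']})
df.loc[df.drop_duplicates().index, 'C'] = 1
df['C'] = df['C'].fillna(0).cumsum().astype(int)
df
Output: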
A B C
0 A a 1
1 A a 1
2 A b 2
3 B a 3
4 B a 3
5 B a 3
6 B a 3
7 C b 4
8 C b 4
9 B a 4
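Update:
With ngroup, every combination keeps a single number, but 'C' is then not monotonically increasing on this data: the last row has 3 while the row before it has 4:
df = pd.DataFrame({'A': ['A','A','A','B','B','B','B','C','C','B'],
                   'B': ['a','a','b','a','a','a','a','b','b','a']})
df['C'] = df.groupby(['A','B']).ngroup()+1
df
Output: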
A B C
0 A a 1
1 A a 1
2 A b 2
3 B a 3
4 B a 3
5 B a 3
6 B a 3
7 C b 4
8 C b 4
9 B a 3
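To see which numbering is truly unique per combination, one option is to count how many distinct ids each (A, B) pair receives. The sketch below is only an illustration: the column names `ngroup_id` and `cumsum_id` and the variable `per_combo` are invented, and the cumulative-sum column is built with `duplicated`, an equivalent of the drop_duplicates approach rather than the answer's exact code.

```python
import pandas as pd

df = pd.DataFrame({'A': ['A', 'A', 'A', 'B', 'B', 'B', 'B', 'C', 'C', 'B'],
                   'B': ['a', 'a', 'b', 'a', 'a', 'a', 'a', 'b', 'b', 'a']})

# One id per combination, numbered in sorted key order.
df['ngroup_id'] = df.groupby(['A', 'B']).ngroup() + 1

# Running count of first occurrences (never decreases down the frame).
df['cumsum_id'] = (~df.duplicated(subset=['A', 'B'])).cumsum()

# Number of distinct ids that each (A, B) combination received.
per_combo = df.groupby(['A', 'B'])[['ngroup_id', 'cumsum_id']].nunique()

print(per_combo)
```

A count above 1 in `per_combo` means that combination was given more than one number, which here happens only for (B, a) under the cumulative-sum method.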