我在pandas Dataframe中有以下数据集。
group_id sub_group_id 0 0 0 1 1 0 2 0 2 1 2 2 3 0 3 0
但是我想要那些组ID并形成一个合并的组ID
group_id sub_group_id consolidated_group_id 0 0 0 0 1 1 1 0 2 2 0 3 2 1 4 2 2 5 2 2 5 3 0 6 3 0 6
是否有任何通用或数学方法可以做到这一点?
答案 0 :(得分:1)
您需要将值转换为tuples
,然后使用factorize
:
df['consolidated_group_id'] = pd.factorize(df.apply(tuple,axis=1))[0]
print (df)
group_id sub_group_id consolidated_group_id
0 0 0 0
1 0 1 1
2 1 0 2
3 2 0 3
4 2 1 4
5 2 2 5
6 3 0 6
7 3 0 6
Numpy解决方案有点修改this answer - 按[::-1]
更改排序,并按[0]
选择返回数组(numpy.unique
):
a = df.values
def unique_return_inverse_2D(a): # a is array
a1D = a.dot(np.append((a.max(0)+1)[:0:-1].cumprod()[::-1],1))
return np.unique(a1D, return_inverse=1)[::-1][0]
def unique_return_inverse_2D_viewbased(a): # a is array
a = np.ascontiguousarray(a)
void_dt = np.dtype((np.void, a.dtype.itemsize * np.prod(a.shape[1:])))
return np.unique(a.view(void_dt).ravel(), return_inverse=1)[::-1][0]
df['consolidated_group_id'] = unique_return_inverse_2D(a)
df['consolidated_group_id1'] = unique_return_inverse_2D_viewbased(a)
print (df)
group_id sub_group_id consolidated_group_id consolidated_group_id1
0 0 0 0 0
1 0 1 1 1
2 1 0 2 2
3 2 0 3 3
4 2 1 4 4
5 2 2 5 5
6 3 0 6 6
7 3 0 6 6
答案 1 :(得分:1)
cols = ['group_id', 'sub_group_id']
df.assign(
consolidated_group_id=pd.factorize(
pd.Series(list(zip(*df[cols].values.T.tolist())))
)[0]
)
group_id sub_group_id consolidated_group_id
0 0 0 0
1 0 1 1
2 1 0 2
3 2 0 3
4 2 1 4
5 2 2 5
6 3 0 6
7 3 0 6