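Suppose I have a DataFrame with three columns of categorical data, and I want to convert those three categorical columns into a single value and map it back onto the original DataFrame. I know this is possible for a single column using this, but what about multiple columns?
Example: from this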
>>> import pandas as pd
>>> df = pd.DataFrame({'id': ['0', '1', '2', '3', '4'],
...                    'x': ['tall', 'short', 'tall', 'short', 'tall'],
...                    'y': ['fat', 'thin', 'thin', 'fat', 'fat'],
...                    'z': ['male', 'female', 'female', 'male', 'male']},
...                   dtype='category')
>>> df
   id      x     y       z
0   0   tall   fat    male
1   1  short  thin  female
2   2   tall  thin  female
3   3  short   fat    male
4   4   tall   fat    male
to this, by mapping x, y and z:
>>> df
   id      x     y       z  map
0   0   tall   fat    male    0
1   1  short  thin  female    1
2   2   tall  thin  female    2
3   3  short   fat    male    3
4   4   tall   fat    male    0
Answer 0 (score: 2)
This is groupby().ngroup():
df['map'] = df.groupby(['x','y','z'], sort=False).ngroup()
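As a self-contained sketch of the line above: the frame below mirrors the question's data but, as an assumption for simplicity, uses plain string columns rather than dtype='category'. With sort=False, groups are numbered in the order they first appear, so rows that share the same (x, y, z) combination get the same number.

```python
import pandas as pd

# Same values as in the question, but as plain strings (an assumption
# made for this sketch; the question uses dtype='category').
df = pd.DataFrame({
    'id': ['0', '1', '2', '3', '4'],
    'x': ['tall', 'short', 'tall', 'short', 'tall'],
    'y': ['fat', 'thin', 'thin', 'fat', 'fat'],
    'z': ['male', 'female', 'female', 'male', 'male'],
})

# sort=False numbers each (x, y, z) combination by first appearance
df['map'] = df.groupby(['x', 'y', 'z'], sort=False).ngroup()

print(df['map'].tolist())  # [0, 1, 2, 3, 0]
```

Rows 0 and 4 share the combination (tall, fat, male), which is why both end up with 0.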
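Alternatively, if your data is of string type, you can concatenate the columns, possibly with a special separator character, and then use the single-column approach: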
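# add('&') appends a separator to every value; it may not be needed, but it
# keeps rows such as ('ab', 'c') and ('a', 'bc') from concatenating to the same string
df['map'] = pd.factorize(df[['x','y','z']].add('&').sum(axis=1))[0]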
Output:
   id      x     y       z  map
0   0   tall   fat    male    0
1   1  short  thin  female    1
2   2   tall  thin  female    2
3   3  short   fat    male    3
4   4   tall   fat    male    0
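For reference, here is a minimal runnable sketch of the string-concatenation variant. It assumes the three columns hold plain strings (not dtype='category'), and it joins each row with '&' through agg instead of add/sum; that swap is only an illustrative choice, and the names key, codes and uniques are made up for this example. pd.factorize returns the integer codes together with the unique keys, both in order of first appearance.

```python
import pandas as pd

# Plain string columns (assumed; string concatenation needs str values)
df = pd.DataFrame({
    'id': ['0', '1', '2', '3', '4'],
    'x': ['tall', 'short', 'tall', 'short', 'tall'],
    'y': ['fat', 'thin', 'thin', 'fat', 'fat'],
    'z': ['male', 'female', 'female', 'male', 'male'],
})

# Build one key per row, e.g. 'tall&fat&male'; the separator keeps
# different combinations from collapsing into the same string.
key = df[['x', 'y', 'z']].agg('&'.join, axis=1)

# factorize returns (codes, uniques); the codes are the mapping we want
codes, uniques = pd.factorize(key)
df['map'] = codes

print(df['map'].tolist())  # [0, 1, 2, 3, 0]
```

The uniques array can be kept around if you later need to translate a code back into its x&y&z combination.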