Mapping categorical values across multiple columns in pandas

Time: 2020-02-20 04:43:15

Tags: python pandas

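Suppose I have a DataFrame with three columns of categorical data, and I want to convert each combination of values in those three columns into a single value and map it back onto the original DataFrame. I know this can be done for a single column, but what about multiple columns?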

Example: from this

>>> import pandas as pd
>>> df = pd.DataFrame({'id': ['0', '1', '2', '3', '4'],
...                    'x': ['tall', 'short', 'tall', 'short', 'tall'],
...                    'y': ['fat', 'thin', 'thin', 'fat', 'fat'],
...                    'z': ['male', 'female', 'female', 'male', 'male']},
...                   dtype='category')

>>> df
  id      x     y       z
0  0   tall   fat    male
1  1  short  thin  female
2  2   tall  thin  female
3  3  short   fat    male
4  4   tall   fat    male

to this, by mapping x, y, and z:

>>> df
  id      x     y       z  map
0  0   tall   fat    male    0
1  1  short  thin  female    1
2  2   tall  thin  female    2
3  3  short   fat    male    3
4  4   tall   fat    male    0

1 answer:

Answer 0 (score: 2)

This is what groupby().ngroup() is for:

df['map'] = df.groupby(['x','y','z'], sort=False).ngroup()
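As a self-contained illustration of the line above, here is a minimal sketch that rebuilds the sample data and numbers each (x, y, z) combination. It assumes plain string columns rather than dtype='category', to sidestep the question of how unobserved category combinations are handled, and the name map_ids is only illustrative.

```python
import pandas as pd

# Sample data from the question, kept as plain strings (assumption:
# the columns do not need to be categorical for the grouping to work).
df = pd.DataFrame({
    'id': ['0', '1', '2', '3', '4'],
    'x': ['tall', 'short', 'tall', 'short', 'tall'],
    'y': ['fat', 'thin', 'thin', 'fat', 'fat'],
    'z': ['male', 'female', 'female', 'male', 'male'],
})

# sort=False numbers the groups in order of first appearance,
# so identical (x, y, z) rows share the same integer id.
df['map'] = df.groupby(['x', 'y', 'z'], sort=False).ngroup()

map_ids = df['map'].tolist()  # hypothetical helper name, for inspection
print(df)
```

Rows 0 and 4 hold the same combination, so they should receive the same id. Without sort=False the ids would follow the sorted order of the group keys instead of the order in which they appear.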

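Alternatively, if your data are of string type (categorical columns like the ones above would need astype(str) first), you can concatenate the columns, possibly with a special separator character, and then use the single-column approach: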

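# add('&') appends a separator to every value so that different combinations
# cannot concatenate to the same string; it may not be needed for this data
df['map'] = pd.factorize(df[['x','y','z']].add('&').sum(axis=1))[0]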

Output:

   id      x     y       z  map
0   0   tall   fat    male    0
1   1  short  thin  female    1
2   2   tall  thin  female    2
3   3  short   fat    male    3
4   4   tall   fat    male    0
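
The concatenation approach can also be sketched end to end for the categorical sample data. This is a variant under stated assumptions rather than the answerer's exact code: it casts the columns to str first and joins them with '&'.join instead of add('&').sum(...), and the names cols, keys, codes, uniques and map_ids are illustrative.

```python
import pandas as pd

# Sample data from the question, with categorical columns.
df = pd.DataFrame({
    'id': ['0', '1', '2', '3', '4'],
    'x': ['tall', 'short', 'tall', 'short', 'tall'],
    'y': ['fat', 'thin', 'thin', 'fat', 'fat'],
    'z': ['male', 'female', 'female', 'male', 'male'],
}, dtype='category')

cols = ['x', 'y', 'z']

# Cast to str so the values can be joined, then build one key per row.
# The '&' separator keeps e.g. ('ab', 'c') distinct from ('a', 'bc'),
# assuming '&' never occurs inside the values themselves.
keys = df[cols].astype(str).agg('&'.join, axis=1)

# factorize assigns integer codes in order of first appearance.
codes, uniques = pd.factorize(keys)
df['map'] = codes

map_ids = df['map'].tolist()
print(df)
```

The uniques returned by factorize hold the joined key for each code, which is handy if the integer ids ever need to be translated back into their (x, y, z) combination.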