Convert a column's data into enumerated dictionary key values

Time: 2016-09-27 17:02:17

Tags: python pandas dictionary dataframe enumeration

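Is there a better way (in the sense of minimal code) to do the following: convert a column into enumerated numeric values? The steps would be:

  1. Take the set of items in the column.
  2. Use them as the values of an enumerated dictionary.
  3. Swap the keys and values, so that each item maps to its number.
  4. Use the looked-up number in a new column in place of the original data.

This is what I do today. I am wondering whether anyone knows a standard way to do this, so that I can avoid writing the function get_color_val.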

    import pandas as pd

    cars = pd.DataFrame({
        "car_name": ["BMW", "BMW", "ACCURA", "ACCURA", "ACCURA", "BMW", "BMW", "BMW"],
        "color": ["RED", "RED", "RED", "RED", "GREEN", "BLACK", "BLUE", "BLUE"],
    })

    # Number the distinct colors, then invert the mapping to {color: number}.
    color_dict = dict(enumerate(set(cars["color"])))
    color_dict = dict((y, x) for x, y in color_dict.items())

    def get_color_val(row):
        # Look up the number assigned to this row's color.
        my_key = row["color"]
        my_value = color_dict.get(my_key)
        return my_value

    cars["color_val"] = cars.apply(get_color_val, axis=1)
    cars = cars.drop("color", axis=1)
    print(cars)
    
      

Result:

    Before------------
      car_name  color
    0      BMW    RED
    1      BMW    RED
    2   ACCURA    RED
    3   ACCURA    RED
    4   ACCURA  GREEN
    5      BMW  BLACK
    6      BMW   BLUE
    7      BMW   BLUE
    
    
    After------------
      car_name  color_val
    0      BMW          3
    1      BMW          3
    2   ACCURA          3
    3   ACCURA          3
    4   ACCURA          2
    5      BMW          1
    6      BMW          0
    7      BMW          0
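
The steps listed in the question can also be sketched without the helper function, assuming `Series.map` is an acceptable replacement for the row-wise `apply`. Sorting the unique labels is an added assumption, made only so the numbering is reproducible; it gives different numbers from the output above, where the order came from an unordered `set`.

```python
import pandas as pd

cars = pd.DataFrame({
    "car_name": ["BMW", "BMW", "ACCURA", "ACCURA", "ACCURA", "BMW", "BMW", "BMW"],
    "color": ["RED", "RED", "RED", "RED", "GREEN", "BLACK", "BLUE", "BLUE"],
})

# Build {color: number} directly; sorted() is an assumption that fixes the order
# (a plain set has no guaranteed iteration order).
color_dict = {color: i for i, color in enumerate(sorted(cars["color"].unique()))}

# Series.map looks each label up in the dictionary, replacing get_color_val + apply.
cars["color_val"] = cars["color"].map(color_dict)
cars = cars.drop(columns="color")
print(cars)
```

With this ordering BLACK, BLUE, GREEN and RED map to 0, 1, 2 and 3 respectively.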
    

1 answer:

Answer 0 (score: 3)

In this case I would use pd.factorize():

    In [8]: cars['color_val'] = pd.factorize(cars.color)[0]

    In [9]: cars
    Out[9]:
      car_name  color  color_val
    0      BMW    RED          0
    1      BMW    RED          0
    2   ACCURA    RED          0
    3   ACCURA    RED          0
    4   ACCURA  GREEN          1
    5      BMW  BLACK          2
    6      BMW   BLUE          3
    7      BMW   BLUE          3
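
As a minimal sketch of what `pd.factorize` returns beyond the `[0]` element used above: it gives back both the integer codes and the unique labels, so the dictionary from the question can be rebuilt from the second element if it is still needed. The variable names `codes`, `uniques`, `color_dict`, `decoded` and `sorted_codes` are illustrative, not part of the original answer.

```python
import pandas as pd

cars = pd.DataFrame({
    "car_name": ["BMW", "BMW", "ACCURA", "ACCURA", "ACCURA", "BMW", "BMW", "BMW"],
    "color": ["RED", "RED", "RED", "RED", "GREEN", "BLACK", "BLUE", "BLUE"],
})

# factorize returns (codes, uniques); codes follow order of first appearance.
codes, uniques = pd.factorize(cars["color"])
cars["color_val"] = codes

# uniques[i] is the label for code i, so the {color: number} mapping is recoverable.
color_dict = {label: code for code, label in enumerate(uniques)}

# Indexing uniques with the codes restores the original labels.
decoded = list(uniques[codes])

# sort=True numbers the labels in sorted order instead of first-appearance order.
sorted_codes, sorted_uniques = pd.factorize(cars["color"], sort=True)

print(color_dict)
print(list(sorted_codes))
```

The default numbering depends on row order, so `sort=True` may be preferable when the same labels need the same numbers across differently ordered frames.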