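Is there a more concise way (in terms of minimal code) to convert a column to enumerated numeric values, as in the Before/After output further down?
This is what I do today. I am wondering whether there is a cleaner way to do it, so that I can avoid writing the function get_color_val: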
import pandas as pd

cars = pd.DataFrame({
    "car_name": ["BMW", "BMW", "ACCURA", "ACCURA", "ACCURA", "BMW", "BMW", "BMW"],
    "color": ["RED", "RED", "RED", "RED", "GREEN", "BLACK", "BLUE", "BLUE"],
})
# Map each distinct color to an integer (set ordering is arbitrary)
color_dict = dict(enumerate(set(cars["color"])))
color_dict = dict((y, x) for x, y in color_dict.items())

def get_color_val(row):
    # Look up the integer assigned to this row's color
    my_key = row["color"]
    my_value = color_dict.get(my_key)
    return my_value

cars["color_val"] = cars.apply(get_color_val, axis=1)
cars = cars.drop(columns="color")
print(cars)
Result
Before------------
  car_name  color
0      BMW    RED
1      BMW    RED
2   ACCURA    RED
3   ACCURA    RED
4   ACCURA  GREEN
5      BMW  BLACK
6      BMW   BLUE
7      BMW   BLUE
After------------
  car_name  color_val
0      BMW          3
1      BMW          3
2   ACCURA          3
3   ACCURA          3
4   ACCURA          2
5      BMW          1
6      BMW          0
7      BMW          0
Answer 0 (score: 3)
In this case I would use pd.factorize():
In [8]: cars['color_val'] = pd.factorize(cars.color)[0]
In [9]: cars
Out[9]:
  car_name  color  color_val
0      BMW    RED          0
1      BMW    RED          0
2   ACCURA    RED          0
3   ACCURA    RED          0
4   ACCURA  GREEN          1
5      BMW  BLACK          2
6      BMW   BLUE          3
7      BMW   BLUE          3
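To spell out what the [0] above is selecting: pd.factorize() returns a pair (codes, uniques), and the codes are numbered in order of first appearance, which is why RED gets 0 here. Below is a minimal self-contained sketch of that. The names codes, uniques, restored and cat_codes are just illustrative. The last line uses a categorical dtype instead, which is an alternative not mentioned in the answer and which numbers the labels in sorted order.

```python
import pandas as pd

cars = pd.DataFrame({
    "car_name": ["BMW", "BMW", "ACCURA", "ACCURA", "ACCURA", "BMW", "BMW", "BMW"],
    "color": ["RED", "RED", "RED", "RED", "GREEN", "BLACK", "BLUE", "BLUE"],
})

# factorize returns (codes, uniques); codes follow order of first appearance
codes, uniques = pd.factorize(cars["color"])
cars["color_val"] = codes

# Indexing uniques with the codes recovers the original labels
restored = uniques[codes]

# Alternative (not from the answer): categorical codes, numbered in sorted label order
cat_codes = cars["color"].astype("category").cat.codes

print(cars)
print(list(uniques))
```

Keeping uniques around is useful if you later need to translate the integers back into color names, which the hand-written dictionary in the question would otherwise have to provide.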