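Convert a column of credit ratings (e.g. AAA, BB, CC) into numeric categories such as AAA = 1, BB = .75 in Python?

Asked: 2018-09-12 13:10:49

Tags: python categorical-data

I have a column named "CREDIT RATING" in a DataFrame, with one row per company for many companies. I need to assign a numeric category to each rating from AAA to DDD, running from 1 (AAA) down to 0 (DDD). Is there a quick and simple way to do this, essentially creating a new column that goes from 1 to 0 in steps of .1? Thanks!

3 answers:

Answer 0: (score: 0)

You can use replace: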

df['CREDIT RATING NUMERIC'] = df['CREDIT RATING'].replace({'AAA':1, ... , 'DDD':0})

Answer 1: (score: 0)

The simplest way is to create a dictionary mapping:

mymap = {"AAA":1.0, "AA":0.9, ... "DDD":0.0} 

and then apply it to the DataFrame:

df["CREDIT MAPPING"] = df["CREDIT RATING"].replace(mymap)
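One thing to keep in mind with replace is that any rating missing from the dictionary is left unchanged, so the new column can end up mixing numbers and strings. A minimal sketch of an alternative using Series.map, which turns unmapped ratings into NaN so they are easy to spot. The data and the three-grade scale below (including the name example_map) are made up for illustration and are not the asker's real scale:

```python
import pandas as pd

# Hypothetical data and scale, for illustration only.
df = pd.DataFrame({"CREDIT RATING": ["AAA", "BBB", "DDD", "NR"]})
example_map = {"AAA": 1.0, "BBB": 0.5, "DDD": 0.0}

# map() looks each value up in the dict; values without a key become NaN.
df["CREDIT MAPPING"] = df["CREDIT RATING"].map(example_map)

# Ratings that still need an entry in the mapping.
unmapped = df.loc[df["CREDIT MAPPING"].isna(), "CREDIT RATING"].unique().tolist()
print(unmapped)
```

Whether NaN or the original label is the better fallback depends on what you do with the column afterwards.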

Answer 2: (score: 0)

Well, this is a bit roundabout, but here we go:

import numpy as np
import pandas as pd

# First, take a list of ratings from Wikipedia and put it into a DataFrame to replicate your scenario
ratings = ['AAA' ,'AA1' ,'AA2' ,'AA3' ,'A1' ,'A2' ,'A3' ,'BAA1' ,'BAA2' ,'BAA3' ,'BA1' ,'BA2' ,'BA3' ,'B1' ,'B2' ,'B3' ,'CAA' ,'CA' ,'C' ,'C' ,'E' ,'WR' ,'UNSO' ,'SD' ,'NR']
df_credit_ratings = pd.DataFrame({'Ratings_id':ratings})

df_credit_ratings = pd.concat([df_credit_ratings,df_credit_ratings]) # just to replicate duplicate records

# set() keeps only the unique values
unique_ratings = set(df_credit_ratings['Ratings_id'])
number_of_ratings = len(unique_ratings) # counting how many unique ratings there are

# One weight per unique rating, in steps of 0.1 starting at 0.0.
# Building the steps from integers avoids the unexpected length that np.arange can return with a float step.
# Note: with more than 11 unique ratings the weights run past 1.0.
dec = list(np.arange(number_of_ratings) / 10)

After this, you need to combine the unique ratings with their weights:

df_ratings_unique = pd.DataFrame({'Ratings_id':list(unique_ratings)}) # list so it gets one value per row

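Edit: As Thomas suggested in a comment on another answer, this sort may not suit you, because alphabetical order is not the real order of importance of the ratings. So you may need to create a DataFrame that is already in the right order and skip the sorting.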

df_ratings_unique.sort_values(by='Ratings_id', ascending=True, inplace=True) # sorting so it matches the order of our weights above

Resuming the solution:

df_ratings_unique['Weigth'] = dec # adding the weights to the DF

df_ratings_unique.set_index('Ratings_id', inplace=True) # setting the ratings as index to map the values below

# now this is the magic: we create a new column in the original DataFrame and map each `Ratings_id` to its weight through our unique DataFrame
df_credit_ratings['Weigth'] = df_credit_ratings['Ratings_id'].map(df_ratings_unique.Weigth)
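Following the edit note about ordering, here is a rough sketch of how the alphabetical sort could be avoided: write the ratings down from best to worst yourself and let np.linspace spread the scores evenly from 1 to 0. The five-grade list, the sample rows and the names ordered_ratings, rating_to_score and Score are assumptions for illustration, not the real rating scale:

```python
import numpy as np
import pandas as pd

# Hypothetical ordering from best to worst; replace it with the scale you actually use.
ordered_ratings = ["AAA", "AA", "A", "BBB", "BB"]

# Evenly spaced scores from 1.0 (best) down to 0.0 (worst).
scores = np.linspace(1.0, 0.0, num=len(ordered_ratings))
rating_to_score = dict(zip(ordered_ratings, scores))

# Hypothetical sample rows.
df = pd.DataFrame({"Ratings_id": ["AAA", "BB", "A", "AAA"]})
df["Score"] = df["Ratings_id"].map(rating_to_score)
print(df)
```

The step between neighbouring grades is 1 / (number of grades - 1), so steps of 0.1 come out when the ordered list has 11 entries.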