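How do I replace a purely numeric column using a dict with numeric keys? [python]

Time: 2017-04-22 14:49:34

Tags: python pandas dictionary

I have the data frame and the dictionary below. How can I replace the column's values using the dictionary?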

data
index     occupation_code
0          10
1          16
2          12
3           7
4           1
5           3
6          10
7           7
8           1
9           3
10          4
……

dict1 = {0: 'other',1: 'academic/educator',2: 'artist',3: 'clerical/admin',4: 'college/grad student',5: 'customer service',6: 'doctor/health care',7: 'executive/managerial',8: 'farmer',9: 'homemaker',10: 'K-12student',11: 'lawyer',12: 'programmer',13: 'retired',14: 'sales/marketing',15: 'scientist',16: 'self-employed',17: 'technician/engineer',18: 'tradesman/craftsman',19: 'unemployed',20: 'writer'}

I used a "for" loop to do the replacement, but it is very slow. It looks like this:

for i in data.index:
  data.loc[i,'occupation_detailed'] = dict1[data.loc[i,'occupation_code']]

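My data contains 1 million rows, and running the loop for just 1,000 rows takes several seconds. At that rate, 1 million rows could take half a day!

So is there a better way?

Thank you very much for your suggestions!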

2 answers:

Answer 0 (score: 7)

Use map. Any value that is missing from the dictionary becomes NaN:

print (df)
       occupation_code
index                 
0                   10
1                   16
2                   12
3                    7
4                    1
5                    3
6                   10
7                    7
8                    1
9                    3
10                   4
11                 100 <- add missing value 100
df['occupation_code'] = df['occupation_code'].map(dict1)
print (df)
            occupation_code
index                      
0               K-12student
1             self-employed
2                programmer
3      executive/managerial
4         academic/educator
5            clerical/admin
6               K-12student
7      executive/managerial
8         academic/educator
9            clerical/admin
10     college/grad student
11                      NaN
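
As a minimal, self-contained sketch of the map approach: the four-row Series, the shortened dictionary, and the "unknown" fallback label here are illustrative assumptions, not part of the question's data. It also shows how the NaN results can be used to find codes that have no dictionary entry:

```python
import pandas as pd

# Small illustrative sample; 100 is deliberately absent from the dictionary.
codes = pd.Series([10, 16, 7, 100], name="occupation_code")
dict1 = {7: "executive/managerial", 10: "K-12student", 16: "self-employed"}

# Vectorized lookup: each code is looked up in dict1, missing keys give NaN.
mapped = codes.map(dict1)

# Codes that had no entry in the dictionary.
unmapped = codes[mapped.isna()].unique()

# Optionally replace the NaN results with a fallback label (an assumption here).
labelled = mapped.fillna("unknown")
```

Checking `unmapped` before overwriting the column can help catch unexpected codes in a large data set.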

Another solution is to use replace. Any value that is missing from the dictionary keeps its original value:

df['occupation_code'] = df['occupation_code'].replace(dict1)
print (df)
            occupation_code
index                      
0               K-12student
1             self-employed
2                programmer
3      executive/managerial
4         academic/educator
5            clerical/admin
6               K-12student
7      executive/managerial
8         academic/educator
9            clerical/admin
10     college/grad student
11                      100
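
A minimal sketch of the same behaviour with replace, again on an assumed four-row sample with a shortened dictionary. Note that the result mixes strings with the untouched integer, so the column may end up with object dtype:

```python
import pandas as pd

# Small illustrative sample; 100 is deliberately absent from the dictionary.
codes = pd.Series([10, 16, 7, 100], name="occupation_code")
dict1 = {7: "executive/managerial", 10: "K-12student", 16: "self-employed"}

# replace swaps only the values found as keys in dict1;
# anything else (here 100) is left as it was.
replaced = codes.replace(dict1)
```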

Answer 1 (score: 1)

Assuming @jezrael's sample data df

print(df)

       occupation_code
index                 
0                   10
1                   16
2                   12
3                    7
4                    1
5                    3
6                   10
7                    7
8                    1
9                    3
10                   4
11                 100

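I suggest using map with a lambda that wraps the dictionary's get method. This lets you supply a default value for anything that is not in the dictionary. In this case, I return the original value.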

df.occupation_code.map(lambda x: dict1.get(x, x))

index
0              K-12student
1            self-employed
2               programmer
3     executive/managerial
4        academic/educator
5           clerical/admin
6              K-12student
7     executive/managerial
8        academic/educator
9           clerical/admin
10    college/grad student
11                     100
Name: occupation_code, dtype: object
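
As a sketch of how the default can be varied, the example below writes the result into a new column, as the question's loop did. The four-row frame, the shortened dictionary, the `occupation_label` column name, and the "unknown" label are assumptions made for illustration:

```python
import pandas as pd

# Small illustrative sample; 100 is deliberately absent from the dictionary.
df = pd.DataFrame({"occupation_code": [10, 16, 7, 100]})
dict1 = {7: "executive/managerial", 10: "K-12student", 16: "self-employed"}

# Default to the original value when the code is not in the dictionary.
df["occupation_detailed"] = df["occupation_code"].map(lambda x: dict1.get(x, x))

# Default to a fixed label instead (hypothetical column name and label).
df["occupation_label"] = df["occupation_code"].map(lambda x: dict1.get(x, "unknown"))
```

The lambda is called once per row, so on very large frames passing the dictionary directly to map is likely to be faster when a NaN default is acceptable.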