I have a DataFrame and a dictionary, shown below. How can I replace the values in a column using the dictionary?
data
index occupation_code
0 10
1 16
2 12
3 7
4 1
5 3
6 10
7 7
8 1
9 3
10 4
……
dict1 = {0: 'other',1: 'academic/educator',2: 'artist',3: 'clerical/admin',4: 'college/grad student',5: 'customer service',6: 'doctor/health care',7: 'executive/managerial',8: 'farmer',9: 'homemaker',10: 'K-12student',11: 'lawyer',12: 'programmer',13: 'retired',14: 'sales/marketing',15: 'scientist',16: 'self-employed',17: 'technician/engineer',18: 'tradesman/craftsman',19: 'unemployed',20: 'writer'}
I used a for loop to do the replacement, but it is very slow:
for i in data.index:
    data.loc[i,'occupation_detailed'] = dict1[data.loc[i,'occupation_code']]
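My data has 1 million rows, and running just 1,000 iterations already takes a few seconds, so 1 million rows might take half a day!
Is there a better way?
Thank you very much for any suggestions!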
Answer 0: (score: 7)
Use map; any value missing from the dictionary becomes NaN:
print (df)
occupation_code
index
0 10
1 16
2 12
3 7
4 1
5 3
6 10
7 7
8 1
9 3
10 4
11 100 <- add missing value 100
df['occupation_code'] = df['occupation_code'].map(dict1)
print (df)
occupation_code
index
0 K-12student
1 self-employed
2 programmer
3 executive/managerial
4 academic/educator
5 clerical/admin
6 K-12student
7 executive/managerial
8 academic/educator
9 clerical/admin
10 college/grad student
11 NaN
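As a minimal, self-contained sketch of the map approach: the dictionary below is an abbreviated subset of dict1 from the question, and the 'unknown' placeholder and the occupation_detailed column filled with it are assumptions for illustration, not part of the original answer.

```python
import pandas as pd

# Abbreviated, illustrative subset of dict1 from the question.
dict1 = {1: 'academic/educator', 3: 'clerical/admin', 10: 'K-12student'}

# Code 100 has no entry in the dictionary.
df = pd.DataFrame({'occupation_code': [10, 1, 3, 100]})

# map looks every value up in the dict in one call; unmatched codes become NaN.
mapped = df['occupation_code'].map(dict1)

# Optional: swap the NaN for a placeholder label (hypothetical choice).
df['occupation_detailed'] = mapped.fillna('unknown')
print(df)
```

Writing to a new column, as the loop in the question did, keeps the original codes available for checking which rows were unmatched.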
Another solution is to use replace; any value missing from the dictionary keeps its original value:
df['occupation_code'] = df['occupation_code'].replace(dict1)
print (df)
occupation_code
index
0 K-12student
1 self-employed
2 programmer
3 executive/managerial
4 academic/educator
5 clerical/admin
6 K-12student
7 executive/managerial
8 academic/educator
9 clerical/admin
10 college/grad student
11 100
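The difference between the two methods can be seen side by side in a small sketch. The dictionary is again an abbreviated subset of dict1, and the names codes, via_map and via_replace are made up for this example.

```python
import pandas as pd

# Abbreviated, illustrative subset of dict1 from the question.
dict1 = {1: 'academic/educator', 3: 'clerical/admin', 10: 'K-12student'}
codes = pd.Series([10, 1, 3, 100], name='occupation_code')

via_map = codes.map(dict1)          # unmatched 100 becomes NaN
via_replace = codes.replace(dict1)  # unmatched 100 is left as 100

# Show both results next to each other for comparison.
print(pd.DataFrame({'map': via_map, 'replace': via_replace}))
```

With a large dictionary and a million rows, replace may well be noticeably slower than map, so it is probably worth timing both on a sample of the real data before choosing.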
Answer 1: (score: 1)
Assuming @jezrael's sample data df:
print(df)
occupation_code
index
0 10
1 16
2 12
3 7
4 1
5 3
6 10
7 7
8 1
9 3
10 4
11 100
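I suggest using map with a lambda that wraps the dictionary's get method. This lets you supply a default value for anything that is not in the dictionary. In this case, I return the original value.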
df.occupation_code.map(lambda x: dict1.get(x, x))
index
0 K-12student
1 self-employed
2 programmer
3 executive/managerial
4 academic/educator
5 clerical/admin
6 K-12student
7 executive/managerial
8 academic/educator
9 clerical/admin
10 college/grad student
11 100
Name: occupation_code, dtype: object
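A possible variation, sketched here as an assumption rather than as part of the answer: do the dictionary lookup with map first, then fill the gaps from the original column. The fallback codes are converted to strings so the resulting column has a single type, which differs slightly from the lambda version above (it would keep the integer 100). The dictionary is an abbreviated subset of dict1.

```python
import pandas as pd

# Abbreviated, illustrative subset of dict1 from the question.
dict1 = {1: 'academic/educator', 3: 'clerical/admin', 10: 'K-12student'}
codes = pd.Series([10, 1, 3, 100], name='occupation_code')

# Look up all codes in one call; unmatched codes are NaN at this point.
mapped = codes.map(dict1)

# Fill the NaN positions with the original code, rendered as a string.
# fillna aligns on the index, so each gap receives its own row's code.
result = mapped.fillna(codes.astype(str))
print(result)
```

This avoids calling a Python function once per row, which may help on large inputs, though the gain should be measured rather than assumed.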