I have a pandas Series with the following values:
Bachelors Degree 639
Diploma 291
O - Level 264
Masters Degree 149
Certificate 126
A - Level 69
PGD 40
Bachelors Degree 28
A-Level 20
O-Level 15
Masters 10
Bachelors 6
diploma 5
certificate 5
Ph.D 4
A- Level 2
Post Graduate Diploma 1
Msc Environment 1
BBA 1
O- Level 1
Masters 1
PhD 1
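I got the data from Excel.
I want to do the data cleaning with pandas, for example replacing every variant of a master's degree with the single label "Master's Degree" (I could do this in Excel, but I am learning pandas).
I tried: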
mapp={"Bachelor's Degree":["Bachelors Degree","Bachelors","BBA","Bachelors Degree"],
"Ordinary Diploma":"diploma",
"Ordinary Level":["O - Level","O-Level","O- Level"],
"Master's Degree":["Masters Degree","Masters","Msc Environment","Masters"],
"Certificate":"certificate",
"Advanced Level":["A - Level","A-Level","- Level"],
"Post Graduate Diploma":["Post Graduate Diploma","PGD"],
"PHD":["Ph.D","PhD"]
}
df['EDUCATION_LEVEL']=df['EDUCATION_LEVEL'].map(mapp)
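This returns a result only for the Certificate key, which has a single value.
It seems I can't use lists as the values of the dictionary keys.
Any suggestions on how to replace these values would be highly appreciated. Ronald. This is how the actual data appears in the Excel column.
I have added an image of the data in that column. The challenge is how to replace the various spellings of "Masters degree".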
Answer 0: (score: 0)
One idea is to convert the single-element values to one-element lists, e.g. "diploma" to ["diploma"]:
mapp1={"Bachelor's Degree":["Bachelors Degree","Bachelors","BBA","Bachelors Degree"],
"Ordinary Diploma":["diploma"],
"Ordinary Level":["O - Level","O-Level","O- Level"],
"Master's Degree":["Masters Degree","Masters","Msc Environment","Masters"],
"Certificate":["certificate"],
"Advanced Level":["A - Level","A-Level","- Level"],
"Post Graduate Diploma":["Post Graduate Diploma","PGD"],
"PHD":["Ph.D","PhD"]
}
#swap key values in dict
#http://stackoverflow.com/a/31674731/2901002
d = {k.lower(): oldk for oldk, oldv in mapp1.items() for k in oldv}
df['EDUCATION_LEVEL']=df['EDUCATION_LEVEL'].str.lower().map(d)
print (df)
EDUCATION_LEVEL VAL
0 Bachelor's Degree 639
1 Ordinary Diploma 291
2 Ordinary Level 264
3 Master's Degree 149
4 Certificate 126
5 Advanced Level 69
6 Post Graduate Diploma 40
7 Bachelor's Degree 28
8 Advanced Level 20
9 Ordinary Level 15
10 Master's Degree 10
11 Bachelor's Degree 6
12 Ordinary Diploma 5
13 Certificate 5
14 PHD 4
15 NaN 2
16 Post Graduate Diploma 1
17 Master's Degree 1
18 Bachelor's Degree 1
19 Ordinary Level 1
20 Master's Degree 1
21 PHD 1
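The NaN in row 15 of the output above arises because "A- Level" is not among the listed variants (the mapping contains "- Level" instead), and Series.map returns NaN for any value without a matching key. A minimal sketch of how such gaps could be spotted, assuming a trimmed version of mapp1 and a hypothetical sample Series s rather than the real Excel column:

```python
import pandas as pd

# Hypothetical sample; a small subset of the values from the question.
s = pd.Series(["Bachelors Degree", "diploma", "O - Level", "A- Level", "PhD"])

# Canonical label -> list of raw variants (trimmed copy of mapp1).
mapp1 = {
    "Bachelor's Degree": ["Bachelors Degree", "Bachelors", "BBA"],
    "Ordinary Diploma": ["diploma"],
    "Ordinary Level": ["O - Level", "O-Level", "O- Level"],
    "Advanced Level": ["A - Level", "A-Level", "- Level"],  # "A- Level" is not listed
    "PHD": ["Ph.D", "PhD"],
}

# Invert the dict: lowercased raw variant -> canonical label.
d = {variant.lower(): label
     for label, variants in mapp1.items()
     for variant in variants}

mapped = s.str.lower().map(d)

# Values without an entry in d come back as NaN; collect the originals
# so the mapping can be extended.
unmapped = s[mapped.isna()].unique().tolist()
print(unmapped)
```

Adding each reported value to the matching list in mapp1 and rerunning should leave unmapped empty.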
If changing the dictionary is not possible, use:
d = {}
for k, v in mapp.items():
    if isinstance(v, list):
        # list of variants: map each one to the canonical label
        for x in v:
            d[x.lower()] = k
    else:
        # single string variant
        d[v.lower()] = k
df['EDUCATION_LEVEL']=df['EDUCATION_LEVEL'].str.lower().map(d)
print (df)
EDUCATION_LEVEL VAL
0 Bachelor's Degree 639
1 Ordinary Diploma 291
2 Ordinary Level 264
3 Master's Degree 149
4 Certificate 126
5 Advanced Level 69
6 Post Graduate Diploma 40
7 Bachelor's Degree 28
8 Advanced Level 20
9 Ordinary Level 15
10 Master's Degree 10
11 Bachelor's Degree 6
12 Ordinary Diploma 5
13 Certificate 5
14 PHD 4
15 NaN 2
16 Post Graduate Diploma 1
17 Master's Degree 1
18 Bachelor's Degree 1
19 Ordinary Level 1
20 Master's Degree 1
21 PHD 1
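Several of the raw values differ only in case or in the spacing around the hyphen ("A - Level", "A-Level", "A- Level"). One possible way to reduce the number of variants that have to be listed by hand is to normalize the strings before mapping. This is a sketch and not part of the answer above; the sample Series s is hypothetical:

```python
import pandas as pd

# Hypothetical sample with spacing and case variants.
s = pd.Series(["A - Level", "A-Level", "A- Level", "O- Level", " Masters "])

normalized = (
    s.str.strip()                               # drop leading/trailing spaces
     .str.lower()                               # case-insensitive matching
     .str.replace(r"\s*-\s*", "-", regex=True)  # collapse spaces around hyphens
)
print(normalized.tolist())
# ['a-level', 'a-level', 'a-level', 'o-level', 'masters']
```

With this step, the lookup dictionary would only need one key per level, such as "a-level" or "o-level".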
Answer 1: (score: 0)
First, change your mapp dictionary slightly so that all of its values are lists:
mapp={"Bachelor's Degree":["Bachelors Degree","Bachelors","BBA","Bachelors Degree"],
"Ordinary Diploma":["diploma"],
"Ordinary Level":["O - Level","O-Level","O- Level"],
"Master's Degree":["Masters Degree","Masters","Msc Environment","Masters"],
"Certificate":["certificate"],
"Advanced Level":["A - Level","A-Level","- Level"],
"Post Graduate Diploma":["Post Graduate Diploma","PGD"],
"PHD":["Ph.D","PhD"]
}
mapp_new = [{l:k for l in v} for k,v in mapp.items()]
mapp_new = {k.lower(): v for d in mapp_new for k, v in d.items()}
df.EDUCATION_LEVEL.apply(lambda x: mapp_new.get(x.lower(), x))
0 Bachelor's Degree
1 Ordinary Diploma
2 Ordinary Level
3 Master's Degree
4 Certificate
5 Advanced Level
6 Post Graduate Diploma
7 Bachelor's Degree
8 Advanced Level
9 Ordinary Level
10 Master's Degree
11 Bachelor's Degree
12 Ordinary Diploma
13 Certificate
14 PHD
15 A- Level
16 Post Graduate Diploma
17 Master's Degree
18 Bachelor's Degree
19 Ordinary Level
20 Master's Degree
21 PHD
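The apply call above keeps unmatched values (such as "A- Level" in row 15) instead of turning them into NaN, but x.lower() raises an AttributeError if the column contains a missing value, because NaN is a float. A possible vectorized alternative is sketched below, assuming a trimmed mapp_new and a hypothetical sample Series s:

```python
import numpy as np
import pandas as pd

# Hypothetical sample, including a missing value.
s = pd.Series(["Masters", "A- Level", np.nan, "PGD"])

# Trimmed lookup: lowercased raw variant -> canonical label.
mapp_new = {"masters": "Master's Degree", "pgd": "Post Graduate Diploma"}

# .str.lower() passes NaN through; .map() gives NaN for unknown keys;
# .fillna(s) then restores the original value wherever no mapping was found.
result = s.str.lower().map(mapp_new).fillna(s)
print(result.tolist())
```

Here "A- Level" is left unchanged and the missing value stays missing.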