Replacing values in a pandas Series

Date: 2020-02-07 10:14:44

Tags: python pandas dataframe series

I have a pandas Series with the following values:

Bachelors Degree         639
Diploma                  291
O - Level                264
Masters Degree           149
Certificate              126
A - Level                 69
PGD                       40
Bachelors Degree          28
A-Level                   20
O-Level                   15
Masters                   10
Bachelors                  6
diploma                    5
certificate                5
Ph.D                       4
A- Level                   2
Post Graduate Diploma      1
Msc Environment            1
BBA                        1
O- Level                   1
Masters                    1
PhD                        1

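I got the data from Excel.

I want to use pandas for data cleaning, for example replacing every variant of a master's degree with a single "Master's Degree" label (I could do this in Excel, but I am learning pandas).

I tried: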

mapp={"Bachelor's Degree":["Bachelors Degree","Bachelors","BBA","Bachelors Degree"],
      "Ordinary Diploma":"diploma",
      "Ordinary Level":["O - Level","O-Level","O- Level"],
      "Master's Degree":["Masters Degree","Masters","Msc Environment","Masters"],
      "Certificate":"certificate",
      "Advanced Level":["A - Level","A-Level","- Level"],
      "Post Graduate Diploma":["Post Graduate Diploma","PGD"],
      "PHD":["Ph.D","PhD"]    
     }
df['EDUCATION_LEVEL']=df['EDUCATION_LEVEL'].map(mapp)

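This returns a result only for the Certificate key, which has a single value.

It seems I cannot use a list as the value for a dictionary key.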
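
The behaviour described here comes from how `Series.map` treats a dict: each Series value is looked up as a dictionary key, and the dictionary value is returned. The small sketch below uses a shortened, illustrative version of `mapp` (not the full dictionary) to show that only `"Certificate"`, which happens to be a key, finds a match.

```python
import pandas as pd

# Small stand-in for the EDUCATION_LEVEL column (values taken from the question).
s = pd.Series(["Certificate", "certificate", "Masters", "PGD"])

# Same shape as the original mapp: canonical label -> variant(s).
mapp = {
    "Certificate": "certificate",
    "Master's Degree": ["Masters Degree", "Masters"],
}

# Series.map looks each Series value up as a dictionary KEY,
# so the variants stored as dictionary values are never consulted.
result = s.map(mapp)
print(result)
```

The lookup direction is therefore the issue, rather than lists being disallowed: the dictionary has to go from each variant to its canonical label.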

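Any suggestions on how to replace these values would be highly appreciated.

Ronald

This is how the actual data appears in the Excel column.

I have added an image of the data in that column. The challenge is how to replace the various spellings of "Master's Degree".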

2 answers:

Answer 0: (score: 0)

One idea is to convert each single-element value into a one-element list, e.g. "diploma" into ["diploma"]:

mapp1={"Bachelor's Degree":["Bachelors Degree","Bachelors","BBA","Bachelors Degree"],
      "Ordinary Diploma":["diploma"],
      "Ordinary Level":["O - Level","O-Level","O- Level"],
      "Master's Degree":["Masters Degree","Masters","Msc Environment","Masters"],
      "Certificate":["certificate"],
      "Advanced Level":["A - Level","A-Level","- Level"],
      "Post Graduate Diploma":["Post Graduate Diploma","PGD"],
      "PHD":["Ph.D","PhD"]    
     }

# invert the dict: each lowercased variant -> canonical label
# http://stackoverflow.com/a/31674731/2901002
d = {k.lower(): oldk for oldk, oldv in mapp1.items() for k in oldv}
df['EDUCATION_LEVEL'] = df['EDUCATION_LEVEL'].str.lower().map(d)
print(df)
          EDUCATION_LEVEL  VAL
0       Bachelor's Degree  639
1        Ordinary Diploma  291
2          Ordinary Level  264
3         Master's Degree  149
4             Certificate  126
5          Advanced Level   69
6   Post Graduate Diploma   40
7       Bachelor's Degree   28
8          Advanced Level   20
9          Ordinary Level   15
10        Master's Degree   10
11      Bachelor's Degree    6
12       Ordinary Diploma    5
13            Certificate    5
14                    PHD    4
15                    NaN    2
16  Post Graduate Diploma    1
17        Master's Degree    1
18      Bachelor's Degree    1
19         Ordinary Level    1
20        Master's Degree    1
21                    PHD    1
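
Row 15 in the output above is NaN because "A- Level" lowercases to "a- level", which is not among the lookup keys (the list contains "- Level" instead). A minimal sketch for listing such unmapped spellings, using a small illustrative Series and lookup table rather than the full data:

```python
import pandas as pd

# Illustrative subset of the column and of the inverted lookup dict.
s = pd.Series(["A - Level", "A-Level", "A- Level", "PhD"])
d = {
    "a - level": "Advanced Level",
    "a-level": "Advanced Level",
    "- level": "Advanced Level",
    "phd": "PHD",
}

mapped = s.str.lower().map(d)

# Original spellings that found no entry in the lookup table.
unmapped = s[mapped.isna()].unique().tolist()
print(unmapped)  # ['A- Level']
```

Checking this list after each mapping pass shows which variants still need an entry in the dictionary.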

If changing the dictionary is not possible, use:

# build the lookup from the original mapp, whose values are lists or single strings
d = {}
for k, v in mapp.items():
    if isinstance(v, list):
        for x in v:
            d[x.lower()] = k
    else:
        d[v.lower()] = k

df['EDUCATION_LEVEL'] = df['EDUCATION_LEVEL'].str.lower().map(d)
print(df)
          EDUCATION_LEVEL  VAL
0       Bachelor's Degree  639
1        Ordinary Diploma  291
2          Ordinary Level  264
3         Master's Degree  149
4             Certificate  126
5          Advanced Level   69
6   Post Graduate Diploma   40
7       Bachelor's Degree   28
8          Advanced Level   20
9          Ordinary Level   15
10        Master's Degree   10
11      Bachelor's Degree    6
12       Ordinary Diploma    5
13            Certificate    5
14                    PHD    4
15                    NaN    2
16  Post Graduate Diploma    1
17        Master's Degree    1
18      Bachelor's Degree    1
19         Ordinary Level    1
20        Master's Degree    1
21                    PHD    1
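
Several of the variants differ only in spacing around the hyphen or in letter case. As an optional extra step that is not part of the answer above, the text could be normalized before the lookup so that fewer variants have to be listed. The sketch below assumes that stray whitespace and hyphen spacing are the only differences; the sample values and the shortened dictionary are illustrative.

```python
import pandas as pd

s = pd.Series(["A - Level", "A-Level", "A- Level", " O- Level ", "Masters "])

# Trim the ends, lowercase, and collapse any spacing around hyphens.
normalized = (
    s.str.strip()
     .str.lower()
     .str.replace(r"\s*-\s*", "-", regex=True)
)

# One key per normalized spelling is now enough.
d = {
    "a-level": "Advanced Level",
    "o-level": "Ordinary Level",
    "masters": "Master's Degree",
}
cleaned = normalized.map(d)
print(cleaned.tolist())
```

Abbreviations such as "PGD" or "BBA" still need explicit entries, since no amount of whitespace cleanup turns them into the long form.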

Answer 1: (score: 0)

First, make a small change to your mapp dictionary so that every value is a list:

mapp={"Bachelor's Degree":["Bachelors Degree","Bachelors","BBA","Bachelors Degree"],
      "Ordinary Diploma":["diploma"],
      "Ordinary Level":["O - Level","O-Level","O- Level"],
      "Master's Degree":["Masters Degree","Masters","Msc Environment","Masters"],
      "Certificate":["certificate"],
      "Advanced Level":["A - Level","A-Level","- Level"],
      "Post Graduate Diploma":["Post Graduate Diploma","PGD"],
      "PHD":["Ph.D","PhD"]    
     }

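# invert each entry: variant -> canonical label
mapp_new = [{l:k for l in v} for k,v in mapp.items()]
# merge the per-label dicts into one lookup with lowercased keys
mapp_new = {k.lower(): v for d in mapp_new for k, v in d.items()}
# fall back to the original value when no entry matches
df.EDUCATION_LEVEL.apply(lambda x: mapp_new.get(x.lower(), x))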


0         Bachelor's Degree
1          Ordinary Diploma
2            Ordinary Level
3           Master's Degree
4               Certificate
5            Advanced Level
6     Post Graduate Diploma
7         Bachelor's Degree
8            Advanced Level
9            Ordinary Level
10          Master's Degree
11        Bachelor's Degree
12         Ordinary Diploma
13              Certificate
14                      PHD
15                 A- Level
16    Post Graduate Diploma
17          Master's Degree
18        Bachelor's Degree
19           Ordinary Level
20          Master's Degree
21                      PHD
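
Because `get(x.lower(), x)` falls back to the original value, row 15 keeps "A- Level" instead of becoming NaN. The same fallback can be sketched without a Python-level lambda by combining `map` with `fillna`. The small DataFrame and the shortened `mapp_new` below are illustrative, and the result is assigned back to the column, which the expression above does not do.

```python
import pandas as pd

df = pd.DataFrame({"EDUCATION_LEVEL": ["Masters", "PGD", "A- Level"]})

# Shortened, illustrative lookup: lowercased variant -> canonical label.
mapp_new = {"masters": "Master's Degree", "pgd": "Post Graduate Diploma"}

col = df["EDUCATION_LEVEL"]
# map yields NaN for unmatched values; fillna restores the original text there.
df["EDUCATION_LEVEL"] = col.str.lower().map(mapp_new).fillna(col)
print(df["EDUCATION_LEVEL"].tolist())
# ["Master's Degree", 'Post Graduate Diploma', 'A- Level']
```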