I am trying to replace the values in a column of my dataset, but I am having trouble with some of the values

Asked: 2019-11-22 06:06:14

Tags: replace

High school graduate                      72370
 Some college but no degree                41683
 Bachelors degree(BA AB BS)                29541
 Children                                  21116
 7th and 8th grade                         12073
 10th grade                                11314
 11th grade                                10367
 Masters degree(MA MS MEng MEd MSW MBA)     9751
 9th grade                                  9262
 Associates degree-occup /vocational        8026
 Associates degree-academic program         6434
 5th or 6th grade                           4986
 12th grade no diploma                      3258
 1st 2nd 3rd or 4th grade                   2697
 Prof school degree (MD DDS DVM LLB JD)     2534
 Doctorate degree(PhD EdD)                  1826
 Less than 1st grade                        1228

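These are all the values in the column, with their counts.

I used the following function to remove the parentheses and their contents from these values: 1) Bachelors degree(BA AB BS) 2) Masters degree(MA MS MEng MEd MSW MBA) 3) Prof school degree (MD DDS DVM LLB JD) 4) Doctorate degree(PhD EdD)

Here is my function: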

import re

def Clean_names(education_names):
    # Cut the value at the first opening parenthesis, if there is one.
    match = re.search(r'\(.*', education_names)
    if match:
        return education_names[:match.start()]
    else:
        return education_names
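
A note on what this function returns: slicing at the position of "(" keeps everything before it, including any space. In the counts listed above, "Prof school degree (MD DDS DVM LLB JD)" has a space before its parenthesis, while the other three degree values do not. The sketch below illustrates the difference on two sample strings copied from that list. `clean_names` restates the logic above, and `clean_names_stripped` is a hypothetical variant, not part of the original code.

```python
import re

def clean_names(education_name):
    # Same logic as the function above: cut the string at the first "(".
    match = re.search(r'\(', education_name)
    return education_name[:match.start()] if match else education_name

def clean_names_stripped(education_name):
    # Hypothetical variant: drop the parenthesised part, then trim
    # leading and trailing whitespace.
    return re.sub(r'\s*\(.*', '', education_name).strip()

samples = [
    ' Prof school degree (MD DDS DVM LLB JD)',
    ' Doctorate degree(PhD EdD)',
]

raw = [clean_names(s) for s in samples]
trimmed = [clean_names_stripped(s) for s in samples]

# repr() makes the surrounding spaces visible.
print([repr(value) for value in raw])
print([repr(value) for value in trimmed])
```

Printing with `repr()` shows that the first cleaned value keeps a trailing space and the second does not. This is hard to see in a plain `value_counts()` listing.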

After running this function and applying it to my column, I was able to get rid of the parentheses. Here is the output:

High school graduate                   72370
 Some college but no degree             41683
 Bachelors degree                       29541
 Children                               21116
 7th and 8th grade                      12073
 10th grade                             11314
 11th grade                             10367
 Masters degree                          9751
 9th grade                               9262
 Associates degree-occup /vocational     8026
 Associates degree-academic program      6434
 5th or 6th grade                        4986
 12th grade no diploma                   3258
 1st 2nd 3rd or 4th grade                2697
 Prof school degree                      2534
 Doctorate degree                        1826
 Less than 1st grade                     1228

But when I try to group these values into bins, I run into a problem.

Here is the code:

dataout2.replace({
        'High school graduate' : 'high-school-graduate',
        'Some college but no degree' : 'high-school-graduate',
        "Bachelors degree" : 'undergraduate',
        'Children' : 'children',
        '7th and 8th grade' : 'children',
        '10th grade' : 'high-school',
        '11th grade' : 'high-school',
        'Masters degree' : 'postgraduate',
        '9th grade' : 'high-school',
        'Associates degree-occup /vocational' : 'undergraduate',
        'Associates degree-academic program' : 'undergraduate',
        '5th or 6th grade' : 'children',
        '12th grade no diploma' : 'high-school',
        '1st 2nd 3rd or 4th grade' : 'children',
        'Prof school degree' : 'postgraduate',
        'Doctorate degree' : 'postgraduate',
        'Less than 1st grade' : 'children'},inplace = True , regex = True)

The output it gives me looks like this:

high-school-graduate    114053
 undergraduate            44001
 children                 42100
 high-school              34201
 postgraduate             11577
 postgraduate              2534

I don't understand why I am getting two postgraduate categories. Can someone tell me where I went wrong?
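
One possible explanation, offered as a sketch rather than a confirmed diagnosis: the extra row's count, 2534, matches "Prof school degree", the only value with a space before its parenthesis in the original listing. With `regex=True`, `replace` substitutes only the matched substring, so any whitespace around it survives and can produce two strings that look the same when printed. The example below uses a small hypothetical Series (`education`) in place of the real column, and the `bins` dictionary is an abbreviated stand-in for the full mapping.

```python
import pandas as pd

# Hypothetical sample: values as they might look after Clean_names,
# with leading spaces and one trailing space.
education = pd.Series([
    ' Masters degree',
    ' Doctorate degree',
    ' Prof school degree ',
])

# Abbreviated stand-in for the full mapping in the question.
bins = {
    'Masters degree': 'postgraduate',
    'Doctorate degree': 'postgraduate',
    'Prof school degree': 'postgraduate',
}

# With regex=True only the matched substring is replaced, so the
# whitespace around it is kept.
by_regex = education.replace(bins, regex=True)
print([repr(value) for value in by_regex.unique()])

# Trimming whitespace first and then mapping whole values gives one label.
by_exact = education.str.strip().map(bins)
print(by_exact.value_counts())
```

Note that `map` returns NaN for any value missing from the dictionary, so the mapping has to cover every category. After `str.strip()`, `replace(bins)` without `regex=True` would be an alternative that leaves unmatched values unchanged.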

0 answers:

No answers