How do I split a string into named columns using Python / pandas?

Time: 2019-03-19 16:16:49

Tags: python pandas

Do you know how to solve this in Python? I would like to end up with a DataFrame in which the data is arranged in the correct columns.

Thanks!

Here is an example of a string from the DataFrame.

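'Huidigefuncties Michael Jordan 2015 - present Director Marketing & Indirect Channels, Ricoh Nederland 2010 - present Basketball Center, Center for Business-Expertise Loopbaan Michael Jordan 2012 - 2015 Director Marketing & Business Development, Ricoh Opleiding Michael Jordan 1988 - 1992 Marketing , Harvard'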

Desired result

type       from    to         function                                     organization
current    2015    present    Director Marketing & Indirect Channels       Ricoh Nederland
current    2010    present    Owner & Consultant                           Basketball Center
old        2012    2015       Director Marketing & Business Development    Ricoh
school     1988    1992       Marketing                                    Harvard

Current df

Name             Data
Michael Jordan   ' Huidigefuncties Michael Jordan 2015 - present Director Marketing & Indirect Channels, Ricoh Nederland 2010 - present Basketball Center, Center for Business-Expertise Loopbaan Michael Jordan 2012 - 2015 Director Marketing & Business Development, Ricoh Opleiding Michael Jordan 1988 - 1992 Marketing , Harvard '

1 answer:

Answer 0 (score: 0)

Well, here is the solution I came up with for this problem:

import pandas as pd
beautiful_data = 'Huidigefuncties Michael Jordan 2015 - present Director Marketing & Indirect Channels, Ricoh Nederland 2010 - present Basketball Center, Center for Business-Expertise Loopbaan Michael Jordan 2012 - 2015 Director Marketing & Business Development, Ricoh Opleiding Michael Jordan 1988 - 1992 Marketing , Harvard'
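# Column name -> list of values; one entry is appended per parsed record.
main_dict = {'type':[], 'from':[], 'to':[], 'function':[], 'organization': []}
data = beautiful_data.split(' ')
i = 0
# Locate the section keywords (data.index raises ValueError if one is absent).
huidi_index = data.index('Huidigefuncties')
loopbaan_index = data.index('Loopbaan')
ople_index = data.index('Opleiding')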
# Walk the tokens; each keyword opens a section that runs up to the next keyword.
while i < len(data):
    if data[i] == 'Huidigefuncties':
        line = ' '.join(data[i + 1: loopbaan_index])
        i = loopbaan_index
        print(line)
        type_data = 'current'
    elif data[i] == 'Loopbaan':
        line = ' '.join(data[i + 1: ople_index])
        i = ople_index
        print(line)
        type_data = 'old'
    elif data[i] == 'Opleiding':
        line = ' '.join(data[i+1: ])
        i = len(data)
        print(line)
        type_data = 'school'
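    else:
        # Not a section keyword: skip the token without re-parsing the previous
        # `line` (which would duplicate rows, or raise NameError if no section
        # has been seen yet, e.g. when the string starts with a space).
        i += 1
        continue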
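    # Every record contains '<from> - <to>', so splitting on '-' separates the
    # records. Hyphens inside names (e.g. 'Business-Expertise') split as well.
    data_line = line.split('-')
    if len(data_line) == 2:
        # Exactly one record in this section.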
        print(type_data)
        main_dict['type'].append(type_data)
        from_data = data_line[0].strip().split(' ')[-1]
        print(from_data)
        main_dict['from'].append(from_data)
        to_data = data_line[1].strip().split(' ')[0]
        print(to_data)
        main_dict['to'].append(to_data)
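        # Function = text between the end year and the first comma. Splitting on
        # the comma also handles ' , ' and organization names of several words.
        function_data = ' '.join(data_line[1].split(',')[0].strip().split(' ')[1:])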
        print(function_data)
        main_dict['function'].append(function_data)
        organization_data = data_line[1].split(',')[-1].strip()
        print(organization_data)
        main_dict['organization'].append(organization_data)

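    elif len(data_line) > 2:
        # Several records in one section: adjacent pieces pair up as
        # ('... <from> ', ' <to> <function>, <organization> <next from> ').
        j = 0
        while j < len(data_line):
            register_data = data_line[j:j+2]
            if len(register_data) > 1:
                # Skip pairs where either piece is a single word, such as the
                # 'Expertise' fragment left by the hyphen in 'Business-Expertise'.
                if len(register_data[0].split(' ')) > 1 and len(register_data[1].split(' ')) > 1: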
                    if j == 0:
                        print(register_data)
                        print('----------')
                        print(type_data)
                        main_dict['type'].append(type_data)
                        from_data = register_data[0].strip().split(' ')[-1]
                        print(from_data)
                        main_dict['from'].append(from_data)
                        to_data = register_data[1].strip().split(' ')[0]
                        print(to_data)
                        main_dict['to'].append(to_data)
                        function_org = register_data[1].strip().split(',')
                        function_data = ' '.join(function_org[0].split(' ')[1:])
                        print(function_data)
                        main_dict['function'].append(function_data)
                        org_data = ' '.join(function_org[1].split(' ')[:-1]).strip()
                        print(org_data)
                        main_dict['organization'].append(org_data)
                        print('-----------')
                    else:
                        print('-----------')
                        print(register_data)
                        print(type_data)
                        main_dict['type'].append(type_data)
                        from_data = register_data[0].strip().split(' ')[-1]
                        print(from_data)
                        main_dict['from'].append(from_data)
                        to_data = register_data[1].strip().split(' ')[0]
                        print(to_data)
                        main_dict['to'].append(to_data)
                        function_org = register_data[1].strip().split(',')
                        function_data = ' '.join(function_org[0].split(' ')[1:])
                        print(function_data)
                        main_dict['function'].append(function_data)
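                        # Unlike the j == 0 branch, no trailing token is dropped here,
                        # so this assumes no further record follows in the section.
                        org_data = ' '.join(function_org[1].split(' ')).strip()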
                        print(org_data)
                        main_dict['organization'].append(org_data)
                        print('-----------')
            j += 1

df = pd.DataFrame(main_dict)

Tested.
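For comparison, the same idea can probably be written more compactly with regular expressions. This is only a sketch under a few assumptions that are not stated in the question: every record has the shape `<from> - <to> <function>, <organization>`, years always have four digits, the end is either a year or the word `present`, and the three Dutch keywords always mark the sections. The names `SECTION_TYPES`, `RECORD` and `parse_records` are made up for this example.

```python
import re

# Assumed mapping from the section keywords in the sample to the 'type' values.
SECTION_TYPES = {"Huidigefuncties": "current", "Loopbaan": "old", "Opleiding": "school"}

# One record: "<from> - <to> <function>, <organization>". The lazy "rest" group
# stops right before the next "<year> - " or at the end of the section.
RECORD = re.compile(
    r"(?P<start>\d{4})\s*-\s*(?P<end>\d{4}|present)\s+"
    r"(?P<rest>.*?)(?=\s+\d{4}\s*-\s|$)"
)


def parse_records(text):
    """Return one dict per record found in text (hypothetical helper)."""
    rows = []
    # The capturing group makes re.split keep the keywords:
    # [prefix, keyword, body, keyword, body, ...]
    parts = re.split("(" + "|".join(SECTION_TYPES) + ")", text)
    for keyword, body in zip(parts[1::2], parts[2::2]):
        for match in RECORD.finditer(body):
            # Text before the first comma is the function, the rest the organization.
            function, _, organization = match.group("rest").partition(",")
            rows.append({
                "type": SECTION_TYPES[keyword],
                "from": match.group("start"),
                "to": match.group("end"),
                "function": function.strip(),
                "organization": organization.strip(),
            })
    return rows


sample = (
    "Huidigefuncties Michael Jordan 2015 - present Director Marketing & Indirect "
    "Channels, Ricoh Nederland 2010 - present Basketball Center, Center for "
    "Business-Expertise Loopbaan Michael Jordan 2012 - 2015 Director Marketing & "
    "Business Development, Ricoh Opleiding Michael Jordan 1988 - 1992 Marketing , Harvard"
)
rows = parse_records(sample)
for row in rows:
    print(row)
```

Passing the list to `pd.DataFrame(rows)` should give a frame with the columns type, from, to, function and organization. Because the records are found by matching the year pattern and not by splitting on '-', a hyphenated name such as 'Business-Expertise' stays in one piece.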
