Splitting strings into a desired result

Time: 2019-12-19 14:07:56

Tags: python string dataframe split

I have a DataFrame with a column containing strings in the following format:

0                      D/WB/M (L)
1                    DM, M/AM (C)
2                  D (RC), WB (R)
3                D (C), DM, M (C)
4          M (C), AM (LC), ST (C)

The desired result is:

0 [D L, WB L, M L]
1 [DM, M C, AM C]
2 [D R, D C, WB R]
3 [D C, DM, M C]
4 [M C, AM L, AM C, ST C]

I have tried a few approaches, but even my best attempt is still far from the desired result:

a = df['position'].str.split(', ')
i = 0
p_ps = list()
for i in range(len(a)):
    p_ps.append(df['position'][i].split(', '))

i = 0
for i in range(len(p_ps)):
    j = 0
    for j in range(len(p_ps[i])):
        p_ps[i][j] = p_ps[i][j].replace('(','').replace(')','').split(' ')
i = 0
for i in range(len(p_ps)):
    j = 0
    for j in range(len(p_ps[i])):
        try:
            if len(p_ps[i][j][1]) > 1:
                c = list()
                for a in p_ps[i][j][1]:
                    c.append(a)
                p_ps[i][j][1] = c
        except:
            continue
i = 0
for i in range(len(p_ps)):
    j = 0
    for j in range(len(p_ps[i])):
        k = 0
        for k in range(len(p_ps[i][j])):
            if '/' in p_ps[i][j][k]:
                p_ps[i][j][k] = p_ps[i][j][k].split('/')
i = 0
for i in range(len(p_ps)):
    j = 0
    for j in range(len(p_ps[i])):
        if len(p_ps[i][j]) > 1:
            k = 0
            for k in range(len(p_ps[i][j])):
                if not isinstance(p_ps[i][j][k], list):
                    p_ps[i][j] = str(p_ps[i][j][0]) + str(p_ps[i][j][1])

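As you can see, this code does not actually produce the desired result.

1 Answer:

Answer 0: (score: 2)

This works for the data you provided, and based on the information given in the comments, it should do what you want.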

import re

def find_elements_in_brackets(text):
    # Return the characters inside the first "(...)" group,
    # e.g. "AM (LC)" -> ['L', 'C']; an empty list if there are no brackets.
    m = re.search(r'\((.+?)\)', text)
    return list(m.group(1)) if m else []

data = ["D/WB/M (L)","DM, M/AM (C)", "D (RC), WB (R)","D (C), DM, M (C)","M (C), AM (LC), ST (C)"]
output = []
for row in data:
    row_output = []
    # Each comma-separated element carries its own optional bracket group.
    for element in row.split(","):
        elements_in_brackets = find_elements_in_brackets(element)
        if elements_in_brackets:
            # Pair every "/"-separated position with every bracketed letter.
            for splitted in element.split("/"):
                position = splitted.split("(")[0].strip()
                for c in elements_in_brackets:
                    row_output.append((position + " " + c).strip())
        else:
            # No brackets: keep the element unchanged.
            row_output.append(element.strip())
    output.append(row_output)
print(output)

Output:

[['D L', 'WB L', 'M L'], ['DM', 'M C', 'AM C'], ['D R', 'D C', 'WB R'], ['D C', 'DM', 'M C'], ['M C', 'AM L', 'AM C', 'ST C']]
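
The same logic can also be packaged as a single function that takes one string, which may be easier to apply to a DataFrame column. The following is only a sketch, not the answer's original code: the name `expand_positions` is hypothetical, and it uses `str.partition` in place of the regular expression. It also assumes each comma-separated element has at most one trailing bracket group. One behavioural difference: an element that contains "/" but no brackets is split here, whereas the code above leaves it unchanged.

```python
def expand_positions(text):
    """Expand e.g. 'M/AM (LC)' into ['M L', 'M C', 'AM L', 'AM C'].

    Hypothetical helper; assumes at most one trailing "(...)" group
    per comma-separated element.
    """
    expanded = []
    for entry in text.split(","):
        # Split "M/AM (LC)" into the positions part and the bracket content.
        positions, _, sides = entry.partition("(")
        sides = sides.strip().rstrip(")")
        for position in positions.strip().split("/"):
            if sides:
                # One output item per bracketed letter, in the original order.
                expanded.extend(f"{position} {side}" for side in sides)
            else:
                expanded.append(position)
    return expanded


print(expand_positions("M (C), AM (LC), ST (C)"))
# ['M C', 'AM L', 'AM C', 'ST C']
```

Assuming the column is named `position` as in the question, `df['position'].apply(expand_positions)` should give one list per row.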