I have a DataFrame with a column containing strings in the following format:
0 D/WB/M (L)
1 DM, M/AM (C)
2 D (RC), WB (R)
3 D (C), DM, M (C)
4 M (C), AM (LC), ST (C)
The desired result is:
0 [D L, WB L, M L]
1 [DM, M C, AM C]
2 [D R, D C, WB R]
3 [D C, DM, M C]
4 [M C, AM L, AM C, ST C]
I have tried a few approaches, but the furthest I have gotten is still far from the desired result:
a = df['position'].str.split(', ')
i = 0
p_ps = list()
for i in range(len(a)):
    p_ps.append(df['position'][i].split(', '))
i = 0
for i in range(len(p_ps)):
    j = 0
    for j in range(len(p_ps[i])):
        p_ps[i][j] = p_ps[i][j].replace('(','').replace(')','').split(' ')
i = 0
for i in range(len(p_ps)):
    j = 0
    for j in range(len(p_ps[i])):
        try:
            if len(p_ps[i][j][1]) > 1:
                c = list()
                for a in p_ps[i][j][1]:
                    c.append(a)
                p_ps[i][j][1] = c
        except:
            continue
i = 0
for i in range(len(p_ps)):
    j = 0
    for j in range(len(p_ps[i])):
        k = 0
        for k in range(len(p_ps[i][j])):
            if '/' in p_ps[i][j][k]:
                p_ps[i][j][k] = p_ps[i][j][k].split('/')
i = 0
for i in range(len(p_ps)):
    j = 0
    for j in range(len(p_ps[i])):
        if len(p_ps[i][j]) > 1:
            k = 0
            for k in range(len(p_ps[i][j])):
                if not isinstance(p_ps[i][j][k], list):
                    p_ps[i][j] = str(p_ps[i][j][0]) + str(p_ps[i][j][1])
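As you can see, this code does not actually produce the desired result.
Answer 0: (score: 2)
This works for the data you provided and, based on the information given in the comments, it should do what you want.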
import re

def find_elements_in_brackets(text):
    # Return the characters inside the first pair of parentheses,
    # e.g. "D (RC)" -> ['R', 'C']; an empty list if there are none.
    m = re.search(r'\((.+?)\)', text)
    adder = []
    if m:
        for c in m.group(1):
            adder.append(c)
    return adder

data = ["D/WB/M (L)","DM, M/AM (C)", "D (RC), WB (R)","D (C), DM, M (C)","M (C), AM (LC), ST (C)"]
output = []
for index, row in enumerate(data):
    output.append([])
    for element in row.split(","):
        elements_in_brackets = find_elements_in_brackets(element)
        if elements_in_brackets:
            # Pair every slash-separated position with every letter in the brackets.
            for splitted in element.split("/"):
                for c in elements_in_brackets:
                    output[index].append((splitted.split("(")[0].strip() + " " + c).strip())
        else:
            # No brackets: keep the position unchanged.
            output[index].append(element.strip())
print(output)
Output:
[['D L', 'WB L', 'M L'], ['DM', 'M C', 'AM C'], ['D R', 'D C', 'WB R'], ['D C', 'DM', 'M C'], ['M C', 'AM L', 'AM C', 'ST C']]
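Since the question starts from a DataFrame column rather than a plain list, the same logic can be wrapped in a function that handles one string at a time. The following is only a sketch of that idea, not the answer's original code: the names expand_positions and ITEM_RE are hypothetical, and it assumes every comma-separated item has the form "positions" or "positions (letters)" as in the sample data.

```python
import re

# One comma-separated item: slash-separated positions, optionally followed
# by side letters in parentheses, e.g. "M/AM (C)" or "DM".
# (ITEM_RE and expand_positions are illustrative names, not from the answer.)
ITEM_RE = re.compile(r"([^()]+?)\s*(?:\(([^()]*)\))?")


def expand_positions(text):
    """Expand e.g. 'D (RC), WB (R)' into ['D R', 'D C', 'WB R']."""
    result = []
    for item in text.split(","):
        item = item.strip()
        if not item:
            continue  # ignore empty items such as a trailing comma
        match = ITEM_RE.fullmatch(item)
        if match is None:
            continue  # assumption: silently skip items in an unexpected format
        sides = match.group(2) or ""
        for position in match.group(1).split("/"):
            position = position.strip()
            if sides:
                # Pair the position with every side letter.
                result.extend(f"{position} {side}" for side in sides)
            else:
                result.append(position)
    return result


data = ["D/WB/M (L)", "DM, M/AM (C)", "D (RC), WB (R)",
        "D (C), DM, M (C)", "M (C), AM (LC), ST (C)"]
for row in data:
    print(expand_positions(row))
```

With the DataFrame from the question, something like df['position'].apply(expand_positions) should then yield a Series holding one list per row.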