I have a CSV file in which a single "id" has multiple complex "values", and I want those values split into separate rows, each keeping its "id".
My CSV file:
# Read with: df1 = pandas.read_csv('krish.csv', encoding="ISO-8859-1")
# The file also contains data like 1.50% (P,KR,AU) 0.2¢/kg (AX,AU)
id value
100.3 Free (A+,BH,CA) 0.1¢/kg (AX)
200.1 Free (MA, MX,OM)
321.5 Free (BH,CA) 1.70% (P) 7% (PE) 12.3% (KR)
The output I want for the input above:
Answer 0 (score: 1)
I'm pretty sure there are more efficient/elegant ways, but this should work:
import pandas as pd

def split_elements(s):
    # s looks like 'Free (A+,BH,CA)': take the codes between the parentheses
    elements = s[s.find('(')+1:-1].split(',')
    # text before the parenthesis, without its trailing space
    key = s[:s.find('(')].strip()
    return ['{} ({})'.format(key, el) for el in elements]

input_data = {'values': ['Free (A+,BH,CA) 0.1¢/kg (AX)', 'Free (MA, MX,OM)', 'Free (BH,CA) 1.70% (P) 7% (PE) 12.3% (KR)'], 'ids': [100.3, 200.1, 321.5]}
df = pd.DataFrame(input_data)

temp_values = []
temp_ids = []
# iterate through rows
for idr, r in df.iterrows():
    # extract elements: split on ')' and put the parenthesis back on each piece
    elements = [el.strip()+')' for el in r['values'].split(')') if el.strip()]
    # split subelements
    for element in elements:
        split_el = split_elements(element)
        temp_values.extend(split_el)
        temp_ids.extend([r['ids']]*len(split_el))

# create dataset
df1 = pd.DataFrame({'ids': temp_ids, 'values': temp_values})
# set_index returns a new DataFrame, so keep the result
df1 = df1.set_index('ids')
print(df1)
Which gives:
ids values
100.3 Free (A+)
100.3 Free (BH)
100.3 Free (CA)
100.3 0.1¢/kg (AX)
200.1 Free (MA)
200.1 Free ( MX)
200.1 Free (OM)
321.5 Free (BH)
321.5 Free (CA)
321.5 1.70% (P)
321.5 7% (PE)
321.5 12.3% (KR)
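As a possible alternative, here is a minimal sketch that does the same expansion with a regular expression instead of string slicing. It assumes every value is a sequence of "key (code,code,...)" groups with no nested parentheses, as in the sample rows. The names `GROUP_RE`, `expand_value` and `expand_rows` are made up for this illustration. Unlike the output above, it also strips whitespace around each code, so " MX" becomes "MX".

```python
import re

# Assumed shape of each group: some key text followed by one
# parenthesised, comma-separated list of codes (no nesting).
GROUP_RE = re.compile(r'([^()]+?)\s*\(([^()]*)\)')

def expand_value(value):
    """Return one 'key (code)' string per code in every group of value."""
    rows = []
    for key, codes in GROUP_RE.findall(value):
        for code in codes.split(','):
            # strip() removes stray spaces such as the one in ' MX'
            rows.append('{} ({})'.format(key.strip(), code.strip()))
    return rows

def expand_rows(records):
    """Turn (id, value) pairs into one (id, 'key (code)') pair per code."""
    return [(id_, item) for id_, value in records for item in expand_value(value)]

records = [
    (100.3, 'Free (A+,BH,CA) 0.1¢/kg (AX)'),
    (200.1, 'Free (MA, MX,OM)'),
    (321.5, 'Free (BH,CA) 1.70% (P) 7% (PE) 12.3% (KR)'),
]

rows = expand_rows(records)
for id_, item in rows:
    print(id_, item)
```

The resulting list of pairs can be turned back into a frame with `pd.DataFrame(rows, columns=['ids', 'values'])` if pandas is needed for the next step.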