df =
columnA columnAB
row1 xxx
row2 yyy
row3 zzz
row4 xyx
expected df =
columnA columnAB columnB
row1 xxx [('A1(80)', ['BB11', 'A11', 'A21']), ('B1(70)', ['CC55', 'HH21']), ('C1(60)', ['KK88'])]
row2 yyy
row3 zzz
row4 xyx
from collections import defaultdict
d1, d2 = defaultdict(int), defaultdict(list)
d = {'A1BB11': 10,
'B1CC55': 20,
'A1A11': 30,
'A1A21': 40,
'B1HH21': 50,
'C1KK88': 60
}
for k, v in d.items():
    prefix = k[:2]
    d1[prefix] += v
    d2[prefix].append(k[2:])
final = {'{}({})'.format(k, d1[k]): v for k, v in d2.items()}
print(final)
# {'A1(80)': ['BB11', 'A11', 'A21'],
# 'B1(70)': ['CC55', 'HH21'],
# 'C1(60)': ['KK88']}
Now I need to load this data into a single DataFrame cell. I tried converting it to a list and assigning it:
final = sorted(final.items())
# type(final) = <class 'list'>
df.loc[df['columnA'].str.contains('row1', na=False), 'columnB'] = final
But I get this error:
ValueError: setting an array element with a sequence.
Can you suggest a way to fix this?
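For reference, here is a minimal sketch of one possible workaround, not a confirmed fix. It assumes that columnA holds the row labels, that columnB does not exist yet, and that leaving the other rows as None is acceptable. The idea is to create columnB with object dtype first and then write the list into each matching cell with df.at, which stores the value as a single object instead of trying to spread it across rows:

```python
import pandas as pd

# Assumed reconstruction of the frame from the question.
df = pd.DataFrame({
    'columnA': ['row1', 'row2', 'row3', 'row4'],
    'columnAB': ['xxx', 'yyy', 'zzz', 'xyx'],
})

# The sorted list of (label, values) tuples produced by the code above.
final = [('A1(80)', ['BB11', 'A11', 'A21']),
         ('B1(70)', ['CC55', 'HH21']),
         ('C1(60)', ['KK88'])]

# Create the target column with object dtype so a cell can hold a list.
df['columnB'] = pd.Series([None] * len(df), index=df.index, dtype=object)

# Write the whole list into each matching cell, one scalar cell at a time.
mask = df['columnA'].str.contains('row1', na=False)
for idx in df.index[mask]:
    df.at[idx, 'columnB'] = final
```

The loop is there because df.at addresses exactly one cell; if the mask matches several rows, each of them ends up referencing the same list object.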