Can anyone give me some hints on how to convert the following DataFrame into the desired DataFrame (shown below)?
Input DataFrame:
import pandas as pd

rawdata = {'id': ['json', 'molly', 'tina', 'jake', 'molly'], 'entity': ['present:k:0:mc,present:m:10:mc', 'absent:m:1:pc', 'absent:k:60:pc,absent:k:5:pc', None, 'present:k:5:mc'], 'entity2': ['present:l:300:mc', 'present:k:5:pc,present:m:0:pc', None, 'absent:l:0:pc,absent:k:10:pc', 'absent:m:60:pc']}
df = pd.DataFrame(rawdata)
df = df.set_index('id')  # set_index returns a new frame, so assign it back
       entity                          entity2
id
json   present:k:0:mc,present:m:10:mc  present:l:300:mc
molly  absent:m:1:pc                   present:k:5:pc,present:m:0:pc
tina   absent:k:60:pc,absent:k:5:pc    None
jake   None                            absent:l:0:pc,absent:k:10:pc
molly  present:k:5:mc                  absent:m:60:pc
Desired DataFrame:
       entity  entity2
id
json   0,10    300
molly  1       5,0
tina   60,5    None
jake   None    0,10
molly  5       60
Answer 0 (score: 0)
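You can try something like this: replace every run of non-digit characters with a comma, then strip the commas from both ends of each string: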
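# Assumes df is already indexed by 'id'. regex=True is required on pandas 2.0+,
# where str.replace treats the pattern as a literal string by default.
df.apply(lambda col: col.str.replace(r"\D+", ",", regex=True).str.strip(","))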
#       entity  entity2
#id
#json   0,10    300
#molly  1       5,0
#tina   60,5    None
#jake   None    0,10
#molly  5       60
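
For reference, here is a self-contained sketch of the approach above, run against the sample data from the question. It also shows an alternative that is not part of the original answer: picking out the third colon-separated field of each item explicitly. That may be safer if the other fields could ever contain digits, since the `\D+` replacement would then pick those digits up too. The helper name `third_fields` and the variable names `by_regex` and `by_field` are made up for illustration, and the sketch assumes every non-missing item has the form `status:letter:number:suffix`.

```python
import pandas as pd

rawdata = {
    "id": ["json", "molly", "tina", "jake", "molly"],
    "entity": ["present:k:0:mc,present:m:10:mc", "absent:m:1:pc",
               "absent:k:60:pc,absent:k:5:pc", None, "present:k:5:mc"],
    "entity2": ["present:l:300:mc", "present:k:5:pc,present:m:0:pc", None,
                "absent:l:0:pc,absent:k:10:pc", "absent:m:60:pc"],
}
df = pd.DataFrame(rawdata).set_index("id")

# Approach from the answer: collapse every run of non-digits into a comma,
# then trim the leading and trailing commas.
by_regex = df.apply(
    lambda col: col.str.replace(r"\D+", ",", regex=True).str.strip(",")
)


def third_fields(cell):
    """Hypothetical helper: keep the third ':'-separated field of each item."""
    if pd.isna(cell):
        return cell  # leave missing values untouched
    return ",".join(item.split(":")[2] for item in cell.split(","))


# Alternative sketch: parse each item by position instead of by character class.
by_field = df.apply(lambda col: col.map(third_fields))

print(by_regex)
print(by_field)
```

On this sample data both versions should give the same values. They would diverge only if a non-numeric field contained digits.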