pandas: keep only a substring of each column value

Time: 2017-07-10 15:52:16

Tags: python pandas

Can anyone give me some pointers on how to convert the following DataFrame into the desired DataFrame shown further down?

Input DataFrame:

import pandas as pd

rawdata = {'id': ['json', 'molly', 'tina', 'jake', 'molly'], 'entity': ['present:k:0:mc,present:m:10:mc', 'absent:m:1:pc', 'absent:k:60:pc,absent:k:5:pc', None, 'present:k:5:mc'], 'entity2': ['present:l:300:mc', 'present:k:5:pc,present:m:0:pc', None, 'absent:l:0:pc,absent:k:10:pc', 'absent:m:60:pc']}
df = pd.DataFrame(rawdata)
df.set_index('id')  # displays the frame indexed by id; df itself is not modified

                               entity                        entity2
id                                                                  
json   present:k:0:mc,present:m:10:mc               present:l:300:mc
molly                   absent:m:1:pc  present:k:5:pc,present:m:0:pc
tina     absent:k:60:pc,absent:k:5:pc                           None
jake                             None   absent:l:0:pc,absent:k:10:pc
molly                  present:k:5:mc                 absent:m:60:pc

Desired DataFrame:

            entity           entity2
id                                                                  
json         0,10              300
molly         1               5,0
tina         60,5             None
jake         None             0,10
molly         5                60

1 answer:

Answer 0 (score: 0)

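You could try something like this: replace every run of non-digit characters with a comma, then strip the commas from both ends of each string: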

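# regex=True is needed on pandas 2.0+, where str.replace treats the pattern as a literal by default
df.set_index("id").apply(lambda col: col.str.replace(r"\D+", ",", regex=True).str.strip(","))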

#     entity    entity2
#id     
#json   0,10        300
#molly     1        5,0
#tina   60,5       None
#jake   None       0,10
#molly     5         60
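
The replacement above keeps every digit in the string, so it would also pick up digits that happened to appear in one of the other fields. As a hedged alternative, here is a minimal sketch that assumes every item has the shape `status:key:number:unit` and pulls out only the number sitting between two colons, using `Series.str.findall`. The helper name `keep_numbers` is made up for this illustration and is not part of pandas.

```python
import pandas as pd

rawdata = {
    "id": ["json", "molly", "tina", "jake", "molly"],
    "entity": ["present:k:0:mc,present:m:10:mc", "absent:m:1:pc",
               "absent:k:60:pc,absent:k:5:pc", None, "present:k:5:mc"],
    "entity2": ["present:l:300:mc", "present:k:5:pc,present:m:0:pc", None,
                "absent:l:0:pc,absent:k:10:pc", "absent:m:60:pc"],
}
df = pd.DataFrame(rawdata).set_index("id")


def keep_numbers(col: pd.Series) -> pd.Series:
    """Hypothetical helper: join the digit runs found between two colons."""
    # findall yields a list of captured digit strings per cell;
    # cells with missing values do not become lists.
    found = col.str.findall(r":(\d+):")
    # Join the lists with commas and map anything that is not a list to None.
    return found.map(lambda nums: ",".join(nums) if isinstance(nums, list) else None)


result = df.apply(keep_numbers)
print(result)
```

Because the pattern is anchored on the surrounding colons, a value such as `present:k2:5:mc` would still give `5` here, whereas the `\D+` replacement would give `2,5`. Whether that matters depends on what the real data looks like.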