Filter a Python pandas DataFrame based on values in dictionaries

Time: 2015-11-11 11:49:07

Tags: python pandas

I have some data that I have read into Python as a pandas DataFrame:

             Unnamed: 0  Initial_guess  Lower_bound  Upper_bound Estimated_or_Fixed  
      0          Ka              5     0.000001        10000          Estimated   
      2          Kd              5     0.000001        10000          Estimated   
      3          Ki              5     0.000001        10000          Estimated   
      5          Kr              5     0.000001        10000          Estimated   
      6        R1_I              5     0.000001        10000          Estimated   
      7         PR1              5     0.000001        10000          Estimated   
      8         PR2              5     0.000001        10000          Estimated   
      9       alpha              5     0.000001        10000          Estimated   
      10        Kcd              5     0.000001        10000          Estimated   
      12       Klid              5     0.000001        10000          Estimated   
      18    LR1R2_I              5     1.000000        10000          Estimated   

        Variable_type  
0   Kinetic parameter  
2   Kinetic parameter  
3   Kinetic parameter  
5   Kinetic parameter  
6   Kinetic parameter  
7   Kinetic parameter  
8   Kinetic parameter  
9   Kinetic parameter  
10  Kinetic parameter  
12  Kinetic parameter  
18         Species IC  

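The first column, Unnamed: 0, holds the parameter names. I have many models, and each model contains a different combination of these parameters. My task is to filter this table for each model by removing every row whose parameter is not present in that model. For each model I have dictionaries of the parameters it contains. Parameters come in two types, Species IC or Kinetic parameter. Here is an example of these dictionaries for the first model: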

Species_IC:
{'R1': '2.7109e+02', 'R2': '1.2709e+02', 'R1_I': '2.7109e+03', 'R2_I': '1.2709e+03', 'LR1R2': '1.6913e+00', 'LR1R2_I': '1.6913e+01'}

Kinetic_parameter:
{'Ka': '1.0000e+00', 'TGFb': '1.0000e-01', 'Synth': '1.0000e+00', 'PR1': '8.0000e+00', 'Sink': '0.0000e+00', 'PR2': '4.0000e+00', 'alpha': '1.0000e+00'}

My code:

def write_parameter_bounds_file(self):
    model1=self.all_models_dirs[0] #get first model from a list of model. I'll do it on the first model then generalize to the rest. 
    species=self.get_model_species(model1+'.xml') #get the species dct from this model
    parameters=self.get_model_parameters(model1+'.xml')#get parameter dct from this model
    param_info=self.read_parameter_bounds_template() #get all parameters from template. This is the pandas dataframe at the top. 
    estimated_species=[]
    estimated_params=[]
    for i in species.keys():
        print '\n'
        for j in param_info[param_info.columns[0]]:
            if i==j:
                estimated_species.append(i)
    for i in parameters.keys():
        print '\n'
        for j in param_info[param_info.columns[0]]:
            if i==j:
                estimated_params.append(i)
    param_list=estimated_params+estimated_species #This is a list of the parameters that need to be included in the output df

Does anyone know how to use param_list to filter the original pandas df?

Thanks

1 Answer:

Answer 0: (score: 4)

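You can use the function isin with a list generated from the dictionary keys:

list_Species_IC = list(Species_IC.keys())

This gives a subset of the DataFrame df. You can then reset the index with the function reset_index.

A similar approach can be used for the dictionary Kinetic_parameter.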

Species_IC = {'R1': '2.7109e+02', 'R2': '1.2709e+02', 'R1_I': '2.7109e+03', 'R2_I': '1.2709e+03', 'LR1R2': '1.6913e+00', 'LR1R2_I': '1.6913e+01'}

list_Species_IC = list(Species_IC.keys())  # list() so this also works on Python 3
print(list_Species_IC)
#['R1', 'R2', 'R1_I', 'R2_I', 'LR1R2', 'LR1R2_I']
out = df[df['Unnamed: 0'].isin(list_Species_IC)].reset_index()
print(out)
#   Unnamed: 0  Initial_guess  Lower_bound  Upper_bound Estimated_or_Fixed
#4        R1_I              5     0.000001        10000          Estimated
#10    LR1R2_I              5     1.000000        10000          Estimated
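
One detail worth noting: reset_index() with no arguments keeps the old index as a new column named index, while reset_index(drop=True) discards it. The following is only a minimal sketch of that difference, assuming a small stand-in frame; df_demo is a hypothetical name, and its three rows and the shortened Species_IC dictionary are cut down from the question's data.

```python
import pandas as pd

# Hypothetical stand-in for the question's table (three rows only).
df_demo = pd.DataFrame(
    {
        "Unnamed: 0": ["Ka", "R1_I", "LR1R2_I"],
        "Initial_guess": [5, 5, 5],
    },
    index=[0, 6, 18],
)

# Shortened version of the Species_IC dictionary from the question.
Species_IC = {"R1": "2.7109e+02", "R1_I": "2.7109e+03", "LR1R2_I": "1.6913e+01"}

# Boolean mask: True where the parameter name is one of the dictionary keys.
mask = df_demo["Unnamed: 0"].isin(list(Species_IC.keys()))

kept_index = df_demo[mask].reset_index()               # old index becomes an 'index' column
dropped_index = df_demo[mask].reset_index(drop=True)   # old index is discarded

print(kept_index.columns.tolist())
print(dropped_index.columns.tolist())
```

Which variant to use depends on whether the original row labels are still needed after filtering.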

All together:

Species_IC = {'R1': '2.7109e+02', 'R2': '1.2709e+02', 'R1_I': '2.7109e+03', 'R2_I': '1.2709e+03', 'LR1R2': '1.6913e+00', 'LR1R2_I': '1.6913e+01'}
Kinetic_parameter = {'Ka': '1.0000e+00', 'TGFb': '1.0000e-01', 'Synth': '1.0000e+00', 'PR1': '8.0000e+00', 'Sink': '0.0000e+00', 'PR2': '4.0000e+00', 'alpha': '1.0000e+00'}

list_Species_IC = list(Species_IC.keys())
list_Kinetic_parameter = list(Kinetic_parameter.keys())
# list() is needed on Python 3, where keys() returns a view that does not support +
list_IC = list_Species_IC + list_Kinetic_parameter
print(list_IC)
#['R1', 'R2', 'R1_I', 'R2_I', 'LR1R2', 'LR1R2_I', 'Ka', 'TGFb', 'Synth', 'PR1', 'Sink', 'PR2', 'alpha']
out = df[df['Unnamed: 0'].isin(list_IC)].reset_index()
print(out)
#   index Unnamed: 0  Initial_guess  Lower_bound  Upper_bound  \
#0      0         Ka              5     0.000001        10000   
#1      4       R1_I              5     0.000001        10000   
#2      5        PR1              5     0.000001        10000   
#3      6        PR2              5     0.000001        10000   
#4      7      alpha              5     0.000001        10000   
#5     10    LR1R2_I              5     1.000000        10000   
#
#  Estimated_or_Fixed  
#0          Estimated  
#1          Estimated  
#2          Estimated  
#3          Estimated  
#4          Estimated  
#5          Estimated
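
Since the question mentions many models, the same filter could be wrapped in a small helper and called once per model. This is only a sketch under assumptions: the function name filter_bounds_for_model and the variable template are hypothetical, the template frame is rebuilt by hand from the parameter names in the question's table, and the bounds are simplified to a single column.

```python
import pandas as pd


def filter_bounds_for_model(df, species, parameters, name_col="Unnamed: 0"):
    """Keep only the rows whose parameter name is a key of either dictionary."""
    # Iterating over a dict yields its keys, so set(dict) is the set of keys.
    wanted = set(species) | set(parameters)
    return df[df[name_col].isin(wanted)].reset_index(drop=True)


# Hand-built stand-in for the template table in the question.
template = pd.DataFrame(
    {
        "Unnamed: 0": ["Ka", "Kd", "Ki", "Kr", "R1_I", "PR1", "PR2",
                       "alpha", "Kcd", "Klid", "LR1R2_I"],
        "Initial_guess": [5] * 11,
    }
)

Species_IC = {'R1': '2.7109e+02', 'R2': '1.2709e+02', 'R1_I': '2.7109e+03',
              'R2_I': '1.2709e+03', 'LR1R2': '1.6913e+00', 'LR1R2_I': '1.6913e+01'}
Kinetic_parameter = {'Ka': '1.0000e+00', 'TGFb': '1.0000e-01', 'Synth': '1.0000e+00',
                     'PR1': '8.0000e+00', 'Sink': '0.0000e+00', 'PR2': '4.0000e+00',
                     'alpha': '1.0000e+00'}

result = filter_bounds_for_model(template, Species_IC, Kinetic_parameter)
print(result["Unnamed: 0"].tolist())
```

Using a set for the lookup avoids duplicate names if a key ever appears in both dictionaries, and the row order of the template is preserved because the mask is applied to the original frame.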