Pandas - How to create a column with 3 possible outputs based on conditions across multiple columns

Time: 2017-02-08 19:28:29

Tags: python pandas

I have a DataFrame df:

def fake_data():
    return {
        'Name': fake.name(),
        'Gender': random.choice(sex_list),
        'Address': fake.street_address(),
        'Nationality': 'Zimbabwean',
        'Account_Type': random.choice(accounts_list),
        'Age': random.randint(0, 2),
        'Education': random.random() > 0.5,
        'Employment': random.randint(0, 2),
        'Salary': random.randint(0, 2),
        'Employer_Stability': random.random() > 0.5,
        'Consistency': random.random() > 0.5,
        'Balance': random.randint(0, 2),
        'Residential_Status': random.random() > 0.5
    }

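I want to create a column Service_Level whose value is 0, 1, or 2, based on conditions on these columns:

columns = ['Age','Education', 'Employment', 'Salary', 'Employer_Stability', 'Consistency', 'Balance', 'Residential_Status']

After reading some answers here, I tried the following to create ['Service_Level'] = 0: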

df['Service_Level'] = np.where((df['Age']==0)&(df['Education']==False)&(df['Employment']==0)&(df['Salary']==0)&(df['Employer_Stability']==False)&(df['Consistency']==False)&(df['Balance']==0)&(df['Residential_Status']==False),
                               (df['Age'])|(df['Education'])|(df['Employment'])|(df['Salary'])|(df['Employer_Stability'])|(df['Consistency'])|(df['Balance'])|(df['Residential_Status']), 0)

Then this for ['Service_Level'] = 1:

df['Service_Level'] = np.where((df['Age']==1)&(df['Education']==True)&(df['Employment']==1)&(df['Salary']==1)&(df['Employer_Stability']==False)&(df['Consistency']==True)&(df['Balance']==1)&(df['Residential_Status']==True),
                               (df['Age'])|(df['Education'])|(df['Employment'])|(df['Salary'])|(df['Employer_Stability'])|(df['Consistency'])|(df['Balance'])|(df['Residential_Status']), 1)

Then this for ['Service_Level'] = 2:

df['Service_Level'] = np.where((df['Age']==2)&(df['Education']==True)&(df['Employment']==2)&(df['Salary']==2)&(df['Employer_Stability']==True)&(df['Consistency']==True)&(df['Balance']==2)&(df['Residential_Status']==True),
                               (df['Age'])|(df['Education'])|(df['Employment'])|(df['Salary'])|(df['Employer_Stability'])|(df['Consistency'])|(df['Balance'])|(df['Residential_Status']), 2)

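Unfortunately, I can't figure out how to combine these conditions so that I get 0, 1, or 2.

If it works, what happens to the rows that do not meet any of these conditions? I want to reproduce and output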

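1 Answer:

Answer 0: (score: 0)

You may want to use slicing together with np.where (which, by the way, takes three arguments: the condition, val1 (used where the condition is true), and val2 (used where it is false)).

Your first statement: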

df['Service_Level'] = np.where(condition_1, 0, 1)

This sets df['Service_Level'] to 0 for the rows that satisfy the first condition and to 1 otherwise.
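As a minimal sketch of that three-argument form, assuming a toy DataFrame with only two of the question's columns (the data and the definition of condition_1 are illustrative, not taken from the question):

```python
import numpy as np
import pandas as pd

# Toy data: two of the question's columns, values made up for illustration.
df = pd.DataFrame({
    'Age': [0, 1, 2, 0],
    'Education': [False, True, True, True],
})

# Hypothetical first condition: Age is 0 and Education is False.
condition_1 = (df['Age'] == 0) & ~df['Education']

# 0 where condition_1 holds, 1 everywhere else.
df['Service_Level'] = np.where(condition_1, 0, 1)
print(df['Service_Level'].tolist())
```

Only the first row satisfies both parts of condition_1, so it is the only row that gets 0.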

Now mask the data to get only the rows where Service_Level is not 0:

df[df['Service_Level'] !=0] 

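On this subset of the DataFrame, you can apply the second condition using

np.where(condition_2, 1, 2)

which assigns 1 to df['Service_Level'] where the condition is true and 2 to the remaining rows.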
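The two steps can be sketched as follows, using df.loc so that the second assignment is written back to the original frame rather than to a copy. The toy data and the names condition_1, condition_2, and mask are assumptions made for illustration:

```python
import numpy as np
import pandas as pd

# Toy data: two of the question's columns, values made up for illustration.
df = pd.DataFrame({
    'Age': [0, 1, 2, 1],
    'Education': [False, True, True, False],
})

# Hypothetical conditions for levels 0 and 1.
condition_1 = (df['Age'] == 0) & ~df['Education']
condition_2 = (df['Age'] == 1) & df['Education']

# Step 1: 0 where condition_1 holds, 1 everywhere else.
df['Service_Level'] = np.where(condition_1, 0, 1)

# Step 2: among the rows that are not 0, assign 1 where condition_2 holds, else 2.
mask = df['Service_Level'] != 0
df.loc[mask, 'Service_Level'] = np.where(condition_2[mask], 1, 2)
print(df['Service_Level'].tolist())
```

Note that every row failing both conditions ends up as 2, whether or not it matches the question's definition of level 2.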

Edit:

You can also nest a second np.where, with the second condition, inside the first one, like this:

df['Service_Level'] = np.where(cond_1, 0, (np.where(cond_2, 1,2)))

For better readability, you may want to save the conditions first as cond_1 etc. and then use them in np.where.
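One way to follow that advice is sketched below with np.select in place of the nested np.where. This is a swapped-in alternative, not what the answer itself uses. The toy data, the name cond_3, and the default of -1 for rows matching none of the conditions are assumptions made for illustration:

```python
import numpy as np
import pandas as pd

# Toy data: three of the question's columns, values made up for illustration.
df = pd.DataFrame({
    'Age': [0, 1, 2, 2],
    'Education': [False, True, True, False],
    'Salary': [0, 1, 2, 1],
})

# Named conditions, one per service level.
cond_1 = (df['Age'] == 0) & ~df['Education'] & (df['Salary'] == 0)
cond_2 = (df['Age'] == 1) & df['Education'] & (df['Salary'] == 1)
cond_3 = (df['Age'] == 2) & df['Education'] & (df['Salary'] == 2)

# np.select takes the first matching condition per row;
# rows matching none get the default (-1 here, an arbitrary marker).
df['Service_Level'] = np.select([cond_1, cond_2, cond_3], [0, 1, 2], default=-1)
print(df['Service_Level'].tolist())
```

The last row matches none of the three conditions and is marked -1. With the nested np.where shown above, the same row would fall through to 2.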