Query multiple columns in a database and find their sum

Time: 2017-08-26 10:28:36

Tags: python mysql sql pandas dataframe

I have a table of school application data that looks like this:

create table todel (user_id int, SchemesApplicable1 int, SchemesApplicable2 int, 
SchemesApplicable3 int, SchemesApplicable4 int);

insert into todel values (1, 1, 0, 1, 0);

insert into todel values (2, 0, 0, 0, 0);

insert into todel values (3, 1, 0, 1, 0);

insert into todel values (4, 1, 0, 0, 0);

insert into todel values (5, 1, 0, 1, 1);

SELECT Count(User_Id) as No_Off_Application,
       sum(if(SchemesApplicable1 = 1, 1, 0)) as first,
       sum(if(SchemesApplicable2 = 1, 1, 0)) as second,
       sum(if(SchemesApplicable3 = 1, 1, 0)) as third,
       sum(if(SchemesApplicable4 = 1, 1, 0)) as forth
FROM todel;

The query above returns a report like this:

No_Off_Application  first   second  third   forth
5   4   0   3   1

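I want to add one more column that counts the applicants who have applied to more than one scheme. The expected count is 3 (user IDs 1, 3 and 5). How do I write a query for this?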

2 answers:

Answer 0: (score: 1)

SELECT Count(User_Id) as No_Off_Application,
       sum(SchemesApplicable1) as first,
       sum(SchemesApplicable2) as second,
       sum(SchemesApplicable3) as third,
       sum(SchemesApplicable4) as forth,
       -- MySQL evaluates the comparison to 1 or 0, so the sum counts matching rows;
       -- "> 1" keeps only users with more than one scheme (3 for the sample data)
       sum(SchemesApplicable1 + SchemesApplicable2 + SchemesApplicable3 + SchemesApplicable4 > 1) as users_with_multiple_applications
FROM todel;
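One way to sanity-check this kind of aggregate against the sample rows is an in-memory database from Python. The sketch below is only an illustration under stated assumptions: it uses SQLite (Python's built-in `sqlite3`) as a stand-in for MySQL, since both evaluate a comparison to 0 or 1, and the column aliases (`scheme1_count`, `multiple_schemes`, and so on) are made up for this example.

```python
import sqlite3

# Assumption: SQLite stands in for MySQL; a comparison yields 0 or 1 in both.
conn = sqlite3.connect(":memory:")
conn.execute(
    "create table todel (user_id int, SchemesApplicable1 int, "
    "SchemesApplicable2 int, SchemesApplicable3 int, SchemesApplicable4 int)"
)

# Sample rows from the question.
rows = [
    (1, 1, 0, 1, 0),
    (2, 0, 0, 0, 0),
    (3, 1, 0, 1, 0),
    (4, 1, 0, 0, 0),
    (5, 1, 0, 1, 1),
]
conn.executemany("insert into todel values (?, ?, ?, ?, ?)", rows)

# Hypothetical aliases; the last column counts users with more than one scheme.
query = """
SELECT count(user_id) AS application_count,
       sum(SchemesApplicable1) AS scheme1_count,
       sum(SchemesApplicable2) AS scheme2_count,
       sum(SchemesApplicable3) AS scheme3_count,
       sum(SchemesApplicable4) AS scheme4_count,
       sum(SchemesApplicable1 + SchemesApplicable2
           + SchemesApplicable3 + SchemesApplicable4 > 1) AS multiple_schemes
FROM todel
"""
report = conn.execute(query).fetchone()
print(report)  # (5, 4, 0, 3, 1, 3)
conn.close()
```

The last value, 3, matches the count the question expects for user IDs 1, 3 and 5.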

Answer 1: (score: 1)

Here is the setup in pandas:

import pandas as pd

# Same sample rows as the SQL table, with shortened column names
df = pd.DataFrame([[1, 1, 0, 1, 0],
                   [2, 0, 0, 0, 0],
                   [3, 1, 0, 1, 0],
                   [4, 1, 0, 0, 0],
                   [5, 1, 0, 1, 1]],
                  columns=['user_id', 'Scheme1', 'Scheme2', 'Scheme3', 'Scheme4'])
print(df)

   user_id  Scheme1  Scheme2  Scheme3  Scheme4
0        1        1        0        1        0
1        2        0        0        0        0
2        3        1        0        1        0
3        4        1        0        0        0
4        5        1        0        1        1

To check the total number of schemes per user in pandas, you can use df.sum(axis=1) on the scheme columns (everything after user_id):

print(df.iloc[:, 1:].sum(1))

0    2
1    0
2    2
3    1
4    3
dtype: int64

To get the user_ids, you can use boolean indexing:

user_id_ser = df.user_id[df.iloc[:, 1:].sum(1) > 1]
print(user_id_ser)

0    1
2    3
4    5
Name: user_id, dtype: int64

To add a "flag/indicator" column, create a boolean mask with > 1 and convert it to integers with astype:

df['Schemes > 1'] = (df.iloc[:, 1:].sum(1) > 1).astype(int)
print(df)

   user_id  Scheme1  Scheme2  Scheme3  Scheme4  Schemes > 1
0        1        1        0        1        0            1
1        2        0        0        0        0            0
2        3        1        0        1        0            1
3        4        1        0        0        0            0
4        5        1        0        1        1            1
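One thing to watch once the indicator column exists: a positional slice such as `df.iloc[:, 1:]` now picks it up too, so per-user totals computed that way are inflated by one for every flagged user. A minimal sketch of selecting the scheme columns by name instead is below; the `scheme_cols` list is a helper introduced here for illustration, not part of the original answer.

```python
import pandas as pd

df = pd.DataFrame([[1, 1, 0, 1, 0],
                   [2, 0, 0, 0, 0],
                   [3, 1, 0, 1, 0],
                   [4, 1, 0, 0, 0],
                   [5, 1, 0, 1, 1]],
                  columns=['user_id', 'Scheme1', 'Scheme2', 'Scheme3', 'Scheme4'])

# Hypothetical helper: name the scheme columns explicitly.
scheme_cols = ['Scheme1', 'Scheme2', 'Scheme3', 'Scheme4']

# Indicator built from the named columns only.
df['Schemes > 1'] = (df[scheme_cols].sum(axis=1) > 1).astype(int)

# After the indicator is added, the positional slice includes it in the total.
by_position = df.iloc[:, 1:].sum(axis=1)
by_name = df[scheme_cols].sum(axis=1)

print(by_position.tolist())  # [3, 0, 3, 1, 4]
print(by_name.tolist())      # [2, 0, 2, 1, 3]
```

Selecting by name keeps the totals stable no matter how many derived columns are appended later.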

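Finally, to get the exact output, you can use df.where, which replaces the non-positive values with NaN so that count() tallies only the positive entries in each column: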

print(df.where(df > 0).count())

user_id        5
Scheme1        4
Scheme2        0
Scheme3        3
Scheme4        1
Schemes > 1    3
dtype: int64
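Note that the `user_id` figure above equals the number of rows only because every ID happens to be positive. As a hedged alternative, the same report can be assembled explicitly, which makes each number's meaning clear. The `scheme_cols` list and the `multiple_schemes` key are names assumed for this sketch.

```python
import pandas as pd

df = pd.DataFrame([[1, 1, 0, 1, 0],
                   [2, 0, 0, 0, 0],
                   [3, 1, 0, 1, 0],
                   [4, 1, 0, 0, 0],
                   [5, 1, 0, 1, 1]],
                  columns=['user_id', 'Scheme1', 'Scheme2', 'Scheme3', 'Scheme4'])

# Hypothetical helper: the columns that hold the 0/1 scheme flags.
scheme_cols = ['Scheme1', 'Scheme2', 'Scheme3', 'Scheme4']

# Number of schemes each user applied to.
per_user = df[scheme_cols].sum(axis=1)

report = {
    # Counts non-null user IDs, like Count(User_Id) in the SQL query.
    'No_Off_Application': int(df['user_id'].count()),
    # Column sums work because the flags are 0 or 1.
    **{col: int(df[col].sum()) for col in scheme_cols},
    # Users with more than one scheme.
    'multiple_schemes': int((per_user > 1).sum()),
}
print(report)
# {'No_Off_Application': 5, 'Scheme1': 4, 'Scheme2': 0, 'Scheme3': 3, 'Scheme4': 1, 'multiple_schemes': 3}
```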