Hive query with specific exclusion criteria

Time: 2017-05-31 13:21:23

Tags: sql hive hiveql

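I am trying to build a Hive query that counts only the customers who have performed the features below, or a combination of them, and nothing else. The features are:

name = "summary"

name = "details"

name1 = "vehicle stats"

name1 = "accelerometer"

I have to count the number of customers that strictly adhere to the above conditions. For instance, in the table below, customer "Joy" should not be counted: even though he has "summary" and "details" in name and "vehicle stats" and "accelerometer" in name1, he has additionally done "expenses" in name.

Similarly, customer "Lan" should not be counted because he has additionally done "speeding" in name1, which is not among the above conditions.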

    customername    name        name1
    Joy             summary     vehicle stats
    Joy             details     accelerometer
    Joy             expenses    speeding
    Lan             summary     vehicle stats
    Lan             details     accelerometer   
    Lan             details     speeding
    Hana            details     accelerometer
    Hana            summary     vehicle stats

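The count for the table above must be 1, since only one customer (Hana) has performed only "summary" and "details" in name and only "vehicle stats" and "accelerometer" in name1.

This is my current query: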

    select name, name1, count(distinct(customername))
    from table1
    where date_time between "2017-01-01 00:00:00" and "2017-01-10 00:00:00"
    group by name, name1
    having name in ('summary', 'details') 
    or name1 in ('vehicle stats', 'accelerometer')

Any suggestions would be great!

2 answers:

Answer 0: (score: 0)

Part 1

    select      customername

    from        table1

    group by    customername

    -- at least one row carries an allowed value
    having      count 
                (
                    case 
                        when    name  in ('summary', 'details') 
                             or name1 in ('vehicle stats','accelerometer')
                        then    1
                    end
                ) > 0

    -- and no row carries a value outside the allowed lists
            and count 
                (
                    case 
                        when    name  not in ('summary', 'details') 
                             or name1 not in ('vehicle stats','accelerometer')
                        then    1
                    end
                ) = 0

    +--------------+
    | customername |
    +--------------+
    | Hana         |
    +--------------+
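
The question asks for a count rather than a list of names. One way to get it is to wrap the Part 1 query in a subquery and count its rows. The following is a minimal sketch, not part of the original answer: it assumes the date_time filter from the question is still wanted, and the alias t and the column name customers are arbitrary.

```sql
-- Sketch only: table1 and date_time come from the question;
-- the alias t and the output column name customers are assumptions.
select  count(*) as customers

from   (select      customername

        from        table1

        where       date_time between '2017-01-01 00:00:00' and '2017-01-10 00:00:00'

        group by    customername

        -- at least one row carries an allowed value
        having      count(case
                              when    name  in ('summary', 'details')
                                   or name1 in ('vehicle stats', 'accelerometer')
                              then    1
                          end) > 0

        -- and no row carries a value outside the allowed lists
                and count(case
                              when    name  not in ('summary', 'details')
                                   or name1 not in ('vehicle stats', 'accelerometer')
                              then    1
                          end) = 0
       ) t
```

Each customer that survives the having clause contributes exactly one row to the subquery, so count(*) on the outer level is the number of qualifying customers. Rows where name or name1 is NULL are not caught by the not in test, so they may need separate handling if the data can contain them.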

Part 2

    select      name
               ,name1
               ,count(*)

    from       (select      sort_array(collect_set(name))   as name
                           ,sort_array(collect_set(name1))  as name1

                from        table1

                group by    customername

                having      count 
                            (
                                case 
                                    when    name  in ('summary', 'details') 
                                         or name1 in ('vehicle stats','accelerometer')
                                    then    1
                                end
                            ) > 0

                        and count 
                            (
                                case 
                                    when    name  not in ('summary', 'details') 
                                         or name1 not in ('vehicle stats','accelerometer')
                                    then    1
                                end
                            ) = 0
                ) t

    group by    name
               ,name1

    +-----------------------+-----------------------------------+----+
    |         name          |               name1               | c2 |
    +-----------------------+-----------------------------------+----+
    | ["details","summary"] | ["accelerometer","vehicle stats"] |  1 |
    +-----------------------+-----------------------------------+----+

Answer 1: (score: 0)

You can also use collect_set to check that these columns contain only the specified entries.

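    select customername
    from table1
    where date_time between "2017-01-01 00:00:00" and "2017-01-10 00:00:00"
    group by customername
    -- collect_set does not guarantee element order, so sort before concatenating;
    -- the string literals are written in sorted order for the same reason
    having concat_ws(',', sort_array(collect_set(name))) = 'details,summary'
    and concat_ws(',', sort_array(collect_set(name1))) = 'accelerometer,vehicle stats'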

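The output of collect_set must be sorted before it is concatenated, so that the comparison does not depend on the order in which the elements were collected.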
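
To try the sorted comparison without access to table1, the sample rows from the question can be inlined. This is a sketch under assumptions: the CTE name sample_rows is made up for illustration, the date_time filter is left out because the sample table has no such column, and it relies on stack, sort_array and select without a from clause being available in the Hive version at hand.

```sql
-- Sketch only: sample_rows is a hypothetical CTE holding the question's sample data.
with sample_rows as (
    -- stack(8, ...) turns the 24 values into 8 rows of 3 columns
    select stack(8,
        'Joy',  'summary',  'vehicle stats',
        'Joy',  'details',  'accelerometer',
        'Joy',  'expenses', 'speeding',
        'Lan',  'summary',  'vehicle stats',
        'Lan',  'details',  'accelerometer',
        'Lan',  'details',  'speeding',
        'Hana', 'details',  'accelerometer',
        'Hana', 'summary',  'vehicle stats'
    ) as (customername, name, name1)
)
select   customername
from     sample_rows
group by customername
-- sort each set so the joined string has a fixed order
having   concat_ws(',', sort_array(collect_set(name)))  = 'details,summary'
and      concat_ws(',', sort_array(collect_set(name1))) = 'accelerometer,vehicle stats'
```

On this data only Hana should come back: Joy's name set also contains expenses, and Lan's name1 set also contains speeding. Note that this is an exact match on both sets, so a customer who has only "summary" in name would not be returned, whereas the conditional counts in Answer 0 would accept such a customer.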