Multiple OR conditions in a Hive query

Time: 2019-07-04 10:54:54

Tags: hive pyspark pyspark-sql

I have a Hive table named test that I read with PySpark.

The test table:

srccountry | dstcountry | date       | time
/na        | /na        | 2019-06-24 | 01:00:00
reserved   | reserved   | 2019-06-24 | 01:00:00
india      | us         | 2019-06-24 | 01:30:00
us         | india      | 2019-06-24 | 01:35:00
india      | /na        | 2019-06-24 | 01:40:00
india      | reserved   | 2019-06-24 | 01:45:00
/na        | us         | 2019-06-24 | 01:50:00
reserved   | us         | 2019-06-24 | 01:59:00

I want output like this:

srccountry | dstcountry | date       | time     | count
india      | us         | 2019-06-24 | 01:30:00 | 1
us         | india      | 2019-06-24 | 01:35:00 | 1

I wrote a query like this:

select srccountry,dstcountry,count(*) as count 
  from test 
 where date='2019-06-24' 
   and time between '01:00:00' and '02:00:00' 
   and ((srccountry!='reserved' and dstcountry!='reserved') 
       or (srccountry!='/na' and dstcountry!='/na')) 
 group by srccountry,dstcountry order by count 

But it returns all the rows.
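
A likely cause is the `or` between the two parenthesized groups: a row only has to satisfy one of them, and every sample row satisfies at least one. For example, the `/na | /na` row contains no `reserved`, so the first group is already true for it. Requiring both exclusions at once, for instance with `not in`, leaves only the two rows in the desired output. The sketch below checks this against the sample rows, using Python's built-in sqlite3 as a stand-in for Hive. The alias `cnt`, the helper `run`, and the extra `srccountry` sort key are illustrative assumptions, and how Hive treats the real table (for example NULL values) is not verified here.

```python
import sqlite3

# Sample rows copied from the test table in the question.
rows = [
    ("/na", "/na", "2019-06-24", "01:00:00"),
    ("reserved", "reserved", "2019-06-24", "01:00:00"),
    ("india", "us", "2019-06-24", "01:30:00"),
    ("us", "india", "2019-06-24", "01:35:00"),
    ("india", "/na", "2019-06-24", "01:40:00"),
    ("india", "reserved", "2019-06-24", "01:45:00"),
    ("/na", "us", "2019-06-24", "01:50:00"),
    ("reserved", "us", "2019-06-24", "01:59:00"),
]

# In-memory SQLite database standing in for the Hive table (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "create table test (srccountry text, dstcountry text, date text, time text)"
)
conn.executemany("insert into test values (?, ?, ?, ?)", rows)

# Filter from the question: true whenever EITHER group holds.
ORIGINAL_FILTER = """
    ((srccountry != 'reserved' and dstcountry != 'reserved')
     or (srccountry != '/na' and dstcountry != '/na'))
"""

# Candidate fix: both columns must avoid BOTH placeholder values.
FIXED_FILTER = """
    srccountry not in ('reserved', '/na')
    and dstcountry not in ('reserved', '/na')
"""


def run(country_filter):
    """Run the question's query with the given country filter (hypothetical helper)."""
    sql = f"""
        select srccountry, dstcountry, count(*) as cnt
          from test
         where date = '2019-06-24'
           and time between '01:00:00' and '02:00:00'
           and {country_filter}
         group by srccountry, dstcountry
         order by cnt, srccountry
    """
    return conn.execute(sql).fetchall()


original_result = run(ORIGINAL_FILTER)
fixed_result = run(FIXED_FILTER)

print(len(original_result))  # number of groups kept by the original filter
print(fixed_result)          # groups kept by the candidate fix
```

If this matches what you see in Hive, the same `not in` condition should be usable in the original query. Note that the desired output also shows `date` and `time`, which would have to be added to both the `select` list and the `group by` clause.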

0 answers:

No answers