I have a table named test in Hive, which I query from PySpark.
The test table:
srccountry | dstcountry | date       | time
/na        | /na        | 2019-06-24 | 01:00:00
reserved   | reserved   | 2019-06-24 | 01:00:00
india      | us         | 2019-06-24 | 01:30:00
us         | india      | 2019-06-24 | 01:35:00
india      | /na        | 2019-06-24 | 01:40:00
india      | reserved   | 2019-06-24 | 01:45:00
/na        | us         | 2019-06-24 | 01:50:00
reserved   | us         | 2019-06-24 | 01:59:00
I want output like this:
srccountry | dstcountry | date       | time     | count
india      | us         | 2019-06-24 | 01:30:00 | 1
us         | india      | 2019-06-24 | 01:35:00 | 1
I wrote a query like this:
select srccountry,dstcountry,count(*) as count
from test
where date='2019-06-24'
and time between '01:00:00' and '02:00:00'
and ((srccountry!='reserved' and dstcountry!='reserved')
or (srccountry!='/na' and dstcountry!='/na'))
group by srccountry,dstcountry order by count
But it returns all the rows.
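
To make the behaviour easy to reproduce, here is a minimal sketch that loads the sample rows into an in-memory SQLite database and runs the same WHERE clause. SQLite stands in for Hive here purely as an assumption for illustration, and the names `run`, `OR_FILTER`, and `AND_FILTER` are hypothetical. The sketch also groups by date and time, which is an assumption made so the result has the columns of the desired output. It suggests why every row comes back: the two halves of the condition are joined with `or`, so a row only has to avoid one of the two values to pass. Requiring both columns to avoid both values appears to give the two rows wanted.

```python
import sqlite3

# Sample rows copied from the table in the question.
ROWS = [
    ("/na", "/na", "2019-06-24", "01:00:00"),
    ("reserved", "reserved", "2019-06-24", "01:00:00"),
    ("india", "us", "2019-06-24", "01:30:00"),
    ("us", "india", "2019-06-24", "01:35:00"),
    ("india", "/na", "2019-06-24", "01:40:00"),
    ("india", "reserved", "2019-06-24", "01:45:00"),
    ("/na", "us", "2019-06-24", "01:50:00"),
    ("reserved", "us", "2019-06-24", "01:59:00"),
]

# The condition from the question: the two halves are joined with OR.
OR_FILTER = """
    ((srccountry != 'reserved' and dstcountry != 'reserved')
     or (srccountry != '/na' and dstcountry != '/na'))
"""

# Hypothetical alternative: both columns must avoid both values.
AND_FILTER = """
    srccountry not in ('reserved', '/na')
    and dstcountry not in ('reserved', '/na')
"""


def run(filter_sql):
    """Run the question's query with the given country filter (SQLite, not Hive)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "create table test (srccountry text, dstcountry text, date text, time text)"
    )
    conn.executemany("insert into test values (?, ?, ?, ?)", ROWS)
    query = f"""
        select srccountry, dstcountry, date, time, count(*) as count
        from test
        where date = '2019-06-24'
          and time between '01:00:00' and '02:00:00'
          and {filter_sql}
        group by srccountry, dstcountry, date, time
        order by time
    """
    result = conn.execute(query).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    print(len(run(OR_FILTER)))   # number of rows kept by the original condition
    for row in run(AND_FILTER):  # rows kept by the stricter condition
        print(row)
```

The `not in` form is standard SQL, so it should carry over to Hive or Spark SQL, but that has not been checked here against a real Hive table. Rows with NULL in either country column would be dropped by it.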