I have the table below in Hive:
accountNum date status action qty time
---------- ---- ------ ------ --- -----
1234       2017 filled B      10  11:20
1234       2017 filled S      10  11:20
2345       2017 filled B      20  12:00
2345       2017 filled B      10  12:00
4444       2017 filled B      5   01:00
4444       2017 filled S      5   02:00
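Here I want to compare pairs of rows: one with action "B" followed by one with action "S". If I find two such rows, first the B and then the S, I need to check whether accountNum, date, time and status are the same in both.
So, based on the test data above, I should get only the first two rows: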
accountNum date status action qty time
---------- ---- ------ ------ --- -----
1234       2017 filled B      10  11:20
1234       2017 filled S      10  11:20
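What kind of query should I write for this?
I have the following MySQL query, but Hive doesn't support HAVING / DISTINCT / COUNT, so it doesn't work in Hive. Is there any way to write this query using HAVING, or some other way using a JOIN?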
select t1.*
from yourTable t1
join (
select accountNum, date, status, time
from yourTable
where action in ('B', 'S')
group by accountNum, date, status, time
having count(distinct action) = 2
) t2
on t1.accountNum = t2.accountNum and
t1.date = t2.date and
t1.status = t2.status and
t1.time = t2.time
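To make the intended matching rule concrete, here is a minimal plain-Python sketch of what the query above computes on the sample data: the subquery groups rows by (accountNum, date, status, time) and keeps only groups that contain both a B and an S, and the join then returns every row of those groups. `ROWS` and `matched_rows` are illustrative names, not part of the question.

```python
from collections import defaultdict

# Sample rows from the question: (accountNum, date, status, action, qty, time).
ROWS = [
    (1234, 2017, "filled", "B", 10, "11:20"),
    (1234, 2017, "filled", "S", 10, "11:20"),
    (2345, 2017, "filled", "B", 20, "12:00"),
    (2345, 2017, "filled", "B", 10, "12:00"),
    (4444, 2017, "filled", "B", 5, "01:00"),
    (4444, 2017, "filled", "S", 5, "02:00"),
]


def matched_rows(rows):
    """Keep rows whose (accountNum, date, status, time) group has both a B and an S."""
    # Step 1: the "group by ... having count(distinct action) = 2" subquery.
    actions = defaultdict(set)
    for acct, date, status, action, qty, time in rows:
        if action in ("B", "S"):
            actions[(acct, date, status, time)].add(action)
    keys = {key for key, acts in actions.items() if len(acts) == 2}
    # Step 2: join back to the table on the same four columns.
    return [r for r in rows if (r[0], r[1], r[2], r[5]) in keys]


for row in matched_rows(ROWS):
    print(row)
```

Account 2345 drops out because its group has two B rows and no S, and account 4444 drops out because its B and S rows have different times.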
Answer:
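`date` is a reserved word in Hive, so it has to be quoted with backticks.
There also seems to be a limitation on expressions in the HAVING clause: they don't appear to work unless the same expression also appears in the SELECT clause.
This query (based on your original query) works:
select t1.*
from yourTable t1
join (
    select accountNum, `date`, status, time, count(distinct action)
    from yourTable
    where action in ('B', 'S')
    group by accountNum, `date`, status, time
    having count(distinct action) = 2
) t2
on  t1.accountNum = t2.accountNum and
    t1.`date` = t2.`date` and
    t1.status = t2.status and
    t1.time = t2.time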
+------------+------+--------+--------+-----+-------+
| accountnum | date | status | action | qty | time |
+------------+------+--------+--------+-----+-------+
| 1234 | 2017 | filled | B | 10 | 11:20 |
| 1234 | 2017 | filled | S | 10 | 11:20 |
+------------+------+--------+--------+-----+-------+
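The logic of the query above can also be checked outside Hive. This sketch runs the same query against an in-memory SQLite database through Python's standard `sqlite3` module. SQLite is only a stand-in here (an assumption, not Hive), chosen because it also accepts backtick-quoted identifiers and HAVING with `count(distinct ...)`; it says nothing about Hive's own parser restrictions.

```python
import sqlite3

# Sample rows from the question: (accountNum, date, status, action, qty, time).
ROWS = [
    (1234, 2017, "filled", "B", 10, "11:20"),
    (1234, 2017, "filled", "S", 10, "11:20"),
    (2345, 2017, "filled", "B", 20, "12:00"),
    (2345, 2017, "filled", "B", 10, "12:00"),
    (4444, 2017, "filled", "B", 5, "01:00"),
    (4444, 2017, "filled", "S", 5, "02:00"),
]

# The answer's query, run against SQLite purely to check its logic.
QUERY = """
select t1.*
from yourTable t1
join (
    select accountNum, `date`, status, time, count(distinct action)
    from yourTable
    where action in ('B', 'S')
    group by accountNum, `date`, status, time
    having count(distinct action) = 2
) t2
on  t1.accountNum = t2.accountNum and
    t1.`date` = t2.`date` and
    t1.status = t2.status and
    t1.time = t2.time
"""

conn = sqlite3.connect(":memory:")
conn.execute(
    "create table yourTable (accountNum int, `date` int, status text,"
    " action text, qty int, time text)"
)
conn.executemany("insert into yourTable values (?, ?, ?, ?, ?, ?)", ROWS)

# Sort so the printed order does not depend on the join strategy.
result = sorted(conn.execute(QUERY).fetchall())
for row in result:
    print(row)
```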
Here is another solution based on window functions:
select accountnum, `date`, status, action, qty, time
from (select *
            ,max(case when action = 'B' then 1 end) over w as b_flag
            ,max(case when action = 'S' then 1 end) over w as s_flag
      from yourTable
      where action in ('B', 'S')
      window w as (partition by accountNum, `date`, status, time)
     ) t
where b_flag = 1
  and s_flag = 1
;
+------------+------+--------+--------+-----+-------+
| accountnum | date | status | action | qty | time |
+------------+------+--------+--------+-----+-------+
| 1234 | 2017 | filled | B | 10 | 11:20 |
| 1234 | 2017 | filled | S | 10 | 11:20 |
+------------+------+--------+--------+-----+-------+
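The window-function version computes, for every B/S row, two flags over its (accountNum, date, status, time) partition and keeps rows where both are set. A minimal plain-Python emulation of that idea on the sample data follows; `window_flag_rows` is a hypothetical helper, not Hive code.

```python
from collections import defaultdict

# Sample rows from the question: (accountNum, date, status, action, qty, time).
ROWS = [
    (1234, 2017, "filled", "B", 10, "11:20"),
    (1234, 2017, "filled", "S", 10, "11:20"),
    (2345, 2017, "filled", "B", 20, "12:00"),
    (2345, 2017, "filled", "B", 10, "12:00"),
    (4444, 2017, "filled", "B", 5, "01:00"),
    (4444, 2017, "filled", "S", 5, "02:00"),
]


def window_flag_rows(rows):
    """Emulate max(case ...) over (partition by accountNum, date, status, time)."""
    # The WHERE clause is applied before the window is computed.
    filtered = [r for r in rows if r[3] in ("B", "S")]
    b_flag = defaultdict(int)  # partition key -> 1 if the partition has a B
    s_flag = defaultdict(int)  # partition key -> 1 if the partition has an S
    for acct, date, status, action, qty, time in filtered:
        key = (acct, date, status, time)
        if action == "B":
            b_flag[key] = 1
        else:
            s_flag[key] = 1
    # Outer query: keep rows whose partition has both flags set.
    result = []
    for r in filtered:
        key = (r[0], r[1], r[2], r[5])
        if b_flag[key] == 1 and s_flag[key] == 1:
            result.append(r)
    return result


for row in window_flag_rows(ROWS):
    print(row)
```

Unlike the join version, the window query references the table only once (no self-join), and it returns only B and S rows because the action filter runs before the flags are computed.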
import numpy as np
import theano
from scipy.interpolate import interp1d
import pymc3 as pm3

theano.config.compute_test_value = 'ignore'
theano.config.on_unused_input = 'ignore'


class cprofile:
    observations = np.array([6.25, 2.75, 1.25, 1.25, 1.5, 1.75, 1.5, 1])
    x = np.arange(0, 18, 0.5)
    observed_x = np.array([0.3, 1.4, 3.1, 5, 6.8, 9, 13.4, 17.1])

    def doMAP(self):
        model = pm3.Model()
        with model:
            # Uniform priors on the three coefficients.
            t = pm3.Uniform("t", 0, 5)
            y = pm3.Uniform("y", 0, 5)
            z = pm3.Uniform("z", 0, 5)
            # Gaussian likelihood around the custom Op's output.
            obs = pm3.Normal('obs',
                             mu=FunctionIWantToFit(self)(t, y, z),
                             sd=0.1, observed=self.observations)
            start = pm3.find_MAP()
            print('start: ', start)


class FunctionIWantToFit(theano.gof.Op):
    # Three float64 scalars (t, y, z) in, one float64 vector out.
    itypes = [theano.tensor.dscalar,
              theano.tensor.dscalar,
              theano.tensor.dscalar]
    otypes = [theano.tensor.dvector]

    def __init__(self, cp):
        self.cp = cp  # note cp is an instance of the 'cprofile' class

    def perform(self, node, inputs, outputs):
        t, y, z = inputs[0], inputs[1], inputs[2]
        xxx = self.cp.x
        # Evaluate the quadratic on the grid x ...
        temp = t + y * xxx + z * xxx ** 2
        # ... then interpolate it at the observation points.
        interpolated_concentration = interp1d(xxx, temp)
        outputs[0][0] = interpolated_concentration(self.cp.observed_x)


testcp = cprofile()
testcp.doMAP()
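The forward model that `perform` computes can be checked on its own, without Theano or PyMC3. This is a minimal stdlib-only sketch, assuming `interp1d`'s default linear interpolation; `linear_interp` and `forward_model` are hypothetical helpers, not part of the code above.

```python
from bisect import bisect_right

# Grid and observation points copied from the cprofile class above.
X = [0.5 * i for i in range(36)]  # same values as np.arange(0, 18, 0.5)
OBSERVED_X = [0.3, 1.4, 3.1, 5, 6.8, 9, 13.4, 17.1]


def linear_interp(xs, ys, x):
    """Piecewise-linear interpolation, like interp1d's default kind='linear'."""
    if not xs[0] <= x <= xs[-1]:
        raise ValueError("x is outside the interpolation range")
    i = bisect_right(xs, x) - 1
    i = min(i, len(xs) - 2)  # x == xs[-1] uses the last segment
    x0, x1 = xs[i], xs[i + 1]
    y0, y1 = ys[i], ys[i + 1]
    return y0 + (y1 - y0) * (x - x0) / (x1 - x0)


def forward_model(t, y, z):
    """Quadratic t + y*x + z*x**2 on the grid X, interpolated at OBSERVED_X."""
    curve = [t + y * x + z * x ** 2 for x in X]
    return [linear_interp(X, curve, x) for x in OBSERVED_X]


print(forward_model(1.0, 0.5, 0.1))
```

Because the interpolation is linear, a purely linear curve (z = 0) is reproduced exactly at every observation point, while the quadratic term is only approximated between grid nodes.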