I tried this Hive query:
Select id,count(distinct CASE WHEN unix_timestamp(m_date) BETWEEN unix_timestamp(cast(date_sub(cast('2017-02-01' as date),60) as date)) AND unix_timestamp(cast('2017-02-01' as date)) THEN m_date ELSE 0 END)
,count(CASE WHEN unix_timestamp(m_date) BETWEEN unix_timestamp(cast(date_sub(cast('2017-02-01' as date),60) as date)) AND unix_timestamp(cast('2017-02-01' as date)) THEN m_date ELSE 0 END)
From DB.TABLE2 GROUP BY id limit 10;
It gave me something like:
111007001007633 1 1
111007001029793 1 1
111007001000521 1 11
111007001000794 1 1
111007001000273 3 13
111007001001032 1 1
111007001025874 1 4
111007001001792 1 7
111007001029181 1 1
111007001000141 16 96
But when I add more counts:
Select id,count(distinct CASE WHEN unix_timestamp(m_date) BETWEEN unix_timestamp(cast(date_sub(cast('2017-02-01' as date),60) as date)) AND unix_timestamp(cast('2017-02-01' as date)) THEN m_date ELSE 0 END)
,count(CASE WHEN unix_timestamp(m_date) BETWEEN unix_timestamp(cast(date_sub(cast('2017-02-01' as date),60) as date)) AND unix_timestamp(cast('2017-02-01' as date)) THEN m_date ELSE 0 END)
,count(distinct CASE WHEN unix_timestamp(m_date) BETWEEN unix_timestamp(cast(date_sub(cast('2017-02-01' as date),15) as date)) AND unix_timestamp(cast('2017-02-01' as date)) THEN m_date ELSE 0 END)
,count(CASE WHEN unix_timestamp(m_date) BETWEEN unix_timestamp(cast(date_sub(cast('2017-02-01' as date),15) as date)) AND unix_timestamp(cast('2017-02-01' as date)) THEN m_date ELSE 0 END)
From DB.TABLE2 GROUP BY id limit 10;
it returns something like this:
111007001010439 0 0 1 0
111007001026963 0 0 1 0
111007001028001 0 0 1 0
111007001032987 0 0 1 0
111007001048710 0 0 1 0
111007001052415 0 0 1 0
111007002008374 0 0 1 0
111007003000644 0 0 1 0
111007003002210 0 0 1 0
I'm working on a Hadoop cluster, and I don't know whether this is caused by MapReduce.
Thanks
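[EDIT]
As I said in reply to @pashaz's comment, the first problem is the difference between two otherwise identical expressions (with and without DISTINCT): the DISTINCT count returns 1 while the non-DISTINCT count returns 0.
The second problem is the difference between the two DISTINCT counts and between the two non-DISTINCT counts. If you check the timestamps, you will see that the first pair's window contains the second pair's: the first two expressions count occurrences between '2017-02-01' and 60 days before it, and the last two count occurrences between '2017-02-01' and 15 days before it.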
[UPDATE]
If I add a WHERE clause to the query:
WHERE id="111007001007633" OR id="271011604404359" OR id="122213250512607" OR id="111007001033217"
111007001033217 0 0 0 0 0 0
122213250512607 1 3 8 14 0 0
271011604404359 12 21 26 42 5 9
111007001007633 14 19 24 34 5 5
The LIMIT clause seems to be the problem.
Answer 0 (score: 1)
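There is nothing wrong with the results shown. Both queries end with "limit 10" and have no ORDER BY, so there is no guarantee that they return the same ids.
For example, "111007001007633" appears in the first query's result but not in the second's, so the two outputs cannot be compared row by row.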
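One way to make the two runs comparable is to fix which ids are returned before applying the limit. The following is only a minimal sketch, assuming the same table and columns as in the question: the CASE expressions are copied unchanged from the question, while the ORDER BY clause and the column aliases (distinct_60d, distinct_15d) are additions made here for illustration.

```sql
-- Sketch only: CASE expressions are taken verbatim from the question.
-- The aliases distinct_60d and distinct_15d are hypothetical names.
SELECT id,
       count(DISTINCT CASE WHEN unix_timestamp(m_date) BETWEEN unix_timestamp(cast(date_sub(cast('2017-02-01' as date),60) as date)) AND unix_timestamp(cast('2017-02-01' as date)) THEN m_date ELSE 0 END) AS distinct_60d,
       count(DISTINCT CASE WHEN unix_timestamp(m_date) BETWEEN unix_timestamp(cast(date_sub(cast('2017-02-01' as date),15) as date)) AND unix_timestamp(cast('2017-02-01' as date)) THEN m_date ELSE 0 END) AS distinct_15d
FROM DB.TABLE2
GROUP BY id
ORDER BY id   -- sort before limiting so the same 10 ids come back each time
LIMIT 10;
```

Filtering on a fixed set of ids, as in the WHERE clause from the update, is another way to check the counts against the same rows.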