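I have a table in Hive that looks like the one below. Every 3 hours I want to run a query that looks at the unique workerUUIDs and does some processing on them. In other words, I want to select * for these workerUUIDs over the window from 3 hours ago until now.
I am running this query in Hive, and the table receives a few million entries every three to six hours. What is the best way to write this query?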
| workerUUID | City | Debt | TestN | LName |
|------------|------|------|-------|-------|
| 1234       | SF   | 100k | 23    | Nil   |
| 6789       | NY   | 150k | 34    | Fa    |
| 1234       | SF   | 10k  | 45    | Na    |
| 6789       | NY   | 1k   | 13    | Nil   |
| 6789       | SF   | 150k | 34    | Nil   |
| 8999       | IN   | 10k  | 45    | Na    |
Basically, I want to do:
select City, Debt, TestN where workerUUID = '1234'
select City, Debt, TestN where workerUUID = '6789'
select City, Debt, TestN where workerUUID = '8999'
To clarify further, I want to generate temporary tables such as:
| workerUUID | City | Debt | TestN |
|------------|------|------|-------|
| 1234       | SF   | 100k | 23    |
| 1234       | SF   | 10k  | 45    |

| workerUUID | City | Debt | TestN |
|------------|------|------|-------|
| 6789       | NY   | 150k | 23    |
| 6789       | NY   | 1k   | 13    |
| 6789       | NY   | 150k | 34    |

| workerUUID | City | Debt | TestN |
|------------|------|------|-------|
| 8999       | IN   | 10k  | 45    |
and so on, for every unique value of workerUUID generated within the 3-hour interval.
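
To make the intent concrete, here is a rough HiveQL sketch of a single-pass version of what I am after, rather than one SELECT per workerUUID. It relies on assumptions that are not in the table above: the table name `worker_events` and a TIMESTAMP column `event_ts` recording when each row was written are both hypothetical placeholders. Without some such time column, the 3-hour window cannot be expressed at all.

```sql
-- Hypothetical names: table `worker_events` and TIMESTAMP column `event_ts`.
-- Neither appears in the sample table; substitute the real ones.
SELECT workerUUID, City, Debt, TestN
FROM worker_events
-- Keep only rows written in the last 3 hours (3 * 60 * 60 seconds).
WHERE unix_timestamp(event_ts) >= unix_timestamp() - 3 * 60 * 60
-- Send all rows for a given workerUUID to the same reducer ...
DISTRIBUTE BY workerUUID
-- ... and sort within each reducer so each workerUUID's rows are contiguous.
SORT BY workerUUID;
```

The result is one output in which the rows for each workerUUID sit next to each other, instead of a separate temporary table per ID. If the table were partitioned by date or hour (also an assumption), adding a filter on the partition column would presumably keep the query from scanning the whole table on every 3-hour run.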