ClickHouse SQL: reshaping data from long format to wide format

Time: 2019-09-26 13:32:36

Tags: clickhouse

I am using the ClickHouse SQL dialect. After expanding an array with ARRAY JOIN, I get data in the following format.

|------|---------------------|----------------|------------------|
|  id  |      timestamp      |  property_key  |  property_value  |
|------|---------------------|----------------|------------------|
|  01  | 2019-09-25 16:24:38 |     query      |     Palmera      |
|------|---------------------|----------------|------------------|
|  01  | 2019-09-25 16:24:38 |   found_items  |       10         |
|------|---------------------|----------------|------------------|
|  02  | 2019-09-25 13:11:09 |     query      |     pigeo        |
|------|---------------------|----------------|------------------|
|  02  | 2019-09-25 13:11:09 |   found_items  |        0         |
|------|---------------------|----------------|------------------|
|  03  | 2019-09-25 16:08:13 |     query      |     harmon       |
|------|---------------------|----------------|------------------|
|  03  | 2019-09-25 16:08:13 |   found_items  |       17         |
|------|---------------------|----------------|------------------|

I got this result with the following query:

SELECT
  id,
  timestamp,
  properties.key AS property_key,
  properties.value AS property_value
FROM (
  SELECT
    rowNumberInAllBlocks() AS id,
    timestamp,
    properties.key,
    properties.value
  FROM database.table
  WHERE timestamp BETWEEN toDateTime('2019-09-16 11:26:56')
    AND toDateTime('2019-09-26 11:26:56')
  ORDER BY timestamp)
ARRAY JOIN properties
WHERE properties.key IN ('query', 'found_items')

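I need to extract the queries for which found_items equals 0. I don't know how to reshape the data from long format to wide format. The expected result is either of the following.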

|------|---------------------|-----------------|---------------|
|  id  |      timestamp      |     query       |  found_items  |
|------|---------------------|-----------------|---------------|
|  02  | 2019-09-25 13:11:09 |     pigeo       |       0       |
|------|---------------------|-----------------|---------------|
|  15  | 2019-09-25 16:08:13 |     coche       |       0       |
|------|---------------------|-----------------|---------------|
|  27  | 2019-09-16 13:19:46 | panitos pampers |       0       |
|------|---------------------|-----------------|---------------|

OR

|------|---------------------|----------------|------------------|
|  id  |      timestamp      |  property_key  |  property_value  |
|------|---------------------|----------------|------------------|
|  02  | 2019-09-25 13:11:09 |     query      |     pigeo        |
|------|---------------------|----------------|------------------|
|  15  | 2019-09-25 16:08:13 |     query      |     coche        |
|------|---------------------|----------------|------------------|
|  27  | 2019-09-16 13:19:46 |     query      |  panitos pampers |
|------|---------------------|----------------|------------------|

1 answer:

Answer 0 (score: 2):

Try this query:

SELECT 
  id, 
  groupArray(timestamp)[1] timestamp,          -- every element of a row shares one timestamp
  groupArray(properties.key)[1] property_key,
  groupArray(properties.value) property_value  -- all values for this key, as an array
FROM (
  SELECT 
    rowNumberInAllBlocks() AS id,
    timestamp,
    properties.key,
    properties.value
  FROM test.test_011
  WHERE timestamp BETWEEN toDateTime('2019-09-16 11:26:56') AND toDateTime('2019-09-26 11:26:56') 
    -- keep only rows whose found_items value is '0'
    AND properties.value[indexOf(properties.key, 'found_items')] = '0'
  ORDER BY timestamp)
ARRAY JOIN properties
WHERE properties.key IN ('query' /*, ..*/)
GROUP BY id, properties.key
ORDER BY id

/* Result
┌─id─┬───────────timestamp─┬─property_key─┬─property_value────────┐
│  0 │ 2019-09-25 13:11:09 │ query        │ ['pigeo']             │
│  1 │ 2019-09-16 13:19:46 │ query        │ ['panitos','pampers'] │
└────┴─────────────────────┴──────────────┴───────────────────────┘
*/
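
The result above keeps the long layout (the second form asked for in the question). If the wide layout with separate query and found_items columns is wanted instead, one possible sketch is to look each key up directly in the Nested arrays, without ARRAY JOIN. It assumes the same test.test_011 table used in this answer; the aliases query and found_items are illustrative. Like the filter in the query above, it relies on indexOf returning 0 for a missing key and the array lookup then yielding an empty string.

```sql
SELECT
  rowNumberInAllBlocks() AS id,
  timestamp,
  -- value stored at the position of the first 'query' key
  properties.value[indexOf(properties.key, 'query')] AS query,
  -- value stored at the position of the first 'found_items' key
  properties.value[indexOf(properties.key, 'found_items')] AS found_items
FROM test.test_011
WHERE timestamp BETWEEN toDateTime('2019-09-16 11:26:56') AND toDateTime('2019-09-26 11:26:56')
  AND properties.value[indexOf(properties.key, 'found_items')] = '0'
ORDER BY timestamp
```

Note that indexOf only finds the first occurrence, so for a row with several 'query' keys this returns just the first value, whereas the groupArray version keeps all of them.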

/* prepare test data */

CREATE TABLE test.test_011 (
  timestamp DateTime,
  properties Nested(key String, value String)
) ENGINE = Memory;

INSERT INTO test.test_011
VALUES 
  (toDateTime('2019-09-25 16:24:38'),  ['query', 'found_items'], ['Palmera', '10']),
  (toDateTime('2019-09-25 13:11:09'),  ['query', 'found_items'], ['pigeo', '0']),
  (toDateTime('2019-09-25 16:08:13'),  ['query', 'found_items'], ['harmon', '17']),
  (toDateTime('2019-09-16 13:19:46'), ['found_items', 'query', 'query'], ['0', 'panitos', 'pampers']),
  (toDateTime('2019-09-25 16:22:38'),  ['query', 'query'], ['test', 'test']);
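
With this test data in place, a more conventional long-to-wide pivot can also be sketched with conditional aggregation: expand the rows with ARRAY JOIN, then fold each key back into its own column using the -If combinator. This is only a sketch under the same assumptions as the query above (table test.test_011, ids from rowNumberInAllBlocks()); the aliases query and found_items are illustrative.

```sql
SELECT
  id,
  timestamp,
  -- pick the value whose key is 'query' (an arbitrary one if the key repeats)
  anyIf(properties.value, properties.key = 'query') AS query,
  -- pick the value whose key is 'found_items'
  anyIf(properties.value, properties.key = 'found_items') AS found_items
FROM (
  SELECT
    rowNumberInAllBlocks() AS id,
    timestamp,
    properties.key,
    properties.value
  FROM test.test_011
  WHERE timestamp BETWEEN toDateTime('2019-09-16 11:26:56') AND toDateTime('2019-09-26 11:26:56')
  ORDER BY timestamp)
ARRAY JOIN properties
WHERE properties.key IN ('query', 'found_items')
GROUP BY id, timestamp
HAVING found_items = '0'   -- keep only searches that found nothing
ORDER BY id
```

anyIf returns a single matching value, so a row with two 'query' entries yields only one of them; groupArrayIf in its place would collect all of them into an array.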