ClickHouse: calculating quantiles per attribute, excluding that attribute's own values

Time: 2019-08-28 14:16:23

Tags: clickhouse

Consider this dataset:

attr | value
------------
A    | 1
A    | 2
A    | 3
B    | 4
B    | 5
B    | 6
C    | 7
C    | 8
C    | 9

The following query gives me the 50% quantile for each attr:

SELECT 
    attr, 
    quantile(0.5)(value) AS quantile
FROM mytable
GROUP BY attr

It returns:

attr | quantile
---------------
A    | 2
B    | 5
C    | 8

But I would like to get, for each attr, the 50% quantile computed over the values that do not belong to that attr. So I need a query that returns:

attr | quantile
---------------
A    | 7
B    | 4
C    | 7

So for A, it would calculate the quantile based on all values except those of A.

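1 answer:

Answer 0 (score: 0)

It looks like there is a typo in the quantile level: it should be 0.6 rather than 0.5. With that change, a query of the following form returns the desired result (A = 7, B = 4, C = 7). The listing and sample output below use quantile(0.5):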

SELECT groupArray(v) values_per_attr,
  arrayEnumerate(values_per_attr) indexes,
  arrayMap(groupIndex -> (values_per_attr[groupIndex].1, arrayReduce('groupArrayArray', arrayFilter((v, i) -> i != groupIndex, arrayMap(v -> v.2, values_per_attr), indexes))), indexes) exclusive_values_per_attr,
  arrayMap(v -> (v.1, arrayReduce('quantile(0.5)', v.2)), exclusive_values_per_attr) result
FROM
(
    SELECT (attr, groupArray(value)) AS v
    FROM
    (
        /* Emulate test dataset.
        attr | value
        ------------
        A    | 1
        A    | 2
        A    | 3
        B    | 4
        B    | 5
        B    | 6
        C    | 7
        C    | 8
        C    | 9        
        */
        SELECT
            if((number / 4) < 1, 'A', if((number / 7) < 1, 'B', 'C')) AS attr,
            number AS value
        FROM numbers(1, 9)
    )
    GROUP BY attr
)
FORMAT Vertical;
/*
Result: 

Row 1:
──────
values_per_attr:           [('B',[4,5,6]),('C',[7,8,9]),('A',[1,2,3])]
indexes:                   [1,2,3]
exclusive_values_per_attr: [('B',[7,8,9,1,2,3]),('C',[4,5,6,1,2,3]),('A',[4,5,6,7,8,9])]
result:                    [('B',5),('C',3.5),('A',6.5)]
*/
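
As a rough cross-check of the sample output above, the same "exclude the attribute's own values" median can be sketched outside ClickHouse. The snippet below is plain Python, not part of the original answer; the names `rows` and `exclusive_median` are hypothetical, and it covers only the 0.5 level, using the standard library's `statistics.median`.

```python
from statistics import median

# Test dataset from the question, as (attr, value) pairs.
rows = [
    ("A", 1), ("A", 2), ("A", 3),
    ("B", 4), ("B", 5), ("B", 6),
    ("C", 7), ("C", 8), ("C", 9),
]


def exclusive_median(rows):
    """For each attr, return the median of the values that belong to every other attr."""
    attrs = sorted({attr for attr, _ in rows})
    return {
        attr: median(value for other, value in rows if other != attr)
        for attr in attrs
    }


result = exclusive_median(rows)
print(result)  # {'A': 6.5, 'B': 5.0, 'C': 3.5}
```

These values agree with the `result` column in the sample output (B = 5, C = 3.5, A = 6.5). Other quantile levels are left out on purpose, since interpolation rules can differ between implementations.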