Question

I have a table like this in Google BigQuery, with 1 id column (customers) and 3 store-name columns:

id  |PA|PB|Mall|
----|--|--|----|
3699|1 |1 | 1  |  
1017|  |1 | 1  | 
9991|1 |  |    |

My goal is to be able to select the customers (id) who have visited, for example:

only PA
PA and PB
PA and Mall
PA, PB and Mall

One alternative output could be:

id  |Store     |
----|--------- |
3699|PA+PB+Mall|
1017|PB+Mall   |
9991|PA        |

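However, this does not give me the count of all visits to PA regardless of which other stores were visited. In the example above, that count is 2 (3699 and 9991).

A second alternative output could be: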

id  |Store|
----|-----|
3699|PA   |
3699|PB   |
3699|Mall |
1017|PB   |
1017|Mall |
9991|PA   |

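However, this does not allow me (I think) to select/filter those who have visited, for example, BOTH PA and Mall (only 3699).

A third alternative output could be a combination: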

id  |Store| Multiple store|
----|-----|---------------|
3699|PA   | PA+PB+Mall    |
3699|PB   | PA+PB+Mall    |
3699|Mall | PA+PB+Mall    |
1017|PB   | PB+Mall       |
1017|Mall | PB+Mall       |
9991|PA   |               |

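Which option is best, and are there other options that would achieve my goal? I believe alternative 3 might be the best, but I am not sure how to implement it.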

Answer 1

It depends on what you want. For instance, if t already has one row per id and store, the third would be:

select t.*,
       string_agg(store, '+') over (partition by id)
from t;

And one row per id with the stores concatenated (the first alternative output in the question) is:

select id, string_agg(store, '+')
from t
group by id;
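
The long form does not by itself prevent filtering on combinations of stores. As a rough sketch, assuming t has one row per (id, store) as in the queries above, customers who visited both PA and Mall could be selected with a HAVING clause. The inline sample rows are copied from the question's second alternative output and are only there to make the query self-contained:

```sql
-- Sample long-form data (illustrative; taken from the question).
with t as (
  select 3699 as id, 'PA' as store union all
  select 3699, 'PB' union all
  select 3699, 'Mall' union all
  select 1017, 'PB' union all
  select 1017, 'Mall' union all
  select 9991, 'PA'
)
-- Keep ids that have at least one PA row AND at least one Mall row;
-- other stores may or may not have been visited.
select id
from t
group by id
having countif(store = 'PA') > 0
   and countif(store = 'Mall') > 0;
```

With the same data, "only PA" could be expressed by also requiring countif(store != 'PA') = 0, and the number of customers who passed through PA by select count(distinct id) from t where store = 'PA'.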

Answer 2

For the third option, you can try unpivoting the current table and then applying STRING_AGG to get a computed column containing all the stores for each id.
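
A minimal sketch of the unpivot-then-STRING_AGG idea, assuming BigQuery's UNPIVOT operator and the sample data from the question. The CTE names t and unpivoted and the column names visited and multiple_store are illustrative, not taken from the original answer:

```sql
with t as (
  select 3699 as id, 1 as PA, 1 as PB, 1 as Mall union all
  select 1017, null, 1, 1 union all
  select 9991, 1, null, null
),
-- Turn the three store columns into rows: one row per (id, store).
-- The column name becomes the value of "store"; the flag goes into "visited".
unpivoted as (
  select id, store
  from t
  unpivot (visited for store in (PA, PB, Mall))
)
-- Attach the concatenated store list to every row of the same id.
select
  id,
  store,
  string_agg(store, '+') over (partition by id) as multiple_store
from unpivoted;
```

UNPIVOT excludes NULL values by default, so only the stores a customer actually visited produce rows. The order of names inside the concatenated string is not guaranteed unless an ordering is specified.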

Answer 3

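Consider the approaches below for all three options.

This assumes the input data contains nulls wherever the question's sample shows an empty cell: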

with `project.dataset.table` as (
  select 3699 id, 1 PA, 1 PB, 1 Mall union all
  select 1017, null, 1, 1 union all
  select 9991, 1, null, null
)
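
All of the queries below rely on the same flattening step, so it may help to look at it in isolation first. This sketch only exposes the intermediate kv values and assumes the store columns hold plain integers or null, as in the sample data:

```sql
with `project.dataset.table` as (
  select 3699 id, 1 PA, 1 PB, 1 Mall union all
  select 1017, null, 1, 1 union all
  select 9991, 1, null, null
)
-- to_json_string(t) renders a row as e.g. {"id":3699,"PA":1,"PB":1,"Mall":1};
-- translate(...) strips the braces and double quotes;
-- split(...) breaks the result on commas, one "column:value" string per column.
select id, kv
from `project.dataset.table` t,
unnest(split(translate(to_json_string(t), '{}"', ''))) kv
```

Each kv is then split on ':' into a key (the column name) and a value, which is why the queries filter out key = 'id' and value = 'null'. The approach avoids listing the store columns by name, but it would likely need adjusting if values could contain commas or colons.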

Option #1

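-- One row per id, with the names of all visited stores joined by '+'
select id, string_agg(key, '+') as Store
from `project.dataset.table` t,
-- flatten the row into "column:value" strings via its JSON representation
unnest(split(translate(to_json_string(t), '{}"', ''))) kv,
unnest([struct(split(kv,':')[offset(0)] as key, split(kv,':')[offset(1)] as value)])
where key != 'id'       -- skip the id column itself
and value != 'null'     -- skip stores that were not visited
group by id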


Option #2

select id, key as Store
from `project.dataset.table` t,
unnest(split(translate(to_json_string(t), '{}"', ''))) kv,
unnest([struct(split(kv,':')[offset(0)] as key, split(kv,':')[offset(1)] as value)])
where key !='id' 
and value != 'null'


Option #3

select id, key as Store, 
  string_agg(key, '+') over(partition by id) as Multiple_Store 
from `project.dataset.table` t,
unnest(split(translate(to_json_string(t), '{}"', ''))) kv,
unnest([struct(split(kv,':')[offset(0)] as key, split(kv,':')[offset(1)] as value)])
where key !='id' 
and value != 'null'


Convert columns to row values