I'm using BigQuery and trying to parse a comma-separated string to find specific numbers in it.
Example table below:
|--------|-------------------------|
| userID | sequence                |
|--------|-------------------------|
| 123abc | 1,2,3,4,5,6,7,8         |
|--------|-------------------------|
| 456bcd | 1,2,3,4,5,6,7,8,9,10,11 |
|--------|-------------------------|
| 789def | 1,2                     |
|--------|-------------------------|
I need to write CASE statements that evaluate each value in the "sequence" string according to the logic below and output each result to its own column.
SELECT userID
,sequence
,CASE WHEN sequence CONTAINS '1' THEN 1 ELSE 0 END AS action1
,CASE WHEN sequence CONTAINS '2' THEN 1 ELSE 0 END AS action2
,CASE WHEN sequence CONTAINS '3' THEN 1 ELSE 0 END as action3
....
,CASE WHEN sequence CONTAINS '9' AND '11' THEN 1 ELSE 0 END as action10
This would produce the following output.
|--------|-------------------------|---------|---------|---------|----------|
| userID | sequence                | action1 | action2 | action3 | action10 |
|--------|-------------------------|---------|---------|---------|----------|
| 123abc | 1,2,3,4,5,6,7,8         | 1       | 1       | 1       | 0        |
|--------|-------------------------|---------|---------|---------|----------|
| 456bcd | 1,2,3,4,5,6,7,8,9,10,11 | 1       | 1       | 1       | 1        |
|--------|-------------------------|---------|---------|---------|----------|
| 789def | 1,2                     | 1       | 1       | 0       | 0        |
|--------|-------------------------|---------|---------|---------|----------|
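Please note that the last CASE WHEN statement is very important, because I need to treat this very specific combination of string values as its own unique action.
I believe this can be done in SQL Server with something like: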
CASE WHEN CHARINDEX('1', sequence)>0 THEN 1 ELSE 0 END as action1
,CASE WHEN CHARINDEX('2', sequence)>0 THEN 1 ELSE 0 END as action2
,CASE WHEN CHARINDEX('3', sequence)>0 THEN 1 ELSE 0 END as action3
...
,CASE WHEN CHARINDEX('9', sequence)>0 AND CHARINDEX('11', sequence)>0 THEN 1 ELSE 0 END as action10
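However, I can't find an equivalent function in BigQuery that achieves the same result, and my attempts with REGEX have fallen short.
I'd really appreciate some guidance here. Thanks in advance.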
Answer 0 (score: 1)
See the example below (for BigQuery Standard SQL):
#standardSQL
WITH `project.dataset.table` AS (
SELECT '123abc' userID, '1,2,3,4,5,6,7,8' sequence UNION ALL
SELECT '456bcd', '1,2,3,4,5,6,7,8,9,10,11' UNION ALL
SELECT '789def', '1,2'
)
SELECT userID,
IF((SELECT COUNT(1) FROM UNNEST(SPLIT(sequence)) value WHERE value = '1' ) > 0, 1, 0) action1,
IF((SELECT COUNT(1) FROM UNNEST(SPLIT(sequence)) value WHERE value = '2' ) > 0, 1, 0) action2,
IF((SELECT COUNT(1) FROM UNNEST(SPLIT(sequence)) value WHERE value = '3' ) > 0, 1, 0) action3,
IF((SELECT COUNT(1) FROM UNNEST(SPLIT(sequence)) value WHERE value = '4' ) > 0, 1, 0) action4,
IF((SELECT COUNT(1) FROM UNNEST(SPLIT(sequence)) value WHERE value = '5' ) > 0, 1, 0) action5,
IF((SELECT COUNT(1) FROM UNNEST(SPLIT(sequence)) value WHERE value = '6' ) > 0, 1, 0) action6,
IF((SELECT COUNT(1) FROM UNNEST(SPLIT(sequence)) value WHERE value = '7' ) > 0, 1, 0) action7,
IF((SELECT COUNT(1) FROM UNNEST(SPLIT(sequence)) value WHERE value = '8' ) > 0, 1, 0) action8,
IF((SELECT COUNT(1) FROM UNNEST(SPLIT(sequence)) value WHERE value = '9' ) > 0, 1, 0) action9,
IF((SELECT COUNT(1) FROM UNNEST(SPLIT(sequence)) value WHERE value IN ('10', '11' )) > 0, 1, 0) action10
FROM `project.dataset.table`
-- ORDER BY userID
This gives you the following result:
Row userID action1 action2 action3 action4 action5 action6 action7 action8 action9 action10
1 123abc 1 1 1 1 1 1 1 1 0 0
2 456bcd 1 1 1 1 1 1 1 1 1 1
3 789def 1 1 0 0 0 0 0 0 0 0
It is simplified, but it should give you some guidance, as you asked :o)
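Note that the action10 column above is 1 when either '10' or '11' is present. If the action should fire only when both 9 and 11 appear, as the question describes, one possible variation is to count the distinct matching elements and require two of them. This is a minimal sketch that reuses the dummy data from the question, not a tested production query:

```sql
#standardSQL
WITH `project.dataset.table` AS (
  SELECT '123abc' userID, '1,2,3,4,5,6,7,8' sequence UNION ALL
  SELECT '456bcd', '1,2,3,4,5,6,7,8,9,10,11' UNION ALL
  SELECT '789def', '1,2'
)
SELECT userID,
  -- 1 only when BOTH '9' and '11' occur as whole elements of the list:
  -- the subquery keeps only those two values and counts how many distinct ones remain.
  IF((SELECT COUNT(DISTINCT value)
      FROM UNNEST(SPLIT(sequence)) value
      WHERE value IN ('9', '11')) = 2, 1, 0) AS action10
FROM `project.dataset.table`
```

With the dummy rows this yields action10 = 0 for 123abc, 1 for 456bcd, and 0 for 789def, which matches the output requested in the question.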
Below is a refactored version (refactoring is usually a never-ending process), so that it is at least less verbose:
#standardSQL
WITH `project.dataset.table` AS (
SELECT '123abc' userID, '1,2,3,4,5,6,7,8' sequence UNION ALL
SELECT '456bcd', '1,2,3,4,5,6,7,8,9,10,11' UNION ALL
SELECT '789def', '1,2'
)
SELECT userID,
IF('1' IN UNNEST(SPLIT(sequence)), 1, 0) AS action1,
IF('2' IN UNNEST(SPLIT(sequence)), 1, 0) AS action2,
IF('3' IN UNNEST(SPLIT(sequence)), 1, 0) AS action3,
IF('4' IN UNNEST(SPLIT(sequence)), 1, 0) AS action4,
IF('5' IN UNNEST(SPLIT(sequence)), 1, 0) AS action5,
IF('6' IN UNNEST(SPLIT(sequence)), 1, 0) AS action6,
IF('7' IN UNNEST(SPLIT(sequence)), 1, 0) AS action7,
IF('8' IN UNNEST(SPLIT(sequence)), 1, 0) AS action8,
IF('9' IN UNNEST(SPLIT(sequence)), 1, 0) AS action9,
IF((SELECT COUNT(1) FROM UNNEST(SPLIT(sequence)) value WHERE value IN ('10', '11' )) > 0, 1, 0) action10
FROM `project.dataset.table`
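Since every column calls SPLIT(sequence) again, another option is to split the string once in an intermediate step and test the resulting array. The sketch below assumes a hypothetical CTE name `parsed` and a hypothetical column name `actions`; neither comes from the original answer, and only three of the columns are shown to keep it short. It is meant mainly as a readability aid:

```sql
#standardSQL
WITH `project.dataset.table` AS (
  SELECT '123abc' userID, '1,2,3,4,5,6,7,8' sequence UNION ALL
  SELECT '456bcd', '1,2,3,4,5,6,7,8,9,10,11' UNION ALL
  SELECT '789def', '1,2'
),
parsed AS (  -- hypothetical helper CTE
  -- SPLIT uses ',' as the default delimiter for STRING input
  -- and returns ARRAY<STRING>.
  SELECT userID, SPLIT(sequence) AS actions
  FROM `project.dataset.table`
)
SELECT userID,
  IF('1' IN UNNEST(actions), 1, 0) AS action1,
  IF('2' IN UNNEST(actions), 1, 0) AS action2,
  IF('3' IN UNNEST(actions), 1, 0) AS action3
FROM parsed
```

This compares whole elements, so it assumes the values carry no spaces around the commas; if they might, each element would need trimming before the comparison.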
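Update to address your comment about UNION ALL:
The queries above just use dummy data from your question so that you can test and play with them; the actual solution is simply: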
#standardSQL
SELECT userID,
IF('1' IN UNNEST(SPLIT(sequence)), 1, 0) AS action1,
IF('2' IN UNNEST(SPLIT(sequence)), 1, 0) AS action2,
IF('3' IN UNNEST(SPLIT(sequence)), 1, 0) AS action3,
IF('4' IN UNNEST(SPLIT(sequence)), 1, 0) AS action4,
IF('5' IN UNNEST(SPLIT(sequence)), 1, 0) AS action5,
IF('6' IN UNNEST(SPLIT(sequence)), 1, 0) AS action6,
IF('7' IN UNNEST(SPLIT(sequence)), 1, 0) AS action7,
IF('8' IN UNNEST(SPLIT(sequence)), 1, 0) AS action8,
IF('9' IN UNNEST(SPLIT(sequence)), 1, 0) AS action9,
IF((SELECT COUNT(1) FROM UNNEST(SPLIT(sequence)) value WHERE value IN ('10', '11' )) > 0, 1, 0) action10
FROM `project.dataset.table`
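Since the question mentions that attempts with regular expressions fell short, here is a sketch of how the same check might look with REGEXP_CONTAINS. A plain substring test for '1' would also match '10' and '11', so the pattern anchors each number to the start of the string or a comma on the left, and to a comma or the end of the string on the right. It assumes the values are separated by bare commas with no spaces, and it reuses the dummy data so it can be run as is:

```sql
#standardSQL
WITH `project.dataset.table` AS (
  SELECT '123abc' userID, '1,2,3,4,5,6,7,8' sequence UNION ALL
  SELECT '456bcd', '1,2,3,4,5,6,7,8,9,10,11' UNION ALL
  SELECT '789def', '1,2'
)
SELECT userID,
  -- (^|,) and (,|$) make sure the number is a whole list element.
  IF(REGEXP_CONTAINS(sequence, r'(^|,)1(,|$)'), 1, 0) AS action1,
  IF(REGEXP_CONTAINS(sequence, r'(^|,)9(,|$)'), 1, 0) AS action9,
  -- Combined action: both 9 and 11 must be present.
  IF(REGEXP_CONTAINS(sequence, r'(^|,)9(,|$)')
     AND REGEXP_CONTAINS(sequence, r'(^|,)11(,|$)'), 1, 0) AS action10
FROM `project.dataset.table`
```

The SPLIT and UNNEST approach is probably easier to read and maintain; the regex form is shown only as an alternative for cases where a single string test is preferred.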