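I am trying (so far unsuccessfully) to split a string column in Google BigQuery into rows that contain every individual word and every pair of words (adjacent to each other and in their original order). I also need to keep the ID field from IndataTable with each word or pair. Both record sets have 2 columns.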
IndataTable as IDT
ID WordString
1 apple banana pear
2 carrot
3 blue red green yellow
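OutdataTable as ODT
ID WordString
1 apple
1 banana
1 pear
1 apple banana
1 banana pear
2 carrot
3 blue
3 red
3 green
3 yellow
3 blue red
3 red green
3 green yellow (only pairs that are next to each other)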
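Is this possible in BigQuery SQL?
Edit/Added:
This is what I have so far for splitting the string into individual words. I would really like to figure out how to extend it to word pairs. I don't know whether it can be modified or whether I need a new approach.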
SELECT ID, split(WordString,' ') as Words
FROM (
select *
from
(select ID, WordString from IndataTable)
)
Answer 0 (score: 1)
The following is for BigQuery Standard SQL.
The query below returns the expected result for the sample data:
#standardSQL
WITH IndataTable AS (
  -- Sample data from the question
  SELECT 1 AS id, 'apple banana pear' AS WordString UNION ALL
  SELECT 2, 'carrot' UNION ALL
  SELECT 3, 'blue red green yellow'
), words AS (
  -- One row per word, keeping its position within the string
  SELECT id, word, pos
  FROM IndataTable, UNNEST(SPLIT(WordString, ' ')) AS word WITH OFFSET pos
), pairs AS (
  -- Pair each word with the next word of the same id; the last word yields NULL
  SELECT id, CONCAT(word, ' ', LEAD(word) OVER (PARTITION BY id ORDER BY pos)) AS pair
  FROM words
)
SELECT id, word AS WordString FROM words
UNION ALL
SELECT id, pair AS WordString FROM pairs
WHERE pair IS NOT NULL
ORDER BY id
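If the window function is hard to follow, the pairing step can also be written as a self-join on the word position. This is only a sketch of an alternative, not part of the original answer: it reuses the same sample data and the same `words` step, and joins each word to the word whose `pos` is one higher within the same `id`. The aliases `a` and `b` are names chosen here for illustration.

```sql
#standardSQL
WITH IndataTable AS (
  -- Same sample data as in the question
  SELECT 1 AS id, 'apple banana pear' AS WordString UNION ALL
  SELECT 2, 'carrot' UNION ALL
  SELECT 3, 'blue red green yellow'
), words AS (
  -- One row per word, with its zero-based position in the string
  SELECT id, word, pos
  FROM IndataTable, UNNEST(SPLIT(WordString, ' ')) AS word WITH OFFSET pos
)
-- Join every word to its immediate successor within the same id
SELECT a.id, CONCAT(a.word, ' ', b.word) AS WordString
FROM words AS a
JOIN words AS b
  ON a.id = b.id
 AND b.pos = a.pos + 1
ORDER BY a.id, a.pos
```

This returns only the pairs (for the sample data: apple banana, banana pear, blue red, red green, green yellow). Because it is an inner join, the last word of each string simply finds no partner, so no NULL filter is needed. To get the single words as well, combine it with a `SELECT id, word FROM words` via `UNION ALL`, as in the query above.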