My table has a string column with values like the following:
accountNumber:123456
{"accountNumber":"123456"}
I need a dynamic way to extract just 123456 from these strings. Can you suggest a solution?
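Answer 0 (score: 2)
Use the REGEXP_SUBSTR(…) built-in function to extract a substring with a regular expression pattern.
If each column value contains only one number, a simple numeric pattern such as a digit character range is enough: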
SELECT
'accountNumber:123456' i1,
regexp_substr(i1, '[0-9]+') r1,
'{"accountNumber":"123456"}' i2,
regexp_substr(i2, '[0-9]+') r2;
+----------------------+--------+----------------------------+--------+
| I1                   | R1     | I2                         | R2     |
|----------------------+--------+----------------------------+--------|
| accountNumber:123456 | 123456 | {"accountNumber":"123456"} | 123456 |
+----------------------+--------+----------------------------+--------+
If the number is always exactly six digits long, use the {n} repetition syntax:
select
'accountNumber:123456,anotherNumber:123' i1,
regexp_substr(i1, '[0-9]{6}') r1,
'{"accountNumber":"123456", "anotherNumber": 123}' i2,
regexp_substr(i2,'[0-9]{6}') r2;
+----------------------------------------+--------+--------------------------------------------------+--------+
| I1                                     | R1     | I2                                               | R2     |
|----------------------------------------+--------+--------------------------------------------------+--------|
| accountNumber:123456,anotherNumber:123 | 123456 | {"accountNumber":"123456", "anotherNumber": 123} | 123456 |
+----------------------------------------+--------+--------------------------------------------------+--------+
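If the number must come right after the text accountNumber, you can introduce capture groups. In the calls below, the 'e' parameter tells REGEXP_SUBSTR to return a captured group, and the final argument 1 selects the first group: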
select
'accountNumber:123456,anotherNumber:123,somethingElse:456789' i1,
regexp_substr(i1, 'accountNumber[:" ]+([0-9]{6})', 1, 1, 'e', 1) r1,
'{"accountNumber":"123456", "anotherNumber": 123, "somethingElse": 456789}' i2,
regexp_substr(i2, 'accountNumber[:" ]+([0-9]{6})', 1, 1, 'e', 1) r2;
+-------------------------------------------------------------+--------+---------------------------------------------------------------------------+--------+
| I1                                                          | R1     | I2                                                                        | R2     |
|-------------------------------------------------------------+--------+---------------------------------------------------------------------------+--------|
| accountNumber:123456,anotherNumber:123,somethingElse:456789 | 123456 | {"accountNumber":"123456", "anotherNumber": 123, "somethingElse": 456789} | 123456 |
+-------------------------------------------------------------+--------+---------------------------------------------------------------------------+--------+
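Building a fully correct regular expression requires knowing more about all the variations that can occur in your data. Try developing the pattern interactively against a good test set on sites such as Regex101 or RegExr, which makes the process easier.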
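If you would rather keep the test set in a file than in a website, a small local harness is one option. The following is only a sketch in Python, outside Snowflake: the pattern is copied from the capture-group query above, while the test cases, the names extract_account_number and run_cases, and the expected values are hypothetical and should be replaced with samples from your own data. Python's re module and Snowflake's regex engine are different dialects, so this assumes the pattern sticks to syntax both support (character classes, + and {n}, one capture group); re-check the final pattern in Snowflake itself.

```python
import re

# Pattern copied from the capture-group query in the answer.
PATTERN = re.compile(r'accountNumber[:" ]+([0-9]{6})')

# Hypothetical test set: (input string, expected account number or None).
CASES = [
    ('accountNumber:123456', '123456'),
    ('{"accountNumber":"123456"}', '123456'),
    ('accountNumber:123456,anotherNumber:123,somethingElse:456789', '123456'),
    ('{"accountNumber":"123456", "anotherNumber": 123, "somethingElse": 456789}', '123456'),
    ('{"accountNumber": "123456"}', '123456'),  # space after the colon
    ('somethingElse:456789', None),             # no accountNumber key at all
]


def extract_account_number(text):
    """Return the first capture group, or None when the pattern does not match."""
    match = PATTERN.search(text)
    return match.group(1) if match else None


def run_cases(cases):
    """Return (input, expected, actual) for every case that fails."""
    failures = []
    for text, expected in cases:
        actual = extract_account_number(text)
        if actual != expected:
            failures.append((text, expected, actual))
    return failures


if __name__ == "__main__":
    failures = run_cases(CASES)
    for text, expected, actual in failures:
        print(f"FAIL: {text!r}: expected {expected!r}, got {actual!r}")
    print(f"{len(CASES) - len(failures)} of {len(CASES)} cases passed")
```

Adding a row to CASES each time you find a new variation in the column gives you a regression check whenever the pattern changes.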
Note: if all of your data is actually in JSON format, Snowflake allows parsing it into a VARIANT data type so that you can query it more naturally:
select
parse_json('{"accountNumber":"123456", "anotherNumber": 123, "somethingElse": 456789}'):accountNumber::integer account_number;
+----------------+
| ACCOUNT_NUMBER |
|----------------|
|         123456 |
+----------------+