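Is there a way in SQL to split a string into n columns based on a delimiter in the string? I know about the SPLIT_PART function, which takes three arguments: the string, the delimiter, and the index n of the field to return. For example: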
select
split_part('2016-01-01 00:11:00|Sprout|0', '|', 1), split_part('2016-01-01 00:11:00|Sprout|0', '|', 2), split_part('2016-01-01 00:11:00|Sprout|0', '|', 3);
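Is there a way to do this without the third argument, so that you supply only the string and the delimiter and end up with as many columns as there are delimited fields in the string?
Once Vertica allows Python-based UDFs, I know this will be a simple fix using the .split() method, but is there a solution in the meantime? I know this may be a long shot, and I am asking mostly out of curiosity, since split_part already serves my purpose perfectly well.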
This can't be an acceptable answer.
Answer 0 (score: 1)
OK. If you are happy with getting the n-th token of the string, try:
SQL>SELECT
...> regexp_substr(
...> '2016-01-01 00:11:00|Sprout|0' -- source string
...> , '[|]?([^|]+)' -- pattern (an optional bar, followed by many non-bars, which we remember as the 1st group)
...> , 1 -- starting from begin of string: position 1
...> , 1 -- the N-th occurrence
...> , '' -- no regexp modifier
...> , 1 -- we want the only remembered group - the 1st
...> ) the_first
...>, regexp_substr(
...> '2016-01-01 00:11:00|Sprout|0' -- source string
...> , '[|]?([^|]+)' -- pattern (an optional bar, followed by many non-bars, which we remember as the 1st group)
...> , 1 -- starting from begin of string: position 1
...> , 2 -- the N-th occurrence
...> , '' -- no regexp modifier
...> , 1 -- we want the only remembered group - the 1st
...> ) the_second
...>, regexp_substr(
...> '2016-01-01 00:11:00|Sprout|0' -- source string
...> , '[|]?([^|]+)' -- pattern (an optional bar, followed by many non-bars, which we remember as the 1st group)
...> , 1 -- starting from begin of string: position 1
...> , 3 -- the N-th occurrence
...> , '' -- no regexp modifier
...> , 1 -- we want the only remembered group - the 1st
...> ) the_third
...>;
the_first |the_second |the_third
2016-01-01 00:11:00 |Sprout |0
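A SQL statement has to declare its output columns up front, so the query above still needs one expression per token. What you can compute from just the string and the delimiter is how many tokens there are. The following is a minimal sketch, assuming Vertica's REGEXP_COUNT function is available in your version; the column alias field_count is made up for this example:

```sql
-- Sketch: count the fields in a bar-delimited string.
-- The number of fields is the number of delimiters plus one.
SELECT
  REGEXP_COUNT(
    '2016-01-01 00:11:00|Sprout|0'  -- source string
  , '[|]'                           -- pattern: a single bar
  ) + 1 AS field_count;
```

For the sample string this should return 3. Note that this counts empty fields too, whereas the '[|]?([^|]+)' pattern used above skips over them.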
But if you want to pivot your delimited string so that each token becomes a new row, there are two possibilities:
SQL>-- manual, using regexp_substr ...
...>with
...>the_array as (
...> select 1 as idx
...>union all select 2
...>union all select 3
...>union all select 4
...>union all select 5
...>union all select 6
...>union all select 7
...>union all select 8
...>union all select 9
...>union all select 10 -- increase if you might get a bigger array than one of 10 elements
...>)
...> ,concepts as (
...>select '2016-01-01 00:11:00|Sprout|0' as concepts_list
...>)
...>select * from (
...> select
...> idx
...> ,trim(
...> regexp_substr(
...> concepts_list -- source string
...> ,'[|]?([^|]+)' -- pattern (an optional bar, followed by many non-bars, which we remember as the 1st group)
...> ,1 -- starting from begin of string: position 1
...> ,idx -- the idx-th occurrence
...> ,'' -- no regexp modifier
...> ,1 -- we want the only remembered group - the 1st
...> )
...> ) as concept
...> from concepts
...> cross join the_array
...>) foo
...>where concept <> ''
...>;
idx |concept
1|2016-01-01 00:11:00
3|0
2|Sprout
select succeeded; 3 rows fetched
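As a variation on the manual approach, the same index table can drive SPLIT_PART from the question instead of regexp_substr. This is only a sketch: the CTE names mirror the ones above, and bounding idx with REGEXP_COUNT (rather than filtering out empty strings) is an assumption of this example, chosen so that empty fields are kept as rows:

```sql
-- Sketch: one row per field, using SPLIT_PART and an index table.
WITH the_array AS (
            SELECT 1 AS idx
  UNION ALL SELECT 2
  UNION ALL SELECT 3
  UNION ALL SELECT 4
  UNION ALL SELECT 5   -- extend if a string may hold more than 5 fields
)
, concepts AS (
  SELECT '2016-01-01 00:11:00|Sprout|0' AS concepts_list
)
SELECT
  idx
, SPLIT_PART(concepts_list, '|', idx) AS concept   -- the idx-th field
FROM concepts
CROSS JOIN the_array
-- keep only indexes that exist: number of bars + 1 fields
WHERE idx <= REGEXP_COUNT(concepts_list, '[|]') + 1
ORDER BY idx;
```

The ORDER BY makes the row order deterministic, which the query above does not guarantee (note that its rows came back as 1, 3, 2).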
SQL>-- using the strings_package on:
...>-- https://github.com/vertica/Vertica-Extension-Packages/blob/master/strings_package/src/StringTokenizerDelim.cpp
...>WITH csvtab(id,delimstring) AS (
...> SELECT 1,'2016-01-01 00:11:00|Sprout|0'
...>UNION ALL SELECT 2,'2016-01-02 00:11:00|Trout|1'
...>UNION ALL SELECT 3,'2016-01-03 00:11:00|Salmon|2'
...>UNION ALL SELECT 4,'2016-01-04 00:11:00|Bass|3'
...>)
...>SELECT id, words
...>FROM (
...> SELECT id, v_txtindex.StringTokenizerDelim(delimstring,'|') OVER (PARTITION by id) FROM csvtab
...>) a
...>ORDER BY 1;
id |words
1|2016-01-01 00:11:00
1|Sprout
1|0
2|2016-01-02 00:11:00
2|Trout
2|1
3|2016-01-03 00:11:00
3|Salmon
3|2
4|2016-01-04 00:11:00
4|Bass
4|3
select succeeded; 12 rows fetched