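Extract parts of a string separated by delimiters

Date: 2021-01-22 19:18:57

Tags: sql sql-server

This is a follow-up to my original question: Variable length substring between two characters

The data typically looks like this, all in a single column: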

Growth: Compliance;Priority: Contractual;Original Vendor: ABC SERVICES;

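In the example above:

  • "Compliance" (not "Growth: Compliance") needs to be extracted and stored in the GROWTH_TXT column
  • "Contractual" needs to be extracted and stored in the PRIORITY_TXT column
  • "Original Vendor: ABC SERVICES" can be ignored, since it is not stored anywhere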

4 Answers:

Answer 0: (score: 2)

SQL Server 2016+ (note: TRIM() requires SQL Server 2017 or later; on 2016, LTRIM(RTRIM(...)) can be substituted)

Uses STRING_SPLIT(), PARSENAME(), and PIVOT

-- Mimic Table named z_tbl_tmp
DECLARE @z_tbl_tmp TABLE (id INT, OPTIONAL_FIELD_1 NVARCHAR(max));
INSERT INTO @z_tbl_tmp VALUES (1, N'Growth: Compliance;Priority: Contractual;Original Vendor: ABC SERVICES;');
INSERT INTO @z_tbl_tmp VALUES (2, N'Growth: Run; Priority: Critical - Turns Contractual');
-- 

-- Pivot Parsed Data
WITH tbl_parsed AS (
    -- Parse Data into Key Value Pairs
    SELECT id, 
        TRIM(PARSENAME(REPLACE(value,': ','.'), 2)) AS K, 
        TRIM(PARSENAME(REPLACE(value,': ','.'), 1)) AS V 
    FROM @z_tbl_tmp
        CROSS APPLY STRING_SPLIT(OPTIONAL_FIELD_1,';')
)
SELECT id, [Growth] AS GROWTH_TXT, [Priority] AS PRIORITY_TXT
FROM tbl_parsed
    PIVOT (MAX(V) FOR [K] IN ([Growth], [Priority])) AS pvt;

+----+------------+-------------------------------+
| id | GROWTH_TXT | PRIORITY_TXT                  |
+----+------------+-------------------------------+
|  1 | Compliance | Contractual                   |
+----+------------+-------------------------------+
|  2 | Run        | Critical - Turns Contractual  |
+----+------------+-------------------------------+
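
To make the key/value step easier to follow, here is a minimal sketch of what the PARSENAME(REPLACE(...)) expression does to a single fragment. PARSENAME is meant for dotted object names, so this trick only holds while neither the key nor the value contains a period (and there are at most four dot-separated parts). The second query uses a made-up value, not taken from the question's data, to show how the parts shift when that assumption fails.

```sql
-- One fragment as produced by STRING_SPLIT; ': ' becomes '.' so PARSENAME can split it
SELECT PARSENAME(REPLACE('Growth: Compliance', ': ', '.'), 2) AS K,  -- 'Growth'
       PARSENAME(REPLACE('Growth: Compliance', ': ', '.'), 1) AS V;  -- 'Compliance'

-- Hypothetical value containing a period: the string becomes 'Vendor.ABC.SERVICES'
SELECT PARSENAME(REPLACE('Vendor: ABC.SERVICES', ': ', '.'), 2) AS K,  -- 'ABC', not 'Vendor'
       PARSENAME(REPLACE('Vendor: ABC.SERVICES', ': ', '.'), 1) AS V;  -- 'SERVICES'
```

If the values may contain periods, splitting on the first ': ' with CHARINDEX is a safer alternative.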

Answer 1: (score: 1)

Starting with SQL Server 2016, a combination of STRING_SPLIT(), PATINDEX(), and conditional aggregation is an option:

DECLARE @text varchar(1000) = 'Growth: Compliance;Priority: Contractual;Original Vendor: ABC SERVICES;'

SELECT 
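   -- LTRIM drops the space that follows the colon, so the result is 'Compliance' rather than ' Compliance'
   MAX(CASE WHEN PATINDEX('Growth:%', [value]) = 1 THEN LTRIM(STUFF([value], 1, LEN('Growth:'), '')) END) AS GROWTH_TXT,
   MAX(CASE WHEN PATINDEX('Priority:%', [value]) = 1 THEN LTRIM(STUFF([value], 1, LEN('Priority:'), '')) END) AS PRIORITY_TXT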
FROM STRING_SPLIT(@text, ';')

Result:

GROWTH_TXT  PRIORITY_TXT
 Compliance  Contractual

If the data is stored in a table, an additional APPLY operator is needed:

DECLARE @text varchar(1000) = 'Growth: Compliance;Priority: Contractual;Original Vendor: ABC SERVICES;'
SELECT @text AS OPTIONAL_FIELD_1
INTO z_tbl_temp

SELECT a.*
FROM z_tbl_temp z
OUTER APPLY ( 
   SELECT 
      -- LTRIM drops the space that follows the colon
      MAX(CASE WHEN PATINDEX('Growth:%', [value]) = 1 THEN LTRIM(STUFF([value], 1, LEN('Growth:'), '')) END) AS GROWTH_TXT,
      MAX(CASE WHEN PATINDEX('Priority:%', [value]) = 1 THEN LTRIM(STUFF([value], 1, LEN('Priority:'), '')) END) AS PRIORITY_TXT
   FROM STRING_SPLIT(z.OPTIONAL_FIELD_1, ';')
) a
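
Since the question asks for the values to be stored in GROWTH_TXT and PRIORITY_TXT, the same APPLY can feed an UPDATE. The following is only a sketch: the temp table #demo and its column sizes are invented for illustration, and it assumes the real table already has the two target columns.

```sql
-- Hypothetical demo table; the real table is assumed to already contain both target columns
CREATE TABLE #demo (
    id INT IDENTITY PRIMARY KEY,
    OPTIONAL_FIELD_1 VARCHAR(1000),
    GROWTH_TXT VARCHAR(100) NULL,
    PRIORITY_TXT VARCHAR(100) NULL
);

INSERT INTO #demo (OPTIONAL_FIELD_1)
VALUES ('Growth: Compliance;Priority: Contractual;Original Vendor: ABC SERVICES;');

-- Parse each row's fragments and write the two values back to the same row
UPDATE d
SET GROWTH_TXT   = a.GROWTH_TXT,
    PRIORITY_TXT = a.PRIORITY_TXT
FROM #demo AS d
OUTER APPLY (
   SELECT
      MAX(CASE WHEN [value] LIKE 'Growth:%'   THEN LTRIM(STUFF([value], 1, LEN('Growth:'), '')) END) AS GROWTH_TXT,
      MAX(CASE WHEN [value] LIKE 'Priority:%' THEN LTRIM(STUFF([value], 1, LEN('Priority:'), '')) END) AS PRIORITY_TXT
   FROM STRING_SPLIT(d.OPTIONAL_FIELD_1, ';')
) AS a;

SELECT id, GROWTH_TXT, PRIORITY_TXT FROM #demo;
```

Rows whose text has no matching fragment end up with NULL in the corresponding column, because OUTER APPLY keeps them.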

Answer 2: (score: 0)

Another approach, based on JSON.

SQL Server 2016/2017 and later.

SQL

-- DDL and sample data population, start
DECLARE @tbl TABLE (id INT IDENTITY PRIMARY KEY, val NVARCHAR(255));
INSERT INTO @tbl VALUES
(N'Growth: Compliance;Priority: Contractual;Original Vendor: ABC SERVICES;')
-- DDL and sample data population, end

DECLARE @separator CHAR(1) = ';'
    , @separatorJson CHAR(3) = '","'
    , @colon CHAR(2) = ': '
    , @colonJson CHAR(3) = '":"';

;WITH rs AS
(
    SELECT id
          , N'{"' + 
             REPLACE(REPLACE(TRIM(';' FROM val),@colon, @colonJson), @separator, @separatorJson) + 
                N'"}' AS DataJson
    FROM @tbl
)
SELECT id
    , JSON_VALUE(DataJson, N'$.Growth') AS [Growth]
    , JSON_VALUE(DataJson, N'$.Priority') AS [Priority]
    , JSON_VALUE(DataJson, N'$."Original Vendor"') AS [Original Vendor]
FROM rs;

Output

+----+------------+-------------+-----------------+
| id |   Growth   |  Priority   | Original Vendor |
+----+------------+-------------+-----------------+
|  1 | Compliance | Contractual | ABC SERVICES    |
+----+------------+-------------+-----------------+

For SQL Server 2016, which does not support the newer TRIM() function, replace the corresponding line in the CTE with:

REPLACE(REPLACE(LEFT(val,LEN(val)-1),@colon, @colonJson), @separator, @separatorJson) + 
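
It can help to look at the intermediate JSON text before calling JSON_VALUE. The sketch below rebuilds it for the sample value using the LEFT() form, with the separators written inline instead of as variables, and checks it with ISJSON. It assumes the text always ends with a semicolon and contains no double quotes or backslashes; either would need extra handling.

```sql
DECLARE @val NVARCHAR(255) =
    N'Growth: Compliance;Priority: Contractual;Original Vendor: ABC SERVICES;';

-- Drop the trailing ';', then turn ': ' into '":"' and ';' into '","'
DECLARE @json NVARCHAR(MAX) =
    N'{"' + REPLACE(REPLACE(LEFT(@val, LEN(@val) - 1), ': ', '":"'), ';', '","') + N'"}';

SELECT @json AS DataJson,         -- {"Growth":"Compliance","Priority":"Contractual","Original Vendor":"ABC SERVICES"}
       ISJSON(@json) AS IsValid;  -- 1 when the text is well-formed JSON
```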

Answer 3: (score: 0)

If the string column follows a consistent pattern, something along these lines should work.

with cte(str,i1,i2,i3) as

(select str, 
        charindex('Growth: ',str), 
        charindex('Priority: ',str), 
        charindex('Original Vendor: ',str)
 from your_table) 

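 -- 'Growth: ' is 8 characters and 'Priority: ' is 10; one more is subtracted from each length for the trailing ';'
 select substring(str,i1+8,i2-i1-9), substring(str,i2+10,i3-i2-11) 
 from cte; 

The constants in the substring calls strip the unwanted characters, which I assume are of fixed size: the label 'Growth: ' is 8 characters long and 'Priority: ' is 10, and each length drops one more character to exclude the trailing semicolon.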


Another approach uses string_split with cross apply:

select str, 
       replace(min(value),'Growth: ','') growth_txt,
       replace(max(value),'Priority: ','') as priority_txt
from your_table
cross apply string_split(str,';')
where value like 'Growth: %' or value like 'Priority: %'
group by str;

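Because the letter G sorts before the letter P, using min and max ensures that each replace is applied to the right string. If you decide to parse a different set of elements and don't want to think too much about alphabetical order, the approach in @Zhorov's answer is probably more reliable.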