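Convert one row into multiple rows and columns based on a string in one column

Asked: 2021-03-01 16:02:58

Tags: sql sql-server

I have to convert one row into multiple rows based on the numeric values inside a string column.

Sample input: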

EmpId | work date  | String 
------+------------+--------------------------------------------------------
 1234 | 12/10/2020 | The following clocks 12:03,12:04 are outside of the allowed radius by 209759,209758 meters

Sample output:

Empid | Work Date  | Clock | Radius 
------+------------+-------+--------
1234  | 12/10/2020 | 12:03 | 209759
1234  | 12/10/2020 | 12:04 | 209758

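The string can contain n values, and its numbers must be split into the two columns, one row per value.

Please help me solve this. Thanks.

2 Answers:

Answer 0: (score: 0)

Please try the following solution.

It is messy, but it works:

  1. The first CTE tokenizes the free_text column via XML and XQuery, and filters out the tokens that do not contain a comma.
  2. The second CTE gets the clock column from the XML.
  3. The third CTE gets the Radius column from the XML.
  4. The final SELECT ties it all together.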

SQL

-- DDL and sample data population, start
DECLARE @tbl TABLE (emp_id INT, work_date DATE, free_text NVARCHAR(MAX))
INSERT INTO @tbl (emp_id, work_date, free_text) VALUES
(1234, '12/10/2020',N'The following clocks 12:03,12:04 are outside of the allowed radius by 209759,209758 meters');
-- DDL and sample data population, end

DECLARE @separator CHAR(1) = SPACE(1)
    , @comma CHAR(1) = ',';

WITH rs AS
(
    SELECT emp_id, work_date
        , CAST('<root><r><![CDATA[' + 
            REPLACE(free_text COLLATE Czech_BIN2, @separator, ']]></r><r><![CDATA[') + ']]></r></root>' AS XML)
        .query('
        for $x in /root/r
        where contains($x, sql:variable("@comma"))
        return $x
        ') AS result
    FROM @tbl
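), clock AS
(
    -- First comma-separated token = clock values; number them per source row.
    -- VARCHAR(MAX) avoids truncating lists longer than 20 characters.
    SELECT ROW_NUMBER() OVER (PARTITION BY rs.emp_id, rs.work_date ORDER BY (SELECT NULL)) AS seq
        , rs.*
        , z.value AS clock
    FROM rs
        CROSS APPLY result.nodes('/r[1]') AS t(c)
        CROSS APPLY STRING_SPLIT(c.value('.', 'VARCHAR(MAX)'), @comma) AS z
), Radius AS
(
    -- Second comma-separated token = radius values; number them per source row.
    SELECT ROW_NUMBER() OVER (PARTITION BY rs.emp_id, rs.work_date ORDER BY (SELECT NULL)) AS seq
        , rs.*
        , z.value AS Radius
    FROM rs
        CROSS APPLY result.nodes('/r[2]') AS t(c)
        CROSS APPLY STRING_SPLIT(c.value('.', 'VARCHAR(MAX)'), @comma) AS z
)
-- Pair the n-th clock with the n-th radius of the same source row.
SELECT c.emp_id, c.work_date, c.clock, r.Radius
FROM clock AS c
    INNER JOIN Radius AS r ON r.seq = c.seq
        AND r.emp_id = c.emp_id
        AND r.work_date = c.work_date;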

Output

+--------+------------+-------+--------+
| emp_id | work_date  | clock | Radius |
+--------+------------+-------+--------+
|   1234 | 2020-12-10 | 12:03 | 209759 |
|   1234 | 2020-12-10 | 12:04 | 209758 |
+--------+------------+-------+--------+
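
One caveat, offered as a side note rather than as part of the answer: STRING_SPLIT does not document the order of the rows it returns, so numbering them with ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) usually pairs the values correctly but is not guaranteed to. A minimal sketch of an order-preserving split follows. It assumes SQL Server 2016 or later (compatibility level 130+) and values that contain no double quotes or backslashes. Each list is turned into a JSON array and the two arrays are joined on the index that OPENJSON exposes as [key]. The variable names are illustrative only.

```sql
-- Illustrative inputs: the two comma-separated lists already cut out of the message
DECLARE @clocks VARCHAR(MAX) = '12:03,12:04';
DECLARE @radii  VARCHAR(MAX) = '209759,209758';

-- '12:03,12:04' becomes '["12:03","12:04"]'; OPENJSON returns one row per
-- array element, with [key] holding the zero-based position as a string
SELECT CAST(c.[key] AS INT) + 1 AS seq
     , c.[value] AS clock
     , r.[value] AS Radius
FROM OPENJSON('["' + REPLACE(@clocks, ',', '","') + '"]') AS c
    INNER JOIN OPENJSON('["' + REPLACE(@radii, ',', '","') + '"]') AS r
        ON r.[key] = c.[key]
ORDER BY seq;
-- seq 1: 12:03 / 209759
-- seq 2: 12:04 / 209758
```

If the values could contain quotes or backslashes, wrapping each list in STRING_ESCAPE(..., 'json') before the REPLACE would be the natural next step.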

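Answer 1: (score: 0)

Another messy one, without XML, but using a borrowed "Tally" table solution. This may help performance if the source data volume is high.

Thanks to Yitzhak Khabinsky, as his version definitely works and is probably easier to maintain. I got lost in my own XML version, so I went with a tally table. Note that with Yitzhak's solution, if more radius readings are supplied than clock readings, we will not learn about it directly from the results: the whole record is excluded.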
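
To make the difference concrete, here is a small sketch with hypothetical, already-split values (two clocks, three radii). Joining the two numbered lists with FULL OUTER JOIN keeps the unmatched reading and marks the gap, whereas INNER JOIN would return only the first two pairs. The CTE and column names are illustrative.

```sql
-- Hypothetical pre-split values: two clocks, three radii
WITH clocks (seq, clock) AS
(
    SELECT seq, clock
    FROM (VALUES (1, '12:03'), (2, '12:04')) AS v (seq, clock)
), radii (seq, radius) AS
(
    SELECT seq, radius
    FROM (VALUES (1, '209759'), (2, '209758'), (3, '209757')) AS v (seq, radius)
)
SELECT COALESCE(c.seq, r.seq)   AS pair_no
     , ISNULL(c.clock, 'N/A')   AS clock
     , ISNULL(r.radius, 'N/A')  AS radius
FROM clocks AS c
    FULL OUTER JOIN radii AS r ON r.seq = c.seq
ORDER BY pair_no;
-- pair_no 3 comes back as clock = 'N/A', radius = '209757';
-- with INNER JOIN that third row would not appear at all
```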

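Here is my (perhaps long-winded) solution:

Step 1) Build a Tally table

Step 2) Build a table function that splits delimited values and converts them into a table with a row number

Step 3) Deploy a table function that parses the string accordingly and returns a table of the transformed records

Step 4) Test cases and sample usage

Step 5) Modify the main function to hide all columns to the right of the Radius column

Step 1) Build the Tally table, from https://www.sqlservercentral.com/articles/the-numbers-or-tally-table-what-it-is-and-how-it-replaces-a-loop-1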

--=============================================================================
--      Setup Tally Table
--=============================================================================
    USE TempDB     --DB that everyone has where we can cause no harm
    SET NOCOUNT ON --Suppress the auto-display of rowcounts for appearance/speed
DECLARE @StartTime DATETIME    --Timer to measure total duration
    SET @StartTime = GETDATE() --Start the timer
--=============================================================================
--      Create and populate a Tally table
--=============================================================================
--===== Conditionally drop and create the table/Primary Key
     IF OBJECT_ID('dbo.Tally') IS NOT NULL 
        DROP TABLE dbo.Tally
 CREATE TABLE dbo.Tally 
        (N INT, 
         CONSTRAINT PK_Tally_N PRIMARY KEY CLUSTERED (N))
--===== Create and preset a loop counter
DECLARE @Counter INT
    SET @Counter = 1
--===== Populate the table using the loop and counter
  WHILE @Counter <= 11000
  BEGIN
         INSERT INTO dbo.Tally
                (N)
         VALUES (@Counter)
            SET @Counter = @Counter + 1
    END
--===== Display the total duration
 SELECT STR(DATEDIFF(ms,@StartTime,GETDATE())) + ' Milliseconds duration'
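
The WHILE loop above is fine for a one-off build of 11,000 rows. As a hedged aside, a set-based insert is a common alternative. The sketch below rebuilds the same dbo.Tally table in TempDB without a loop, using sys.all_columns purely as a convenient row source. Treat it as a replacement for the script above, not something to run in addition to it.

```sql
USE TempDB;

-- Same conditional drop/create as the loop-based script
IF OBJECT_ID('dbo.Tally') IS NOT NULL
    DROP TABLE dbo.Tally;

CREATE TABLE dbo.Tally
       (N INT,
        CONSTRAINT PK_Tally_N PRIMARY KEY CLUSTERED (N));

-- Cross-joining a system view with itself yields far more than 11000 rows;
-- number them, then keep only 1..11000
INSERT INTO dbo.Tally (N)
SELECT numbered.n
  FROM (SELECT ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS n
          FROM sys.all_columns AS a
         CROSS JOIN sys.all_columns AS b) AS numbered
 WHERE numbered.n <= 11000;
```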

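Step 2) Build a table function that splits delimited values and converts them into a table with a row number. This ensures that even if there are hundreds of radius and clock readings, each list is quickly converted into a sub-table.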

https://www.sqlservercentral.com/articles/tally-oh-an-improved-sql-8k-%e2%80%9ccsv-splitter%e2%80%9d-function

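The function uses the Tally table created in step 1. Step 3 calls it as [tempDB].[dbo].[SplitAndSequence], so create it in TempDB.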

create function [dbo].[SplitAndSequence] (@pString nvarchar(4000), @pDelimiter char(1)) returns table return
 SELECT ItemNumber = ROW_NUMBER() OVER (ORDER BY t.N),
        Item       = SUBSTRING(@pString, t.N, CHARINDEX(@pDelimiter, @pString + @pDelimiter, t.N) - t.N)
   FROM TempDB.dbo.Tally t
  WHERE t.N <= DATALENGTH(@pString)+1 --DATALENGTH allows for trailing space delimiters
    AND SUBSTRING(@pDelimiter + @pString, t.N, 1) = @pDelimiter
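
A quick way to check the splitter on its own, assuming both dbo.Tally and dbo.SplitAndSequence were created in TempDB as shown above:

```sql
-- Each item comes back with its position in the original list
SELECT ItemNumber, Item
  FROM TempDB.dbo.SplitAndSequence(N'12:03,12:04,12:05', ',')
 ORDER BY ItemNumber;
-- ItemNumber  Item
-- 1           12:03
-- 2           12:04
-- 3           12:05
```

Because ItemNumber is derived from the character position in the Tally table, the sequence follows the order of the values in the string.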

Step 3) Deploy a table function that parses the string accordingly and returns a table of the transformed records

CREATE function [dbo].[fn_ParseAndReturnObservations] (@Emp_ID varchar(50), @WorkDateStr varchar(10), @String nvarchar(300)) returns table 
-- select * from [fn_ParseAndReturnObservations]('1234', '12/10/2020', 'The following clocks 12:03,12:04 are outside of the allowed radius by 209759,209758 meters')
return
with [cte_SourceData] as (
select [Emp_ID] = @Emp_Id -- '1234'
     , [Work Date] = convert(date,@WorkDateStr,103) -- convert(date,'12/10/2020',103)
     , [String] = @String -- 'The following clocks 12:03,12:04 are outside of the allowed radius by 209759,209758 meters'
) -- end [cte_SourceData]
, [cte_Parse1] as (
select [Emp_ID], [Work Date]
     , [Clocks Position] = charindex('clocks ', [String]) + len('clocks')
     , [String] 
  from [cte_SourceData] 
)  -- end [cte_Parse1]
, [cte_Parse2] as (
select [Emp_ID], [Work Date] , [Clocks Position]
     , [String From Clock Observations] = substring([String],[Clocks Position]+1, len([String]))
     , [String]
  from [cte_Parse1]
)  -- end [cte_Parse2]
, [cte_Parse3] as (
select [Emp_ID], [Work Date]
     , [Clock Observations] = substring([String From Clock Observations],1, charindex(' ',[String From Clock Observations])-1)
     , [Clocks Position], [String From Clock Observations]
     , [Radius Verbosity Position Pre Ends] = charindex(' are outside of the allowed radius by ', [String From Clock Observations]) + len(' are outside of the allowed radius by ')
         --- Assumes no spaces between Clock Observations
     , [String]
  from [cte_Parse2]
) -- end [cte_Parse3]
, [cte_Parse4] as (
select [Emp_ID], [Work Date], [Clock Observations]
     , [String From Meters] = substring([String From Clock Observations],[Radius Verbosity Position Pre Ends]+1,len([String From Clock Observations]))
     , [Clocks Position] , [String From Clock Observations], [String]
  from [cte_Parse3]
) -- end [cte_Parse4]
, [cte_ReadyForDelimiting] as (
select [Emp_ID], [Work Date], [Clock Observations]
     , [Radius Observations] = substring([String From Meters],1, charindex(' meters', [String From Meters]))
--     , [String From Meters]     , [Clocks Position] , [String From Clock Observations], [String]
  from [cte_Parse4]
) -- end [cte_ReadyForDelimiting] 
, [cte_CountNumberOfSubColumns] as (
select [rdy].[Emp_ID], [rdy].[Work Date], [rdy].[Clock Observations], [rdy].[Radius Observations]
     , [Clock Observation Count] =  (len([rdy].[Clock Observations]) - len(replace([rdy].[Clock Observations],',',''))) / (len([rdy].[Clock Observations]) - len(replace([rdy].[Clock Observations],';','')) + 1) + 1
     , [Radius Observation Count] =  (len([Radius Observations]) - len(replace([Radius Observations],',',''))) / (len([Radius Observations]) - len(replace([Radius Observations],';','')) + 1) + 1
  from [cte_ReadyForDelimiting] [rdy]
) -- end [cte_CountNumberOfSubColumns]
, [cte_ParsingClockObservations] as (
select [src].[Emp_ID], [src].[Work Date], [src].[Clock Observations], [src].[Radius Observations]
       ,  [src].[Clock Observation Count] , [src].[Radius Observation Count]
       , [Clock Record Sequence] = [Times].[ItemNumber]
       , [Clock] = [Times].[Item]
from [cte_CountNumberOfSubColumns] [src]
  outer apply (select * from [tempDB].[dbo].[SplitAndSequence] ([src].[Clock Observations],','))  [Times]
) -- [cte_ParsingClockObservations] 
, [cte_ParsingRadiusObservations] as (
select [src].[Emp_ID], [src].[Work Date], [src].[Clock Observations], [src].[Radius Observations]
       ,  [src].[Clock Observation Count] , [src].[Radius Observation Count]
       , [Radius Record Sequence] = [Radius].[ItemNumber]
       , [Radius] = [Radius].[Item]
  from [cte_CountNumberOfSubColumns] [src]
  outer apply (select * from [tempDB].[dbo].[SplitAndSequence] ([src].[Radius Observations],','))  [Radius] --  on [Radius].[ItemNumber] = [Times].[ItemNumber]  */
 ) -- end [cte_ParsingRadiusObservations]
, [cte_TransposedRecords] as (
select [Emp_ID] = cast(isnull([times].[Emp_ID],[radius].[Emp_ID]) as varchar(10))
      , [Work Date] = isnull([times].[Work Date],[radius].[Work Date])
      , [Clock] = cast(case when isnull([times].[Clock],'N/A') in ('') then 'N/A' else isnull([times].[Clock],'N/A') end as varchar(10))
      , [Radius] = cast(case when isnull([radius].[Radius], 'N/A') in ('') then 'N/A' else isnull([radius].[Radius],'N/A') end as varchar(20))
      , [Stub] = 'StubStubStub'
      , [times].[Clock Record Sequence]
      , [radius].[Radius Record Sequence]
      , [times].[Clock Observations]
      , [times].[Radius Observations]
      , [times].[Clock Observation Count]
      , [times].[Radius Observation Count]
  from [cte_ParsingClockObservations] [times]
  full outer join [cte_ParsingRadiusObservations] [radius] on [radius].[Emp_ID] = [times].[Emp_ID] and [radius].[Work Date] = [times].[Work Date] and [radius].[Radius Record Sequence] = [times].[Clock Record Sequence] 
)
-- Original
select * from [cte_TransposedRecords]

Run a test with this line of code (using the data set from the original question):

select * from [fn_ParseAndReturnObservations]
   ('1234', 
   '12/10/2020', 
   'The following clocks 12:03,12:04 are outside of the allowed radius by 209759,209758 meters')



Emp_ID     Work Date  Clock      Radius              
---------- ---------- ---------- --------------------
1234       2020-10-12 12:03      209759              
1234       2020-10-12 12:04      209758              
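
Note that the dates here read 2020-10-12, whereas the first answer's output shows 2020-12-10. The difference comes from the conversion style: this function uses style 103 (dd/mm/yyyy), and the question does not say which format the source data uses. A quick check with the built-in CONVERT:

```sql
-- Same text, two interpretations
SELECT CONVERT(date, '12/10/2020', 101) AS style_101_mdy   -- 2020-12-10
     , CONVERT(date, '12/10/2020', 103) AS style_103_dmy;  -- 2020-10-12
```

If the source data is month-first, changing 103 to 101 in [cte_SourceData] should bring the two answers into line.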

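Step 4) Test cases and sample usage. To analyze your production table of clock times and radii, replace the last three lines of the test query with a query against that table.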

with cte_SampleTests as (
    select TestCase = 'Original Case with a break down', [Test_Emp_ID] = '1234', [Test_WorkDateStr] = '12/10/2020', [Test_String] = 'The following clocks 12:03,12:04 are outside of the allowed radius by 209759,209758 meters'
  union  select TestCase = 'Three Readings', [Test_Emp_ID] = '1234', [Test_WorkDateStr] = '12/10/2020', [Test_String] = 'The following clocks 12:03,12:04,12:05 are outside of the allowed radius by 209759,209758,209757 meters'
  union select TestCase = 'Forth Clock Position Listed but no value', [Test_Emp_ID] = '1234', [Test_WorkDateStr] = '12/10/2020', [Test_String] = 'The following clocks 12:03,12:04,12:05, are outside of the allowed radius by 209759,209758,209757,209755 meters' 
  union select TestCase = 'Four Radius and Three Clock Readings', [Test_Emp_ID] = '1234', [Test_WorkDateStr] = '12/10/2020', [Test_String] = 'The following clocks 12:03,12:04,12:05 are outside of the allowed radius by 209759,209758,209757,209755 meters'
  union select TestCase = 'Four Clock Readings and 3 Radius Readings', [Test_Emp_ID] = '1234', [Test_WorkDateStr] = '12/10/2020', [Test_String] = 'The following clocks 12:03,12:04,12:05,12:06 are outside of the allowed radius by 209759,209758,209757 meters'
  union select TestCase = 'Four Clock Readings and Fourth Radius Reading missing', [Test_Emp_ID] = '1234', [Test_WorkDateStr] = '12/10/2020', [Test_String] = 'The following clocks 12:03,12:04,12:05,12:06 are outside of the allowed radius by 209759,209758,209757, meters'
  union select TestCase = 'Four Clock Readings and third Radius Reading missing', [Test_Emp_ID] = '1234', [Test_WorkDateStr] = '12/10/2020', [Test_String] = 'The following clocks 12:03,12:04,12:05,12:06 are outside of the allowed radius by 209759,209758,,209757 meters'
)
select cte_SampleTests.[TestCase], x.*
      from [cte_SampleTests]
      cross apply (select * from [dbo].[fn_ParseAndReturnObservations]([cte_SampleTests].[Test_Emp_ID], [cte_SampleTests].[Test_WorkDateStr], [cte_SampleTests].[Test_String])) x

TestCase                                              Emp_ID     Work Date  Clock      Radius            
----------------------------------------------------- ---------- ---------- ---------- ------------------
Original Case with a break down                       1234       2020-10-12 12:03      209759            
Original Case with a break down                       1234       2020-10-12 12:04      209758            
Three Readings                                        1234       2020-10-12 12:03      209759            
Three Readings                                        1234       2020-10-12 12:04      209758            
Three Readings                                        1234       2020-10-12 12:05      209757            
Forth Clock Position Listed but no value              1234       2020-10-12 12:03      209759            
Forth Clock Position Listed but no value              1234       2020-10-12 12:04      209758            
Forth Clock Position Listed but no value              1234       2020-10-12 12:05      209757            
Forth Clock Position Listed but no value              1234       2020-10-12 N/A        209755            
Four Radius and Three Clock Readings                  1234       2020-10-12 12:03      209759            
Four Radius and Three Clock Readings                  1234       2020-10-12 12:04      209758            
Four Radius and Three Clock Readings                  1234       2020-10-12 12:05      209757            
Four Radius and Three Clock Readings                  1234       2020-10-12 N/A        209755            
Four Clock Readings and 3 Radius Readings             1234       2020-10-12 12:03      209759            
Four Clock Readings and 3 Radius Readings             1234       2020-10-12 12:04      209758            
Four Clock Readings and 3 Radius Readings             1234       2020-10-12 12:05      209757            
Four Clock Readings and 3 Radius Readings             1234       2020-10-12 12:06      N/A               
Four Clock Readings and Fourth Radius Reading missing 1234       2020-10-12 12:03      209759            
Four Clock Readings and Fourth Radius Reading missing 1234       2020-10-12 12:04      209758            
Four Clock Readings and Fourth Radius Reading missing 1234       2020-10-12 12:05      209757            
Four Clock Readings and Fourth Radius Reading missing 1234       2020-10-12 12:06      N/A               
Four Clock Readings and third Radius Reading missing  1234       2020-10-12 12:03      209759            
Four Clock Readings and third Radius Reading missing  1234       2020-10-12 12:04      209758            
Four Clock Readings and third Radius Reading missing  1234       2020-10-12 12:05      N/A               
Four Clock Readings and third Radius Reading missing  1234       2020-10-12 12:06      209757            
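
For a real table, the same CROSS APPLY pattern applies. The sketch below is only an illustration: @ClockAlerts and its column names are hypothetical stand-ins for the production table, and the function from step 3 is assumed to exist in the current database. Listing the four columns explicitly also hides the working columns that the function returns to the right of Radius.

```sql
-- Hypothetical stand-in for the production table; all names are assumptions
DECLARE @ClockAlerts TABLE (EmpId INT, WorkDate DATE, AlertText NVARCHAR(300));

INSERT INTO @ClockAlerts (EmpId, WorkDate, AlertText) VALUES
(1234, '20201012', N'The following clocks 12:03,12:04 are outside of the allowed radius by 209759,209758 meters');

SELECT x.[Emp_ID], x.[Work Date], x.[Clock], x.[Radius]
  FROM @ClockAlerts AS src
 CROSS APPLY [dbo].[fn_ParseAndReturnObservations](
           CAST(src.EmpId AS VARCHAR(50))
         , CONVERT(VARCHAR(10), src.WorkDate, 103)  -- the function expects dd/mm/yyyy text
         , src.AlertText) AS x;
```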

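Step 5) Modify the main function to hide all columns to the right of the Radius column. The Stub column and every column to the right of it are listed only to show the "working". The counts of radius and clock observations, along with [Clock Record Sequence] and [Radius Record Sequence], may be useful for downstream analysis and decision-making.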

Step 6) Go back to the author of the original data ... and ask them to give you an alternative structure. XML should not be hard.
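
To show why a structured source would help, here is a sketch of the same data shredded from XML. The XML shape (element and attribute names) is entirely hypothetical. The point is only that once each observation is its own element, no string parsing is needed and clock and radius cannot get out of step.

```sql
-- Hypothetical XML shape; element and attribute names are assumptions
DECLARE @alert XML = N'
<alert empId="1234" workDate="2020-10-12">
  <obs clock="12:03" radius="209759" />
  <obs clock="12:04" radius="209758" />
</alert>';

-- One output row per <obs> element
SELECT @alert.value('(/alert/@empId)[1]', 'INT')     AS Emp_ID
     , @alert.value('(/alert/@workDate)[1]', 'DATE') AS [Work Date]
     , o.value('@clock', 'VARCHAR(10)')              AS Clock
     , o.value('@radius', 'INT')                     AS Radius
  FROM @alert.nodes('/alert/obs') AS t(o);
```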
