I need to convert one row into multiple rows, based on the values embedded in a string column.
Sample input:
EmpId | work date | String
------+------------+--------------------------------------------------------
1234 | 12/10/2020 | The following clocks 12:03,12:04 are outside of the allowed radius by 209759,209758 meters
Sample output:
Empid | Work Date | Clock | Radius
------+------------+-------+--------
1234 | 12/10/2020 | 12:03 | 209759
1234 | 12/10/2020 | 12:04 | 209758
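The string can contain any number (n) of values, which have to be split into the two columns and into separate rows.
Please help me solve this. Thanks.
Answer 0 (score: 0)
Please try the following solution.
It is messy, but it works: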
SQL
-- DDL and sample data population, start
DECLARE @tbl TABLE (emp_id INT, work_date DATE, free_text NVARCHAR(MAX))
INSERT INTO @tbl (emp_id, work_date, free_text) VALUES
(1234, '12/10/2020',N'The following clocks 12:03,12:04 are outside of the allowed radius by 209759,209758 meters');
-- DDL and sample data population, end
DECLARE @separator CHAR(1) = SPACE(1)
, @comma CHAR(1) = ',';
WITH rs AS
(
SELECT emp_id, work_date
, CAST('<root><r><![CDATA[' +
REPLACE(free_text COLLATE Czech_BIN2, @separator, ']]></r><r><![CDATA[') + ']]></r></root>' AS XML)
.query('
for $x in /root/r
where contains($x, sql:variable("@comma"))
return $x
') AS result
FROM @tbl
), clock AS
(
SELECT ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS seq
, rs.*
, z.value AS clock
FROM rs
CROSS APPLY result.nodes('/r[1]') AS t(c)
CROSS APPLY STRING_SPLIT(c.value('.', 'VARCHAR(20)'), @comma) AS z
), Radius AS
(
SELECT ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS seq
, rs.*
, z.value AS Radius
FROM rs
CROSS APPLY result.nodes('/r[2]') AS t(c)
CROSS APPLY STRING_SPLIT(c.value('.', 'VARCHAR(20)'), @comma) AS z
)
SELECT c.emp_id, c.work_date, c.clock, r.Radius
FROM clock AS c
INNER JOIN Radius AS r ON r.seq = c.seq
AND r.emp_id = c.emp_id;
Output
+--------+------------+-------+--------+
| emp_id | work_date | clock | Radius |
+--------+------------+-------+--------+
| 1234 | 2020-12-10 | 12:03 | 209759 |
| 1234 | 2020-12-10 | 12:04 | 209758 |
+--------+------------+-------+--------+
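One caveat worth keeping in mind: the documentation for STRING_SPLIT states that the order of its output rows is not guaranteed, so numbering them with ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) relies on behaviour that usually holds but is not promised. A minimal sketch of one alternative follows. It assumes SQL Server 2016 or later (compatibility level 130+) and list values that contain no double quotes or backslashes. Each list is turned into a JSON array, and the array index that OPENJSON returns in its [key] column serves as the position. The variable names are illustrative only.

```sql
-- Sketch: pair two comma-separated lists by position using OPENJSON array indexes.
DECLARE @clocks NVARCHAR(100) = N'12:03,12:04';
DECLARE @radii  NVARCHAR(100) = N'209759,209758';

SELECT CAST(c.[key] AS INT) + 1 AS seq      -- [key] is the 0-based array index
     , c.[value]                AS clock
     , r.[value]                AS radius
FROM OPENJSON(N'["' + REPLACE(@clocks, N',', N'","') + N'"]') AS c
INNER JOIN OPENJSON(N'["' + REPLACE(@radii, N',', N'","') + N'"]') AS r
        ON r.[key] = c.[key]
ORDER BY CAST(c.[key] AS INT);
```

On SQL Server 2022 and later, STRING_SPLIT also accepts an optional third argument that adds an ordinal column, which may be simpler if that version is available.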
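Answer 1 (score: 0)
Another messy one, without XML, but using a borrowed "Tally" table solution. This may help performance if the source data volume is high.
Thanks to Yitzhak Khabinsky, as his version definitely works and is probably easier to maintain. I got lost in my own XML version, so I went with a tally table instead. A note on Yitzhak's solution: if more radius readings are supplied than clock readings, we would not know it directly from the results, because the entire record is excluded.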
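To make the mismatch concern concrete, here is a small sketch with hand-numbered, hypothetical readings (three clocks, four radii). With a FULL OUTER JOIN on the sequence number the unmatched reading stays visible as 'N/A'. Changing the join to INNER JOIN drops the fourth row without any warning.

```sql
-- Hypothetical data: three clock readings but four radius readings.
WITH clock (seq, clock) AS
(
    SELECT v.seq, v.clock
    FROM (VALUES (1, '12:03'), (2, '12:04'), (3, '12:05')) AS v (seq, clock)
), radius (seq, radius) AS
(
    SELECT v.seq, v.radius
    FROM (VALUES (1, '209759'), (2, '209758'), (3, '209757'), (4, '209755')) AS v (seq, radius)
)
SELECT COALESCE(c.seq, r.seq)  AS seq
     , ISNULL(c.clock, 'N/A')  AS clock
     , ISNULL(r.radius, 'N/A') AS radius
FROM clock AS c
FULL OUTER JOIN radius AS r ON r.seq = c.seq   -- INNER JOIN would return only seq 1 to 3
ORDER BY COALESCE(c.seq, r.seq);
```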
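Here is my (perhaps long-winded) solution:
Step 1) Build a Tally table
Step 2) Build a table function that takes a delimited value and converts it into a table with a row number
Step 3) Deploy a table function that parses the string accordingly and returns a table containing the transposed records
Step 4) Test cases and sample usage
Step 5) Modify the main function to hide all columns to the right of the "Radius" column
Step 1) Build the Tally table, taken from https://www.sqlservercentral.com/articles/the-numbers-or-tally-table-what-it-is-and-how-it-replaces-a-loop-1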
--=============================================================================
-- Setup Tally Table
--=============================================================================
USE TempDB --DB that everyone has where we can cause no harm
SET NOCOUNT ON --Suppress the auto-display of rowcounts for appearance/speed
DECLARE @StartTime DATETIME --Timer to measure total duration
SET @StartTime = GETDATE() --Start the timer
--=============================================================================
-- Create and populate a Tally table
--=============================================================================
--===== Conditionally drop and create the table/Primary Key
IF OBJECT_ID('dbo.Tally') IS NOT NULL
DROP TABLE dbo.Tally
CREATE TABLE dbo.Tally
(N INT,
CONSTRAINT PK_Tally_N PRIMARY KEY CLUSTERED (N))
--===== Create and preset a loop counter
DECLARE @Counter INT
SET @Counter = 1
--===== Populate the table using the loop and counter
WHILE @Counter <= 11000
BEGIN
INSERT INTO dbo.Tally
(N)
VALUES (@Counter)
SET @Counter = @Counter + 1
END
--===== Display the total duration
SELECT STR(DATEDIFF(ms,@StartTime,GETDATE())) + ' Milliseconds duration'
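The loop above inserts one row at a time. As an aside, the same 11,000 numbers can be generated in a single set-based statement. The sketch below is not from the linked article's script as quoted here. It uses a hypothetical temp table name, #TallySketch, so it does not collide with dbo.Tally, and it cross joins sys.all_columns with itself purely as a convenient source of enough rows.

```sql
-- Sketch only: #TallySketch is a hypothetical name, separate from dbo.Tally.
IF OBJECT_ID('tempdb..#TallySketch') IS NOT NULL
    DROP TABLE #TallySketch;

CREATE TABLE #TallySketch (N INT NOT NULL PRIMARY KEY CLUSTERED);

-- Number the rows of a large cross join and keep the lowest 11000 values.
INSERT INTO #TallySketch (N)
SELECT TOP (11000)
       ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS N
FROM sys.all_columns AS a
CROSS JOIN sys.all_columns AS b
ORDER BY N;

-- Sanity check: expect 11000 rows running from 1 to 11000.
SELECT COUNT(*) AS total_rows, MIN(N) AS min_n, MAX(N) AS max_n
FROM #TallySketch;
```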
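Step 2) Build a table function that takes a delimited value and converts it into a table with a row number. This ensures that even if there are hundreds of radius and clock readings, each list is quickly converted into a sub-table.
The function uses the tally table created in Step 1.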
create function [dbo].[SplitAndSequence] (@pString nvarchar(4000), @pDelimiter char(1)) returns table return
SELECT ItemNumber = ROW_NUMBER() OVER (ORDER BY t.N),
Item = SUBSTRING(@pString, t.N, CHARINDEX(@pDelimiter, @pString + @pDelimiter, t.N) - t.N)
FROM TempDB.dbo.Tally t
WHERE t.N <= DATALENGTH(@pString)+1 --DATALENGTH allows for trailing space delimiters
AND SUBSTRING(@pDelimiter + @pString, t.N, 1) = @pDelimiter
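A quick way to see what the function returns is to call it directly on one list. The call below assumes TempDB.dbo.Tally and TempDB.dbo.SplitAndSequence exist exactly as created above. The input is the radius list from one of the later test cases, chosen because it contains an empty item.

```sql
-- Assumes the Tally table and SplitAndSequence function from Steps 1 and 2 exist in TempDB.
SELECT s.ItemNumber, s.Item
FROM TempDB.dbo.SplitAndSequence(N'209759,209758,,209757', ',') AS s
ORDER BY s.ItemNumber;
-- ItemNumber  Item
-- 1           209759
-- 2           209758
-- 3                     (empty string: nothing between the two commas)
-- 4           209757
```

The empty third item is what later surfaces as 'N/A' once the parsing function replaces empty strings.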
Step 3) Deploy a table function that parses the string accordingly and returns a table containing the transposed records
CREATE function [dbo].[fn_ParseAndReturnObservations] (@Emp_ID varchar(50), @WorkDateStr varchar(10), @String nvarchar(300)) returns table
-- select * from [fn_ParseAndReturnObservations]('1234', '12/10/2020', 'The following clocks 12:03,12:04 are outside of the allowed radius by 209759,209758 meters')
return
with [cte_SourceData] as (
select [Emp_ID] = @Emp_Id -- '1234'
, [Work Date] = convert(date,@WorkDateStr,103) -- convert(date,'12/10/2020',103)
, [String] = @String -- 'The following clocks 12:03,12:04 are outside of the allowed radius by 209759,209758 meters'
) -- end [cte_SourceData]
, [cte_Parse1] as (
select [Emp_ID], [Work Date]
, [Clocks Position] = charindex('clocks ', [String]) + len('clocks')
, [String]
from [cte_SourceData]
) -- end [cte_Parse1]
, [cte_Parse2] as (
select [Emp_ID], [Work Date] , [Clocks Position]
, [String From Clock Observations] = substring([String],[Clocks Position]+1, len([String]))
, [String]
from [cte_Parse1]
) -- end [cte_Parse2]
, [cte_Parse3] as (
select [Emp_ID], [Work Date]
, [Clock Observations] = substring([String From Clock Observations],1, charindex(' ',[String From Clock Observations])-1)
, [Clocks Position], [String From Clock Observations]
, [Radius Verbosity Position Pre Ends] = charindex(' are outside of the allowed radius by ', [String From Clock Observations]) + len(' are outside of the allowed radius by ')
--- Assumes no spaces between Clock Observations
, [String]
from [cte_Parse2]
) -- end [cte_Parse3]
, [cte_Parse4] as (
select [Emp_ID], [Work Date], [Clock Observations]
, [String From Meters] = substring([String From Clock Observations],[Radius Verbosity Position Pre Ends]+1,len([String From Clock Observations]))
, [Clocks Position] , [String From Clock Observations], [String]
from [cte_Parse3]
) -- end [cte_Parse4]
, [cte_ReadyForDelimiting] as (
select [Emp_ID], [Work Date], [Clock Observations]
, [Radius Observations] = substring([String From Meters],1, charindex(' meters', [String From Meters]))
-- , [String From Meters] , [Clocks Position] , [String From Clock Observations], [String]
from [cte_Parse4]
) -- end [cte_ReadyForDelimiting]
, [cte_CountNumberOfSubColumns] as (
select [rdy].[Emp_ID], [rdy].[Work Date], [rdy].[Clock Observations], [rdy].[Radius Observations]
, [Clock Observation Count] = (len([rdy].[Clock Observations]) - len(replace([rdy].[Clock Observations],',',''))) / (len([rdy].[Clock Observations]) - len(replace([rdy].[Clock Observations],';','')) + 1) + 1
, [Radius Observation Count] = (len([Radius Observations]) - len(replace([Radius Observations],',',''))) / (len([Radius Observations]) - len(replace([Radius Observations],';','')) + 1) + 1
from [cte_ReadyForDelimiting] [rdy]
) -- end [cte_CountNumberOfSubColumns]
, [cte_ParsingClockObservations] as (
select [src].[Emp_ID], [src].[Work Date], [src].[Clock Observations], [src].[Radius Observations]
, [src].[Clock Observation Count] , [src].[Radius Observation Count]
, [Clock Record Sequence] = [Times].[ItemNumber]
, [Clock] = [Times].[Item]
from [cte_CountNumberOfSubColumns] [src]
outer apply (select * from [tempDB].[dbo].[SplitAndSequence] ([src].[Clock Observations],',')) [Times]
) -- [cte_ParsingClockObservations]
, [cte_ParsingRadiusObservations] as (
select [src].[Emp_ID], [src].[Work Date], [src].[Clock Observations], [src].[Radius Observations]
, [src].[Clock Observation Count] , [src].[Radius Observation Count]
, [Radius Record Sequence] = [Radius].[ItemNumber]
, [Radius] = [Radius].[Item]
from [cte_CountNumberOfSubColumns] [src]
outer apply (select * from [tempDB].[dbo].[SplitAndSequence] ([src].[Radius Observations],',')) [Radius] -- on [Radius].[ItemNumber] = [Times].[ItemNumber] */
) -- end [cte_ParsingRadiusObservations]
, [cte_TransposedRecords] as (
select [Emp_ID] = cast(isnull([times].[Emp_ID],[radius].[Emp_ID]) as varchar(10))
, [Work Date] = isnull([times].[Work Date],[radius].[Work Date])
, [Clock] = cast(case when isnull([times].[Clock],'N/A') in ('') then 'N/A' else isnull([times].[Clock],'N/A') end as varchar(10))
, [Radius] = cast(case when isnull([radius].[Radius], 'N/A') in ('') then 'N/A' else isnull([radius].[Radius],'N/A') end as varchar(20))
, [Stub] = 'StubStubStub'
, [times].[Clock Record Sequence]
, [radius].[Radius Record Sequence]
, [times].[Clock Observations]
, [times].[Radius Observations]
, [times].[Clock Observation Count]
, [times].[Radius Observation Count]
from [cte_ParsingClockObservations] [times]
full outer join [cte_ParsingRadiusObservations] [radius] on [radius].[Emp_ID] = [times].[Emp_ID] and [radius].[Work Date] = [times].[Work Date] and [radius].[Radius Record Sequence] = [times].[Clock Record Sequence]
)
-- Original
select * from [cte_TransposedRecords]
Run a test with this line of code (using the data set from the original question)
select * from [fn_ParseAndReturnObservations]
('1234',
'12/10/2020',
'The following clocks 12:03,12:04 are outside of the allowed radius by 209759,209758 meters')
Emp_ID     Work Date  Clock      Radius
---------- ---------- ---------- --------------------
1234       2020-10-12 12:03      209759
1234       2020-10-12 12:04      209758
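The position arithmetic inside the function is easier to follow when reduced to a few variables. The sketch below is a condensed illustration, not the function itself, and every variable name in it is made up for the example. One detail it makes explicit: LEN ignores trailing spaces, so LEN(' are outside of the allowed radius by ') is 37 although the literal has 38 characters. The function compensates with a +1, while this sketch uses DATALENGTH / 2 to get the full NVARCHAR length.

```sql
DECLARE @s      NVARCHAR(300) = N'The following clocks 12:03,12:04 are outside of the allowed radius by 209759,209758 meters';
DECLARE @prefix NVARCHAR(20)  = N'clocks ';
DECLARE @marker NVARCHAR(50)  = N' are outside of the allowed radius by ';
DECLARE @suffix NVARCHAR(20)  = N' meters';

-- DATALENGTH returns bytes; NVARCHAR uses 2 bytes per character here, hence / 2.
DECLARE @clockStart  INT = CHARINDEX(@prefix, @s) + DATALENGTH(@prefix) / 2;   -- first character of the clock list
DECLARE @markerPos   INT = CHARINDEX(@marker, @s);                             -- space just after the clock list
DECLARE @radiusStart INT = @markerPos + DATALENGTH(@marker) / 2;               -- first character of the radius list
DECLARE @radiusEnd   INT = CHARINDEX(@suffix, @s, @radiusStart);               -- space just after the radius list

SELECT LEN(@marker)                                           AS len_ignores_trailing_space  -- 37
     , DATALENGTH(@marker) / 2                                AS full_character_count        -- 38
     , SUBSTRING(@s, @clockStart, @markerPos - @clockStart)   AS clock_list                  -- 12:03,12:04
     , SUBSTRING(@s, @radiusStart, @radiusEnd - @radiusStart) AS radius_list;                -- 209759,209758
```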
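Step 4) Test cases and sample usage. To run this against real data, replace the last three lines of the test query with the production table whose clock times and radii you want to analyze.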
with cte_SampleTests as (
select TestCase = 'Original Case with a break down', [Test_Emp_ID] = '1234', [Test_WorkDateStr] = '12/10/2020', [Test_String] = 'The following clocks 12:03,12:04 are outside of the allowed radius by 209759,209758 meters'
union select TestCase = 'Three Readings', [Test_Emp_ID] = '1234', [Test_WorkDateStr] = '12/10/2020', [Test_String] = 'The following clocks 12:03,12:04,12:05 are outside of the allowed radius by 209759,209758,209757 meters'
union select TestCase = 'Forth Clock Position Listed but no value', [Test_Emp_ID] = '1234', [Test_WorkDateStr] = '12/10/2020', [Test_String] = 'The following clocks 12:03,12:04,12:05, are outside of the allowed radius by 209759,209758,209757,209755 meters'
union select TestCase = 'Four Radius and Three Clock Readings', [Test_Emp_ID] = '1234', [Test_WorkDateStr] = '12/10/2020', [Test_String] = 'The following clocks 12:03,12:04,12:05 are outside of the allowed radius by 209759,209758,209757,209755 meters'
union select TestCase = 'Four Clock Readings and 3 Radius Readings', [Test_Emp_ID] = '1234', [Test_WorkDateStr] = '12/10/2020', [Test_String] = 'The following clocks 12:03,12:04,12:05,12:06 are outside of the allowed radius by 209759,209758,209757 meters'
union select TestCase = 'Four Clock Readings and Fourth Radius Reading missing', [Test_Emp_ID] = '1234', [Test_WorkDateStr] = '12/10/2020', [Test_String] = 'The following clocks 12:03,12:04,12:05,12:06 are outside of the allowed radius by 209759,209758,209757, meters'
union select TestCase = 'Four Clock Readings and third Radius Reading missing', [Test_Emp_ID] = '1234', [Test_WorkDateStr] = '12/10/2020', [Test_String] = 'The following clocks 12:03,12:04,12:05,12:06 are outside of the allowed radius by 209759,209758,,209757 meters'
)
select cte_SampleTests.[TestCase], x.*
from [cte_SampleTests]
cross apply (select * from [dbo].[fn_ParseAndReturnObservations]([cte_SampleTests].[Test_Emp_ID], [cte_SampleTests].[Test_WorkDateStr], [cte_SampleTests].[Test_String])) x
TestCase                                              Emp_ID     Work Date  Clock      Radius
----------------------------------------------------- ---------- ---------- ---------- ------------------
Original Case with a break down                       1234       2020-10-12 12:03      209759
Original Case with a break down                       1234       2020-10-12 12:04      209758
Three Readings                                        1234       2020-10-12 12:03      209759
Three Readings                                        1234       2020-10-12 12:04      209758
Three Readings                                        1234       2020-10-12 12:05      209757
Forth Clock Position Listed but no value              1234       2020-10-12 12:03      209759
Forth Clock Position Listed but no value              1234       2020-10-12 12:04      209758
Forth Clock Position Listed but no value              1234       2020-10-12 12:05      209757
Forth Clock Position Listed but no value              1234       2020-10-12 N/A        209755
Four Radius and Three Clock Readings                  1234       2020-10-12 12:03      209759
Four Radius and Three Clock Readings                  1234       2020-10-12 12:04      209758
Four Radius and Three Clock Readings                  1234       2020-10-12 12:05      209757
Four Radius and Three Clock Readings                  1234       2020-10-12 N/A        209755
Four Clock Readings and 3 Radius Readings             1234       2020-10-12 12:03      209759
Four Clock Readings and 3 Radius Readings             1234       2020-10-12 12:04      209758
Four Clock Readings and 3 Radius Readings             1234       2020-10-12 12:05      209757
Four Clock Readings and 3 Radius Readings             1234       2020-10-12 12:06      N/A
Four Clock Readings and Fourth Radius Reading missing 1234       2020-10-12 12:03      209759
Four Clock Readings and Fourth Radius Reading missing 1234       2020-10-12 12:04      209758
Four Clock Readings and Fourth Radius Reading missing 1234       2020-10-12 12:05      209757
Four Clock Readings and Fourth Radius Reading missing 1234       2020-10-12 12:06      N/A
Four Clock Readings and third Radius Reading missing  1234       2020-10-12 12:03      209759
Four Clock Readings and third Radius Reading missing  1234       2020-10-12 12:04      209758
Four Clock Readings and third Radius Reading missing  1234       2020-10-12 12:05      N/A
Four Clock Readings and third Radius Reading missing  1234       2020-10-12 12:06      209757
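Step 5) Modify the main function to hide all columns to the right of the "Radius" column. The [Stub] column and every column to its right are listed only to show the working. For downstream analysis and decision-making, [Clock Record Sequence] and [Radius Record Sequence], along with the counts of radius and clock observations, may be useful.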
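If editing the function is not convenient, a caller can get the same four-column result by naming the columns in the outer query. The sketch below also shows the function applied to a table rather than to literals. The table variable @src, its column names, and the assumption that fn_ParseAndReturnObservations lives in the current database are all illustrative, not part of the steps above.

```sql
-- @src stands in for a production table; its name and columns are assumptions.
DECLARE @src TABLE (emp_id INT, work_date DATE, free_text NVARCHAR(300));
INSERT INTO @src (emp_id, work_date, free_text) VALUES
(1234, '20201210', N'The following clocks 12:03,12:04 are outside of the allowed radius by 209759,209758 meters');

-- Return only the columns up to [Radius]; the working columns are simply not selected.
SELECT x.[Emp_ID], x.[Work Date], x.[Clock], x.[Radius]
FROM @src AS s
CROSS APPLY dbo.fn_ParseAndReturnObservations(
        CAST(s.emp_id AS VARCHAR(50)),
        CONVERT(VARCHAR(10), s.work_date, 103),   -- dd/mm/yyyy, matching style 103 used inside the function
        s.free_text) AS x;
```

Passing the date through style 103 in both directions keeps the round trip consistent regardless of the session's language or DATEFORMAT setting.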
Step 6) Go back to the author of the original data and ask them to give you an alternative structure. XML should not be hard.
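As a rough idea of what that could look like, suppose the upstream system delivered each observation as its own XML element. The element and attribute names below are purely hypothetical. With a structure like this, no string parsing is needed and each clock stays paired with its radius by construction.

```sql
-- Hypothetical payload: one <o> element per observation.
DECLARE @x XML = N'<observations>
  <o clock="12:03" radius="209759" />
  <o clock="12:04" radius="209758" />
</observations>';

SELECT o.value('@clock',  'VARCHAR(5)') AS clock
     , o.value('@radius', 'INT')        AS radius
FROM @x.nodes('/observations/o') AS t(o);
```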