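I have a table with two columns of comma-separated strings. The way the data is formatted, both columns contain the same number of comma-separated items, and the first value in colA is related to the first value in colB, and so on. (It is obviously not a very good data format, but it is what I am working with.)
If I have the following row (PrimaryKeyID | column1 | column2):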
1 | a,b,c | A,B,C
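then in this data format a and A are logically related, b and B, and so on.
I want to use STRING_SPLIT to split these columns, but using it twice obviously crosses them with each other, producing a total of 9 rows.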
1 | a | A
1 | b | A
1 | c | A
1 | a | B
1 | b | B
1 | c | B
1 | a | C
1 | b | C
1 | c | C
What I want is just the 3 "logically related" rows:
1 | a | A
1 | b | B
1 | c | C
However, STRING_SPLIT(myCol, ',') does not appear to keep the string position anywhere.
I have done the following:
SELECT tbl.ID,
       t1.Column1Value,
       t2.Column2Value
FROM myTable tbl
INNER JOIN (
    SELECT t.ID,
           ss.value AS Column1Value,
           ROW_NUMBER() OVER (PARTITION BY t.ID ORDER BY t.ID) AS StringOrder
    FROM myTable t
    CROSS APPLY STRING_SPLIT(t.column1, ',') ss
) t1 ON tbl.ID = t1.ID
INNER JOIN (
    SELECT t.ID,
           ss.value AS Column2Value,
           ROW_NUMBER() OVER (PARTITION BY t.ID ORDER BY t.ID) AS StringOrder
    FROM myTable t
    CROSS APPLY STRING_SPLIT(t.column2, ',') ss
) t2 ON tbl.ID = t2.ID AND t1.StringOrder = t2.StringOrder
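This appears to work on my small test set, but as far as I can tell there is no reason to expect it to be guaranteed every time. ROW_NUMBER() OVER (PARTITION BY ID ORDER BY ID) is obviously a meaningless ordering, yet it appears that, in the absence of any real ordering, STRING_SPLIT returns the values in the "default" order they were already in. Is this "expected" behaviour? Can I count on it? Is there any other way of accomplishing what I am trying to do?
Thanks.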
======================
I got what I wanted (I think) with the following UDF. However, it is slow. Any suggestions?
CREATE FUNCTION fn.f_StringSplit(@string VARCHAR(MAX),@delimiter VARCHAR(1))
RETURNS @r TABLE
(
Position INT,
String VARCHAR(255)
)
AS
BEGIN
DECLARE @current_position INT
SET @current_position = 1
WHILE CHARINDEX(@delimiter,@string) > 0 BEGIN
INSERT INTO @r (Position,String) VALUES (@current_position, SUBSTRING(@string,1,CHARINDEX(@delimiter,@string) - 1))
SET @current_position = @current_position + 1
SET @string = SUBSTRING(@string,CHARINDEX(@delimiter,@string) + 1, LEN(@string) - CHARINDEX(@delimiter,@string))
END
--add the last one
INSERT INTO @r (Position, String) VALUES(@current_position,@string)
RETURN
END
Answer 0 (score: 2)
Your idea is fine, but your ORDER BY is not using a stable sort. I think it is safer to do this:
SELECT tbl.ID, t1.Column1Value, t2.Column2Value
FROM myTable tbl INNER JOIN
(SELECT t.ID, ss.value AS Column1Value,
ROW_NUMBER() OVER (PARTITION BY t.ID
ORDER BY CHARINDEX(',' + ss.value + ',', ',' + t.column1 + ',')
) as StringOrder
FROM myTable t CROSS APPLY
STRING_SPLIT(t.column1,',') ss
) t1
ON tbl.ID = t1.ID INNER JOIN
(SELECT t.ID, ss.value AS Column2Value,
ROW_NUMBER() OVER (PARTITION BY t.ID
ORDER BY CHARINDEX(',' + ss.value + ',', ',' + t.column2 + ',')
) as StringOrder
FROM myTable t CROSS APPLY
STRING_SPLIT(t.column2, ',') ss
) t2
ON tbl.ID = t2.ID AND t1.StringOrder = t2.StringOrder;
Note: this may not work as desired if a string contains duplicate values that are not adjacent to each other.
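To make that caveat concrete, here is a minimal sketch (the variable @s and the sample value 'a,b,a' are made up for illustration). CHARINDEX always reports the position of the first match, so both occurrences of a receive the same sort key and ROW_NUMBER can no longer tell them apart:

```sql
-- Illustrative only: @s and its value are hypothetical sample data.
DECLARE @s varchar(100) = 'a,b,a';

SELECT ss.value,
       -- Position of the FIRST ',value,' inside ',a,b,a,'.
       -- Both 'a' rows get 1 and 'b' gets 3, so ordering by this key
       -- yields a, a, b instead of the original a, b, a.
       CHARINDEX(',' + ss.value + ',', ',' + @s + ',') AS FirstMatchPos
FROM STRING_SPLIT(@s, ',') AS ss;
```

If the paired column holds different values at the two duplicate positions, the join on StringOrder would then match them to the wrong partners.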
Answer 1 (score: 1)
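I am a little late to this question, but I was just attempting the same thing with string_split because I recently ran into a performance problem. My experience with string splitters in T-SQL has led me to use recursive CTEs for most things containing fewer than 1,000 delimited values. Ideally, a CLR procedure would be used if you need the ordinal in your string split.
That said, I have come to a similar conclusion as you about getting the ordinal from string_split. Below you can see the queries and statistics, which are, in order: the bare string_split function, a CTE that applies ROW_NUMBER to string_split, the Moden splitter, and my personal string-split CTE function, which I derived from this awesome write-up. The main difference between my CTE-based function and the one in the write-up is that I made mine an inline TVF rather than a multi-statement TVF; you can read about the differences here.
In my experiments I have not seen a deviation when using ROW_NUMBER over a constant to return the internal order of the delimited string, so I will keep using it until I find a problem with it. If order is imperative in a business setting, though, I would probably recommend the Moden splitter featured in the first link above, which links to the author's article here, since its performance is right in line with that of the less safe string_split with ROW_NUMBER approach.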
set nocount on;
declare
@iter int = 0,
@rowcount int,
@val varchar(max) = '';
while len(@val) < 1e6
select
@val += replicate(concat(@iter, ','), 8e3),
@iter += 1;
raiserror('Begin string_split Built-In', 0, 0) with nowait;
set statistics time, io on;
select
*
from
string_split(@val, ',')
where
[value] > '';
select
@rowcount = @@rowcount;
set statistics time, io off;
print '';
raiserror('End string_split Built-In | Return %d Rows', 0, 0, @rowcount) with nowait;
print '';
raiserror('Begin string_split Built-In with RowNumber', 0, 0) with nowait;
set statistics time, io on;
with cte
as (
select
*,
[group] = 1
from
string_split(@val, ',')
where
[value] > ''
),
cteCount
as (
select
*,
[id] = row_number() over (order by [group])
from
cte
)
select
*
from
cteCount;
select
@rowcount = @@rowcount;
set statistics time, io off;
print '';
raiserror('End string_split Built-In with RowNumber | Return %d Rows', 0, 0, @rowcount) with nowait;
print '';
raiserror('Begin Moden String Splitter', 0, 0) with nowait;
set statistics time, io on;
select
*
from
dbo.SplitStrings_Moden(@val, ',')
where
item > '';
select
@rowcount = @@rowcount;
set statistics time, io off;
print '';
raiserror('End Moden String Splitter | Return %d Rows', 0, 0, @rowcount) with nowait;
print '';
raiserror('Begin Recursive CTE String Splitter', 0, 0) with nowait;
set statistics time, io on;
select
*
from
dbo.fn_splitByDelim(@val, ',')
where
strValue > ''
option
(maxrecursion 0);
select
@rowcount = @@rowcount;
set statistics time, io off;
Statistics
Begin string_split Built-In
SQL Server Execution Times:
CPU time = 2000 ms, elapsed time = 5325 ms.
SQL Server Execution Times:
CPU time = 0 ms, elapsed time = 0 ms.
End string_split Built-In | Return 331940 Rows
Begin string_split Built-In with RowNumber
SQL Server Execution Times:
CPU time = 2094 ms, elapsed time = 8119 ms.
SQL Server Execution Times:
CPU time = 0 ms, elapsed time = 0 ms.
End string_split Built-In with RowNumber | Return 331940 Rows
Begin Moden String Splitter
SQL Server parse and compile time:
CPU time = 0 ms, elapsed time = 6 ms.
SQL Server Execution Times:
CPU time = 8734 ms, elapsed time = 9009 ms.
SQL Server Execution Times:
CPU time = 0 ms, elapsed time = 0 ms.
End Moden String Splitter | Return 331940 Rows
Begin Recursive CTE String Splitter
Table 'Worktable'. Scan count 2, logical reads 1991648, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
SQL Server Execution Times:
CPU time = 147188 ms, elapsed time = 147480 ms.
SQL Server Execution Times:
CPU time = 0 ms, elapsed time = 0 ms.
End Recursive CTE String Splitter | Return 331940 Rows
Answer 2 (score: 1)
SELECT
PrimaryKeyID ,t2.items as column1, t1.items as column2 from [YourTableName]
cross Apply [dbo].[Split](column2) as t1
cross Apply [dbo].[Split](column1) as t2
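Answer 3 (score: 0)
The only way I have found to expressively maintain the order of the String_Split() function is to use the Row_Number() function with a literal value in the "order by".
For example: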
declare @Version nvarchar(128)
set @Version = '1.2.3';
with V as (select value v, Row_Number() over (order by (select 0)) n from String_Split(@Version, '.'))
select
(select v from V where n = 1) Major,
(select v from V where n = 2) Minor,
(select v from V where n = 3) Revision
Returns:
Major Minor Revision
----- ----- ---------
1 2 3
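For reference, SQL Server 2022 and Azure SQL Database add an optional third argument to STRING_SPLIT, enable_ordinal, which makes the function return an ordinal column holding each value's 1-based position. On those versions the ROW_NUMBER workaround is not needed. A minimal sketch, assuming such a version and the myTable / column1 / column2 names from the question:

```sql
-- Requires SQL Server 2022 (16.x) or Azure SQL Database; earlier versions
-- reject the third argument.
DECLARE @Version nvarchar(128) = N'1.2.3';

-- ordinal is the 1-based position of each value in the input string.
SELECT value, ordinal
FROM STRING_SPLIT(@Version, N'.', 1);

-- Pairing the two columns from the question by position.
SELECT t.ID,
       s1.value AS Column1Value,
       s2.value AS Column2Value
FROM myTable AS t
CROSS APPLY STRING_SPLIT(t.column1, ',', 1) AS s1
CROSS APPLY STRING_SPLIT(t.column2, ',', 1) AS s2
WHERE s1.ordinal = s2.ordinal;
```

On older versions the third argument is not available, so this only helps if upgrading is an option.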
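Answer 4 (score: 0)
Mark, here is the solution I would use. Assume that [column 1] in your table holds "key" values that are not stable, and that [column2] holds the corresponding "field" values, which can sometimes be omitted or NULL:
There will be two extractions, one for [column 1], which I take to be the key, and one for [column 2], which I take to be a sort of "value" for that "key". Both are parsed automatically by the STRING_SPLIT function.
These two INDEPENDENT result sets are then renumbered based on the time of the operation (which is always sequential). Note that we renumber not by the content of the field or the position of the commas, but by the timestamp.
They are then joined back together with a LEFT OUTER JOIN. Note that this is not an INNER JOIN, because our "field values" may be omitted while the "keys" will always be there.
Below is the T-SQL code. This is my first post on this site, so I hope it looks OK: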
SELECT T1.ID, T1.KeyValue, T2.FieldValue
from (select t1.ID, row_number() OVER (PARTITION BY t1.ID ORDER BY current_timestamp) AS KeyRow, t2.value AS KeyValue
from myTable t1
CROSS APPLY STRING_SPLIT(t1.column1,',') as t2) T1
LEFT OUTER JOIN
(select t1.ID, row_number() OVER (PARTITION BY t1.ID ORDER BY current_timestamp) AS FieldRow, t3.value AS FieldValue
from myTable t1
CROSS APPLY STRING_SPLIT(t1.column2,',') as t3) T2 ON T1.ID = T2.ID AND T1.KeyRow = T2.FieldRow
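For comparison, here is a sketch of a variant that does not depend on the order in which STRING_SPLIT happens to return rows. It is not part of the answer above. It turns each list into a JSON array and reads it with OPENJSON, whose key column carries the array index of each element. It assumes compatibility level 130 or higher and the myTable / column1 / column2 names from the question; how a NULL column2 behaves should be verified on your own data.

```sql
-- Sketch only: builds '["a","b","c"]' from 'a,b,c' and lets OPENJSON
-- report the zero-based array index in its [key] column.
SELECT t.ID,
       k.value AS KeyValue,
       f.value AS FieldValue
FROM myTable AS t
CROSS APPLY OPENJSON(
        N'["' + REPLACE(STRING_ESCAPE(t.column1, 'json'), ',', '","') + N'"]'
     ) AS k
-- OUTER APPLY keeps every key even when no field value exists at that index,
-- mirroring the LEFT OUTER JOIN used above.
OUTER APPLY (
    SELECT j.value
    FROM OPENJSON(
            N'["' + REPLACE(STRING_ESCAPE(t.column2, 'json'), ',', '","') + N'"]'
         ) AS j
    WHERE j.[key] = k.[key]
) AS f
ORDER BY t.ID, CAST(k.[key] AS int);
```

STRING_ESCAPE is applied before the commas are replaced so that quotes or backslashes inside the values do not break the generated JSON.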