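I need to split an irregular column into many sub-columns in SQL Server, using a variety of delimiters.
I have a column named Event_Name that contains irregular data like the following: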
EVENT_NAME
----------------
ABBRV
Noun Noun2 Noun3 - Adjective - MM/DD/YYYY - LOCATION
Noun Noun2 - MM/DD/YYYY (#1) - LOCATION
Noun Noun2 - MM/DD/YYYY - Adjective (#1) - LOCATION
Noun, Noun1a Noun2 Noun3 - Adjective: MM/DD/YYYY - Adjective2 - LOCATION
like:
"QRCC"
"Pool Party Dance - Late Night - 12/12/2020 - North"
"Lawn Bowling - 12/12/2020 (#1) - South "
"Lawn, Pool Class Signups - Early: 12/12/2020 - Canceled - North"
"Pool Event - 11/31/2020 - To Be Announced (#1) - South"
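I previously tried to solve this in Python, but now, in the spirit of a structured pipeline, I really need to do the split inside a SQL query, conditional on the embedded date.
In Python I was doing something like this: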
new= df['Event_Name'].str.split(" ",n=2, expand = True)
new[3] = new[2].str.split("-", expand= True)[1]
new[4] = new[2].str.split("-", expand= True)[2]
new[5] = new[2].str.split("-", expand= True)[3]
new[3] = new[3].str.split(' ()', expand=True)[2]
new[5]=new[2].str[-4:]
data = new[[0,1,3,5]]
0 | 1 | 3 | 5
-------------------------------
Noun | Noun2 | xx/xx/xxxx | LOCATION
But this does not account for the adjectives, and it does not capture the rare case of the format "Noun, Noun1a Noun2 Noun3 - Adjective: MM/DD/YYYY - Adjective - LOCATION",
for which the output should be
0 | 1 | 3 | 5
----------------------------------------------
Noun, Noun1a | Noun2| xx/xx/xxxx| LOCATION
So the actual desired output would be
(Noun and Noun1a if not null) or ABBRV | Noun2 or null | Noun3 or null | DATE | Adjective or null| Adjective 2 or null| LOCATION
OR
Event Cat  | Detail  | Detail  | DATE       | Status          | Status   | LOCATION
-----------------------------------------------------------------------------------
QRCC       |         |         |            |                 |          |
Pool       | Party   | Dance   | 12/12/2020 | Late Night      |          | North
Lawn       | Bowling |         | 12/12/2020 |                 |          | South
Lawn, Pool | Class   | Signups | 12/12/2020 | Early           | Canceled | North
Pool       | Event   |         | 11/31/2020 | To Be Announced |          | South
The occasional "(#1)" is irrelevant and can be ignored. How can I do this in a SQL call?
Answer 0 (score: 3)
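Frankly: this design is awful. Any chance to change the input is better than dealing with this. That means: use my suggestion to parse this only if you cannot change the way you receive the data. But sometimes we have to deal with crap...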
DECLARE @tbl TABLE(ID INT IDENTITY, YourString VARCHAR(1000));
INSERT INTO @tbl VALUES
('QRCC')
,('Pool Party Dance - Late Night - 12/12/2020 - North')
,('Lawn Bowling - 12/12/2020 (#1) - South')
,('Lawn, Pool Class Signups - Early: 12/12/2020 - Canceled - North')
,('Pool Event - 11/31/2020 - To Be Announced (#1) - South');
-- The query
SELECT TheFirstFragment.value('/x[1]','nvarchar(max)') AS [Event Cat]
,TheFirstFragment.value('/x[2]','nvarchar(max)') AS [Detail]
,TheFirstFragment.value('/x[3]','nvarchar(max)') AS [Detail]
,AssumablySomeDate.value('/x[contains(.,"/")][1]','nvarchar(max)') AS HopefullyTheDate
,AssumablySomeDate.value('/x[not(contains(.,"/"))][1]','nvarchar(max)') AS [Status1]
,CASE WHEN TheThirdFragment NOT IN('North','South') AND TheThirdFragment NOT LIKE '%/%' THEN TheThirdFragment END AS [Status2]
,CASE WHEN YourStringAsXml.value('count(/x)','int')>2 THEN YourStringAsXml.value('/x[last()]','nvarchar(max)') END AS [LOCATION]
FROM @tbl t
CROSS APPLY(SELECT CAST('<x>' + REPLACE((SELECT t.YourString AS [*] FOR XML PATH('')),' - ','</x><x>') + '</x>' AS XML)) A(YourStringAsXml)
OUTER APPLY(SELECT Cast('<x>' + REPLACE((SELECT REPLACE(YourStringAsXml.value('/x[1]','nvarchar(max)'),', ',',') AS [*] FOR XML PATH('')),' ','</x><x>') + '</x>' AS XML)) B(TheFirstFragment)
OUTER APPLY(SELECT CAST('<x>' + REPLACE((SELECT YourStringAsXml.value('/x[contains(.,"/")][1]','nvarchar(max)') AS [*] FOR XML PATH('')),' ','</x><x>') + '</x>' AS XML)) C(AssumablySomeDate)
OUTER APPLY(SELECT YourStringAsXml.value('/x[3]','nvarchar(max)')) AS D(TheThirdFragment);
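The core trick in the query above is to replace each delimiter with `</x><x>`, so the string becomes a run of `<x>` elements that can be picked by position. Since the question started in Python, here is a minimal Python sketch of the same idea. It only illustrates the technique and is not part of the SQL solution; the function name `split_via_xml` and the wrapper element `<r>` are made up for this example.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape


def split_via_xml(text, delimiter=" - "):
    """Split text by turning every delimiter into an element boundary."""
    # Escape &, < and > first; the inner FOR XML PATH('') in the query
    # plays the same role, so special characters cannot break the XML.
    escaped = escape(text)
    # Assumption for this sketch: ElementTree needs a single root element,
    # so the <x> fragments are wrapped in a hypothetical <r> element.
    xml = "<r><x>" + escaped.replace(delimiter, "</x><x>") + "</x></r>"
    return [x.text or "" for x in ET.fromstring(xml).findall("x")]


fragments = split_via_xml("Pool Party Dance - Late Night - 12/12/2020 - North")
print(fragments[0])   # roughly what /x[1] returns in the query
print(fragments[-1])  # roughly what /x[last()] returns in the query
```

In the query, `/x[1]` and `/x[last()]` address the fragments the same way the list indexes do here.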
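In short:
The APPLY calls pre-compute some columns (the string cast to XML and its fragments).
The column list then picks the values it needs out of these fragments.
The wording "assumably" and "hopefully" (as in the aliases AssumablySomeDate and HopefullyTheDate) shows clearly that this can break at any time...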
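For comparison with the pandas attempt in the question, the picking rules can be written out as a small Python sketch. This is only one possible reading of the five sample rows, not a tested general solution. All names (`parse_event`, `is_real_date`, the field names) are invented for illustration, and it assumes that the first fragment holds at most three words, that dates look like MM/DD/YYYY, and that the last fragment is the location whenever there are at least three fragments.

```python
import re
from datetime import datetime

DATE_RE = re.compile(r"\d{1,2}/\d{1,2}/\d{4}")
FIELDS = ("event_cat", "detail1", "detail2", "date",
          "status1", "status2", "location")


def is_real_date(text):
    """True if text is a valid calendar date in MM/DD/YYYY form."""
    try:
        datetime.strptime(text, "%m/%d/%Y")
        return True
    except ValueError:
        return False


def parse_event(name):
    row = dict.fromkeys(FIELDS)  # every field starts as None
    # Drop the irrelevant "(#1)" markers, then split on " - ".
    cleaned = re.sub(r"\s*\(#\d+\)", "", name).strip()
    parts = [p.strip() for p in cleaned.split(" - ")]

    # First fragment: glue "Noun, Noun1a" together before splitting on spaces.
    words = [w.replace(",", ", ") for w in parts[0].replace(", ", ",").split()]
    words += [None] * 3
    row["event_cat"], row["detail1"], row["detail2"] = words[:3]

    # Assumption: with three or more fragments the last one is the location.
    if len(parts) >= 3:
        row["location"] = parts[-1]
        middle = parts[1:-1]
    else:
        middle = parts[1:]

    statuses = []
    for part in middle:
        match = DATE_RE.search(part)
        if match:
            row["date"] = match.group()
            # Whatever surrounds the date ("Early: ") counts as a status.
            part = (part[:match.start()] + part[match.end():]).strip(" :")
        if part:
            statuses.append(part)
    statuses += [None, None]
    row["status1"], row["status2"] = statuses[:2]
    return row


print(parse_event("Lawn, Pool Class Signups - Early: 12/12/2020 - Canceled - North"))
print(is_real_date("11/31/2020"))  # November has 30 days
```

Note that the sample value 11/31/2020 is not a real calendar date, so anything conditional on the date has to cope with invalid values. On the SQL Server side, `TRY_CONVERT(date, value, 101)` returns NULL for such a value instead of raising an error.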