I have a table with a column containing entries like the following:
Drug
Sertraline 100mg tablets
Phenobarbitol 20mg capsules
I would like to split this column into four columns:
Drugname Strength Units Form
Sertraline 100 mg tablets
Can someone guide me on how this can be done?
Answer 0 (score: 3)
With a little XML and CROSS APPLY.
The pattern is clear and easy to expand or contract as needed.
Example
-- Split each Drug value on spaces by turning it into a sequence of <x> XML nodes
Select A.*
      ,B.*
 From  YourTable A
 Cross Apply (
        -- Add or remove PosN lines to match the maximum number of parts expected
        Select Pos1 = ltrim(rtrim(xDim.value('/x[1]','varchar(max)')))
              ,Pos2 = ltrim(rtrim(xDim.value('/x[2]','varchar(max)')))
              ,Pos3 = ltrim(rtrim(xDim.value('/x[3]','varchar(max)')))
              ,Pos4 = ltrim(rtrim(xDim.value('/x[4]','varchar(max)')))
              ,Pos5 = ltrim(rtrim(xDim.value('/x[5]','varchar(max)')))
         From  (Select Cast('<x>' + replace((Select replace(A.[Drug],' ','§§Split§§') as [*] For XML Path('')),'§§Split§§','</x><x>')+'</x>' as xml) as xDim) as B1
       ) B
Returns
Pos1 Pos2 Pos3 Pos4 Pos5
Sertraline 100mg tablets NULL NULL
Phenobarbitol 20mg capsules NULL NULL
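The inner `Select ... For XML Path('')` in the query above can look redundant. A likely reason for it is that it escapes characters such as `&` and `<` before the string is cast to XML. The following is a minimal sketch of that effect, assuming a hypothetical drug name containing `&` (the value and the variable `@Drug` are illustrative and not part of the original data):

```sql
-- Hypothetical value: '&' is not allowed as raw text inside XML.
DECLARE @Drug varchar(100) = 'Paracetamol&Codeine 500mg tablets';

-- Naive version, left commented out because the cast raises an XML parsing error:
-- SELECT CAST('<x>' + REPLACE(@Drug, ' ', '</x><x>') + '</x>' AS xml);

-- Escaped version: FOR XML PATH('') converts '&' to '&amp;' before the cast,
-- and .value() converts it back when the text is read out.
SELECT xDim.value('/x[1]', 'varchar(max)') AS Pos1,
       xDim.value('/x[2]', 'varchar(max)') AS Pos2,
       xDim.value('/x[3]', 'varchar(max)') AS Pos3
FROM (
    SELECT CAST('<x>'
                + REPLACE((SELECT @Drug AS [*] FOR XML PATH('')), ' ', '</x><x>')
                + '</x>' AS xml) AS xDim
) AS S;
```

If the column is known never to contain XML-special characters, the simpler cast would also work, but the escaped form is the safer default.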
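Answer 1 (score: 2)
One more suggestion:
The first CTE transforms your space-delimited string into XML, which allows each part to be addressed separately.
The second CTE retrieves the three parts.
The final SELECT uses some string methods to separate the strength from the unit.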
DECLARE @tbl TABLE(Drug VARCHAR(100));
INSERT INTO @tbl VALUES('Sertraline 100mg tablets')
                      ,('Phenobarbitol 20mg capsules');

-- Step 1: wrap each space-separated part in its own <x> element
WITH Splitted AS
(
    SELECT CAST('<x>' + REPLACE((SELECT Drug AS [*] FOR XML PATH('')),' ','</x><x>') + '</x>' AS XML) AS Casted
    FROM @tbl
)
-- Step 2: read the three parts by position
,Parts AS
(
    SELECT Casted.value('/x[1]/text()[1]','nvarchar(100)') AS Drugname
          ,Casted.value('/x[2]/text()[1]','nvarchar(100)') AS CombinedStrenthUnit
          ,Casted.value('/x[3]/text()[1]','nvarchar(100)') AS Form
    FROM Splitted
)
-- Step 3: cut the combined value at the first letter
SELECT *
      ,LEFT(CombinedStrenthUnit,PATINDEX('%[a-zA-Z]%',CombinedStrenthUnit)-1) AS Strength
      ,SUBSTRING(CombinedStrenthUnit,PATINDEX('%[a-zA-Z]%',CombinedStrenthUnit),1000) AS Unit
FROM Parts;
Result
Drugname CombinedStrenthUnit Form Strength Unit
Sertraline 100mg tablets 100 mg
Phenobarbitol 20mg capsules 20 mg
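One edge case in the query above: when the combined value contains no letters, `PATINDEX` returns 0 and `LEFT(..., -1)` raises an error. A minimal sketch of one way to guard against this, assuming it is acceptable to return the whole value as the strength and NULL as the unit in that case (the sample values '20' and 'mg' and the alias `FirstLetter` are illustrative):

```sql
DECLARE @tbl TABLE(CombinedStrenthUnit VARCHAR(100));
INSERT INTO @tbl VALUES ('100mg'), ('20'), ('mg');

SELECT CombinedStrenthUnit
      -- No letter found: treat the whole value as the strength
      ,CASE WHEN p.FirstLetter > 0
            THEN LEFT(CombinedStrenthUnit, p.FirstLetter - 1)
            ELSE CombinedStrenthUnit
       END AS Strength
      -- No letter found: there is no unit, so the CASE yields NULL
      ,CASE WHEN p.FirstLetter > 0
            THEN SUBSTRING(CombinedStrenthUnit, p.FirstLetter, 1000)
       END AS Unit
FROM @tbl
-- Compute the position of the first letter once and reuse it
CROSS APPLY (SELECT PATINDEX('%[a-zA-Z]%', CombinedStrenthUnit) AS FirstLetter) AS p;
```

Computing the position once in a `CROSS APPLY` also avoids repeating the `PATINDEX` expression in both columns.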
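Answer 2 (score: 1)
I used a user-defined split function to split the text into three parts separated by space characters, as shown below.
Of course, if you have SQL Server 2016 or later, you can also use the STRING_SPLIT SQL function.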
with rawdata as (
    -- rn identifies each source row
    select rn = ROW_NUMBER() over (order by txt), * from drugs
), cte as (
    select
        rn,
        d.txt,
        s.id,
        s.val
    from rawdata d
    cross apply dbo.Split(rtrim(ltrim(d.txt)),' ') s
)
select * from cte
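Note that the Row_Number column is needed to identify each row in the following script. If the source table has a primary key, you can use that key directly instead of the column generated by ROW_NUMBER().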
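For the STRING_SPLIT alternative mentioned above, keep in mind that the function only reports the position of each part when its optional third argument is set, and that argument is available in SQL Server 2022 and Azure SQL Database rather than in 2016. A minimal sketch under that assumption, using a table variable `@drugs` in place of the `drugs` table and aliasing the output to the `id` and `val` names used in this answer:

```sql
-- Assumes SQL Server 2022 or Azure SQL Database (third STRING_SPLIT argument).
DECLARE @drugs TABLE (txt varchar(100));
INSERT INTO @drugs VALUES ('Sertraline 100mg tablets'),
                          ('Phenobarbitol 20mg capsules');

SELECT d.txt,
       s.ordinal AS id,   -- 1-based position of the part within txt
       s.value   AS val   -- the part itself
FROM @drugs AS d
CROSS APPLY STRING_SPLIT(LTRIM(RTRIM(d.txt)), ' ', 1) AS s
ORDER BY d.txt, s.ordinal;
```

On earlier versions STRING_SPLIT returns only the `value` column with no guaranteed order, so a split function that returns a position, like the one used here, remains the safer choice.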
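To split the second part (strength and units), I again used custom SQL functions: ClearNumericCharacters and ClearNonNumericCharacters. Of course, you could use inline expressions or RegExp instead of UDFs.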
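The answer does not show the bodies of these two functions. The following is a hypothetical sketch of what they might look like, assuming `ClearNonNumericCharacters` keeps only the digits 0-9 and `ClearNumericCharacters` removes them; the parameter name, length, and loop-based approach are assumptions, not the author's code:

```sql
-- Hypothetical implementation: removes every character that is not a digit.
CREATE FUNCTION dbo.ClearNonNumericCharacters (@val varchar(100))
RETURNS varchar(100)
AS
BEGIN
    DECLARE @pos int = PATINDEX('%[^0-9]%', @val);
    WHILE @pos > 0
    BEGIN
        SET @val = STUFF(@val, @pos, 1, '');        -- delete one non-digit
        SET @pos = PATINDEX('%[^0-9]%', @val);      -- look for the next one
    END;
    RETURN @val;
END;
GO
-- Hypothetical implementation: removes every digit.
CREATE FUNCTION dbo.ClearNumericCharacters (@val varchar(100))
RETURNS varchar(100)
AS
BEGIN
    DECLARE @pos int = PATINDEX('%[0-9]%', @val);
    WHILE @pos > 0
    BEGIN
        SET @val = STUFF(@val, @pos, 1, '');        -- delete one digit
        SET @pos = PATINDEX('%[0-9]%', @val);       -- look for the next one
    END;
    RETURN @val;
END;
GO
-- Usage: yields '100' and 'mg'
SELECT dbo.ClearNonNumericCharacters('100mg') AS Strength,
       dbo.ClearNumericCharacters('100mg')    AS Units;
```

`GO` is the batch separator used by SSMS and sqlcmd; it is needed because each CREATE FUNCTION must start its own batch. Note that this sketch would turn a decimal strength such as '2.5mg' into '25', so the digit pattern would need adjusting if such values occur.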
Here is the final SQL CTE expression:
with rawdata as (
    select rn = ROW_NUMBER() over (order by txt), * from drugs
), cte as (
    select
        rn,
        d.txt,
        s.id,
        s.val
    from rawdata d
    cross apply dbo.Split(rtrim(ltrim(d.txt)),' ') s
), cte2 as (
    -- Place each part in its target column; the other columns stay NULL
    select
        rn,
        case when id = 1 then val end as Drugname,
        case when id = 2 then dbo.ClearNonNumericCharacters(val) end as Strength,
        case when id = 2 then dbo.ClearNumericCharacters(val) end as Units,
        case when id = 3 then val end as Form
    from cte
)
-- Collapse the rows for each rn into one; max() picks the non-NULL value
select
    max(Drugname) Drugname,
    max(Strength) Strength,
    max(Units) Units,
    max(Form) Form
from cte2
group by rn
Output