I need to split a narrative field (free text) into multiple rows. The format currently looks like this:
Case_Reference | Narrative
---------------|--------------------------------------
XXXX/XX-123456 | [Endless_Text up to ~50k characters]
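Within the narrative field, which is plain text, each individual entry (made when one of the various agents does something on the case) starts with the date it was entered, followed by two spaces (i.e. 'dd/mm/yyyy  '). The date value changes with each entry made in the same field.
In other words, after searching for a better delimiter, the only one I can use is this string format, so I need to identify the multiple positions in the narrative text where the format (would "mask" be a better word?) matches 'dd/mm/yyyy  '.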
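Identifying that the consistent string occurs multiple times is no problem, using:
'%[0-9][0-9]/[0-9][0-9]/[0-9][0-9][0-9][0-9] %'
What I'm after is where each occurrence is. PATINDEX of course returns only the first occurrence/position, and as far as I know there is no way to "modify" it (e.g. in a user-defined function) to return the remaining occurrences/positions the way we could with CHARINDEX, because PATINDEX has no start-position parameter.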
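To be clear, I am not looking for code that splits the text directly, because I need to manipulate each entry further; I am purely after the positions of the multiple occurrences of the string within the narrative text.
Any help is greatly appreciated.
Also for clarity: doing this before import is not an option, so it has to be done on the landed data.
The desired output would be: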
Case_Reference1 | 1st_Position_of_Delimiter_String
Case_Reference1 | 2nd_Position_of_Delimiter_String
Case_Reference2 | 1st_Position_of_Delimiter_String
Case_Reference2 | 2nd_Position_of_Delimiter_String
Case_Reference2 | 3rd_Position_of_Delimiter_String
Answer 0 (score: 3)
You can solve this with a recursive CTE:
DECLARE @tbl TABLE (Case_Reference NVARCHAR(MAX),Narrative NVARCHAR(MAX));
INSERT INTO @tbl VALUES
(N'C1',N'01/02/2000 Some text with blanks 02/03/2000 More text 03/04/2000 An even more')
,(N'C2',N'01/02/2000 Test for C2 02/03/2000 One more for C2 03/04/2000 An even more 04/05/2000 Blah')
,(N'C3',N'01/02/2000 Test for C3 02/03/2000 One more for C3 03/04/2000 An even more')
;
WITH recCTE AS
(
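-- Anchor: entry 1 ends just before the second date, or at the end of the text if there is only one date
SELECT 1 AS Step,Case_Reference,Narrative,CAST(1 AS BIGINT) AS StartsAt,CASE WHEN NewPos.EndsAt>0 THEN NewPos.EndsAt+10 ELSE LEN(Narrative) END AS EndsAt,LEN(Narrative) AS MaxLen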
,SUBSTRING(Narrative,NewPos.EndsAt+10+1,999999) AS RestString
FROM @tbl AS tbl
CROSS APPLY(SELECT PATINDEX('%[0-3][0-9]/[0-1][0-9]/[1-2][0-9][0-9][0-9] %',SUBSTRING(Narrative,12,9999999))) AS NewPos(EndsAt)
UNION ALL
SELECT r.Step+1,r.Case_Reference,r.Narrative,r.EndsAt+1,CASE WHEN NewPos.EndsAt>0 THEN r.EndsAt+NewPos.EndsAt+10 ELSE r.MaxLen END,r.MaxLen
,SUBSTRING(r.RestString,NewPos.EndsAt+10+1,999999)
FROM recCTE AS r
CROSS APPLY(SELECT PATINDEX('%[0-3][0-9]/[0-1][0-9]/[1-2][0-9][0-9][0-9] %',SUBSTRING(r.RestString,12,99999999))) AS NewPos(EndsAt)
WHERE r.EndsAt<r.MaxLen
)
SELECT Step,Case_Reference,StartsAt,EndsAt
,SUBSTRING(Narrative,StartsAt,EndsAt-StartsAt+1) AS OutputString
FROM recCTE
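ORDER BY Case_Reference,Step
OPTION (MAXRECURSION 0); -- the default limit of 100 recursion levels is too low for long narratives
Result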
+------+----------------+----------+--------+---------------------------------------+
| Step | Case_Reference | StartsAt | EndsAt | OutputString |
+------+----------------+----------+--------+---------------------------------------+
| 1 | C1 | 1 | 38 | 01/02/2000 Some text with blanks |
+------+----------------+----------+--------+---------------------------------------+
| 2 | C1 | 39 | 60 | 02/03/2000 More text |
+------+----------------+----------+--------+---------------------------------------+
| 3 | C1 | 61 | 84 | 03/04/2000 An even more |
+------+----------------+----------+--------+---------------------------------------+
| 1 | C2 | 1 | 24 | 01/02/2000 Test for C2 |
+------+----------------+----------+--------+---------------------------------------+
| 2 | C2 | 25 | 52 | 02/03/2000 One more for C2 |
+------+----------------+----------+--------+---------------------------------------+
| 3 | C2 | 53 | 77 | 03/04/2000 An even more |
+------+----------------+----------+--------+---------------------------------------+
| 4 | C2 | 78 | 93 | 04/05/2000 Blah |
+------+----------------+----------+--------+---------------------------------------+
| 1 | C3 | 1 | 24 | 01/02/2000 Test for C3 |
+------+----------------+----------+--------+---------------------------------------+
| 2 | C3 | 25 | 52 | 02/03/2000 One more for C3 |
+------+----------------+----------+--------+---------------------------------------+
| 3 | C3 | 53 | 76 | 03/04/2000 An even more |
+------+----------------+----------+--------+---------------------------------------+
Answer 1 (score: 1)
Try this recursive CTE:
declare @t table
(
caseref varchar(20),
narrative varchar(max)
)
insert into @t values('Case_Reference1', 'blah 10/11/2016 something 13/11/2016 something else');
insert into @t values('Case_Reference2', '11/11/2016 something 12/11/2016 something else 14/11/2016 something yet still');
insert into @t values('Case_Reference3', 'should find nothing');
with cte (caseref, pos, remainingstr) as
(
-- anchor: first match in the whole narrative
select caseref,
patindex('%[0-9][0-9]/[0-9][0-9]/[0-9][0-9][0-9][0-9] %', narrative),
-- rest of the text after the first date and its two trailing spaces (12 characters)
substring(narrative, patindex('%[0-9][0-9]/[0-9][0-9]/[0-9][0-9][0-9][0-9] %', narrative) + 12, len(narrative))
from @t
where patindex('%[0-9][0-9]/[0-9][0-9]/[0-9][0-9][0-9][0-9] %', narrative) > 0
union all
-- remainingstr starts at absolute position pos + 12, so a match at relative
-- position p in remainingstr is at absolute position pos + 11 + p
select caseref,
pos + 11 + patindex('%[0-9][0-9]/[0-9][0-9]/[0-9][0-9][0-9][0-9] %', remainingstr),
substring(remainingstr, patindex('%[0-9][0-9]/[0-9][0-9]/[0-9][0-9][0-9][0-9] %', remainingstr) + 12, len(remainingstr))
from cte
where patindex('%[0-9][0-9]/[0-9][0-9]/[0-9][0-9][0-9][0-9] %', remainingstr) > 0
)
select caseref, pos
from cte
order by caseref, pos
option (maxrecursion 0); -- allow more than 100 entries per narrative