Parsing a text file: grouping rows based on a text string marker

Time: 2016-12-13 19:34:47

Tags: tsql

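I am importing a large text file that contains multiple reports. Each report consists of multiple lines of data. The only way I can tell that a new report begins is that the line starts with "XX". All of the lines that follow then belong to that XX header line. I am trying to assign grouping IDs so that I can process the data and parse it into a database.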

CREATE TABLE RawData(
    ID int IDENTITY(1,1) NOT NULL
    ,Grp1 int NULL
    ,Grp2 int NULL
    ,Rowdata varchar(max) NULL
)

INSERT INTO RawData(Rowdata) VALUES ('XX Monday');
INSERT INTO RawData(Rowdata) VALUES ('Tues day');
INSERT INTO RawData(Rowdata) VALUES ('We d ne s day');
INSERT INTO RawData(Rowdata) VALUES ('Thurs day');
INSERT INTO RawData(Rowdata) VALUES ('F r i d day');
INSERT INTO RawData(Rowdata) VALUES ('XX January');
INSERT INTO RawData(Rowdata) VALUES ('Feb r u a');
INSERT INTO RawData(Rowdata) VALUES ('XX Sun d a y');
INSERT INTO RawData(Rowdata) VALUES ('Sat ur day');

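I need to write a script that updates the Grp1 field based on which "XX" line each row falls under. When I am done, I would like the table to look like this: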

ID   Grp1   Grp2   RowData
1    1      1      XX Monday
2    1      2      Tues day
3    1      3      We d ne s day
4    1      4      Thurs day
5    1      5      F r i d day
6    2      1      XX January
7    2      2      Feb r u a
8    3      1      XX Sun d a y
9    3      2      Sat ur day

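I know I can use DENSE_RANK for the Grp2 field. The problem I am having is how to fill in all of the Grp1 values. I can run an update when I see 'XX', but that does not fill in the values for the rows below it.

Thanks for any suggestions/help.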

2 Answers:

Answer 0: (score: 0)

This should do the trick:

-- sample data
DECLARE @RawData TABLE 
(
    ID int IDENTITY(1,1) NOT NULL
    ,Grp1 int NULL
    ,Grp2 int NULL
    ,Rowdata varchar(max) NULL
);
INSERT INTO @RawData(Rowdata) 
VALUES ('XX Monday'),('Tues day'),('We d ne s day'),('Thurs day'),('F r i d day'),
       ('XX January'),('Feb r u a'),('XX Sun d a y'),('Sat ur day');

-- solution
WITH rr AS
(
  -- number the header rows: the 1st 'XX' row gets 1, the 2nd gets 2, ...
  SELECT ID, thisVal = ROW_NUMBER() OVER (ORDER BY ID)
  FROM @RawData
  WHERE Rowdata LIKE 'XX %'
),
makeGrp1 AS
(
  SELECT 
    ID,
    -- the most recent header at or before this row determines its group
    Grp1 = (SELECT MAX(thisVal) FROM rr WHERE r.ID >= rr.ID),
    Rowdata
  FROM @RawData r
)
SELECT 
  ID,
  Grp1,
  Grp2 = ROW_NUMBER() OVER (PARTITION BY Grp1 ORDER BY ID),
  Rowdata
FROM makeGrp1;

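Update: Below is the code to update the @RawData table; I just re-read the requirement. I am leaving the original solution in place because it will help you better understand how my update works: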

-- sample data
DECLARE @RawData TABLE 
(
    ID int IDENTITY(1,1) NOT NULL
    ,Grp1 int NULL
    ,Grp2 int NULL
    ,Rowdata varchar(max) NULL
);
INSERT INTO @RawData(Rowdata) 
VALUES ('XX Monday'),('Tues day'),('We d ne s day'),('Thurs day'),('F r i d day'),
       ('XX January'),('Feb r u a'),('XX Sun d a y'),('Sat ur day');

-- Solution to update the @RawData Table
WITH rr AS
(
  -- number the header rows: the 1st 'XX' row gets 1, the 2nd gets 2, ...
  SELECT ID, thisVal = ROW_NUMBER() OVER (ORDER BY ID)
  FROM @RawData
  WHERE Rowdata LIKE 'XX %'
),
makeGroups AS
(
  SELECT 
    ID,
    Grp1 = (SELECT MAX(thisVal) FROM rr WHERE r.ID >= rr.ID),
    -- same expression as Grp1, repeated because an alias cannot be reused in the same SELECT
    Grp2 = ROW_NUMBER() 
      OVER (PARTITION BY (SELECT MAX(thisVal) FROM rr WHERE r.ID >= rr.ID) ORDER BY ID)
  FROM @RawData r
)
UPDATE rd
SET Grp1 = mg.Grp1, Grp2 = mg.Grp2
FROM makeGroups mg
JOIN @RawData rd ON mg.ID = rd.ID;
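
As a quick sanity check on the update above, a query along these lines could be run in the same batch (the @RawData table variable only exists within that batch). This is only a sketch and not part of the original answer: it assumes every report header matches 'XX %' and looks for rows whose Grp2 disagrees with that, or whose Grp1 was never filled in.

```sql
-- Sketch: should return no rows for the sample data if the update worked.
SELECT ID, Grp1, Grp2, Rowdata
FROM @RawData
WHERE (Rowdata LIKE 'XX %' AND Grp2 <> 1)      -- a header row that is not first in its group
   OR (Rowdata NOT LIKE 'XX %' AND Grp2 = 1)   -- a detail row that claims to start a group
   OR Grp1 IS NULL;                            -- a row that precedes the first header
```

A row appearing in this result would point to data that does not follow the "header first, details after" layout described in the question.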

Answer 1: (score: 0)

;with cte0 as (
    -- Flag = 1 on every report header row (Rowdata starting with 'XX'), else 0
    Select *,Flag = case when Rowdata like 'XX%' then 1 else 0 end
    From   RawData )
Update U
   Set Grp1 = B.Grp1
      ,Grp2 = B.Grp2
 From RawData U
 Join (
        -- the running total of Flag is the report number
        Select ID
              ,Grp1 = Sum(Flag) over (Order by ID)
              ,Grp2 = Row_Number() over (Partition By (Select Sum(Flag) From cte0 Where ID<=A.ID) Order by ID)
         From cte0 A
      ) B on U.ID=B.ID

Select * from RawData

The updated RawData looks like this:

[image: screenshot of the updated RawData table]
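
The running-total idea in this answer can also be written without the correlated subquery, by computing Grp1 in one CTE and numbering rows within it in a second. The following is only a sketch of that variant, not the answerer's code: it assumes the RawData table from the question and SQL Server 2012 or later (where ordered window aggregates are available), and the CTE names flagged and numbered are made up for illustration.

```sql
-- Sketch only: assumes the RawData table from the question, SQL Server 2012+.
-- If this is not the first statement in the batch, terminate the previous one with ';'.
WITH flagged AS
(
    -- Running count of header rows: each 'XX' row bumps the group number by 1.
    SELECT ID,
           Grp1 = SUM(CASE WHEN Rowdata LIKE 'XX%' THEN 1 ELSE 0 END)
                  OVER (ORDER BY ID ROWS UNBOUNDED PRECEDING)
    FROM RawData
),
numbered AS
(
    -- Position of each row inside its group; the header row gets 1.
    SELECT ID,
           Grp1,
           Grp2 = ROW_NUMBER() OVER (PARTITION BY Grp1 ORDER BY ID)
    FROM flagged
)
UPDATE rd
SET Grp1 = n.Grp1,
    Grp2 = n.Grp2
FROM RawData AS rd
JOIN numbered AS n ON n.ID = rd.ID;
```

Note that any rows appearing before the first 'XX' line would end up with Grp1 = 0 under this approach, so such rows may need separate handling.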