Parsing an unstructured text file in SSIS and reading each line to get the required data

Date: 2017-06-12 18:43:12

Tags: sql ssis

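I am working with SSIS and have a complex, unstructured TEXT file. I need to build an SSIS package that parses this text file and loads the required column data into a database. What is the best way to parse the text file, and how do I write a script that reads each line of it? I am also unsure whether I can read each line of the TEXT file without writing a script at all.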

The columns I need from the text file data are DEVICEID, DATAVALUE and DATAUNITS.

Here is the TEXT file:

    12/02/2015 09:47:44:745 SecureHARTPort version: 1.1.12.0.

    12/02/2015 09:47:44:745 Connecting and initialing Session to 
    67.40.65.181 Port:5094 Tcp
    12/02/2015 09:47:44:745 Tx: Message Header: Ver: 1, MsgType: 0, MsgId: 0 
    Status: 0x00
   TranId: 1, Data ByteCount: 5
   Data: 01 00 09 27 C0 

    12/02/2015 09:47:44:761 Rx: Message Header: Ver: 1, MsgType: 1, MsgId: 0 
   Status: 0x00
  TranId: 1, Data ByteCount: 5
  Data: 01 00 09 27 C0 
  12/02/2015 09:47:44:855 Tx: Message Header: Ver: 1, MsgType: 0, MsgId: 3 
  Status: 0x00
 TranId: 2, Data ByteCount: 5
 Data: 02 80 00 00 82 

 12/02/2015 09:47:44:855 Rx: Message Header: Ver: 1, MsgType: 1, MsgId: 3 
 Status: 0x00
 TranId: 2, Data ByteCount: 29
 Data: 06 80 00 18 00 50 FE 26 4E 05 07 05 02 0E 0C 0B 6A 64 05 04 00 01 50 
 00 26 00 26 84 8E 

 Rx Cmd=0, Rsp code=0x00, Device Status=0x50
 Expansion Code=254
 Expanded Device Type=9806
 # Request Preambles=5
 Universal Comand Revision Level=7
 Transmitter HART Revision Level=5
 Software Revision=2
 Hardware Revision Level / Physical Signaling Code=14
 Flags=0C
 Device ID=748132
 Minimum # Response Preambles=5
 Max # of device variables=4
 Configuration Change Counter=1
 Extended Field Device Status=50
 Manufacturer's ID=38
 Private Label Distributor=38
 Device Profile=132

 12/02/2015 09:47:44:855 Tx: Message Header: Ver: 1, MsgType: 0, MsgId: 3 
 Status: 0x00
 TranId: 3, Data ByteCount: 9
  Data: 82 A6 4E 0B 6A 64 14 00 7B 

  12/02/2015 09:47:44:870 Rx: Message Header: Ver: 1, MsgType: 1, MsgId: 3 
  Status: 0x00
  TranId: 3, Data ByteCount: 43
  Data: 86 A6 4E 0B 6A 64 14 22 00 50 77 69 68 61 72 74 67 77 00 00 00 00 00 
 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 0C 

 Rx Cmd=20, Rsp code=0x00, Device Status=0x50
 Long Tag=wihartgw

  12/02/2015 09:47:44:870 Tx: Message Header: Ver: 1, MsgType: 0, MsgId: 3 
 Status: 0x00
 TranId: 4, Data ByteCount: 9
 Data: 82 A6 4E 0B 6A 64 4A 00 25 

 12/02/2015 09:47:44:886 Rx: Message Header: Ver: 1, MsgType: 1, MsgId: 3 
 Status: 0x00
 TranId: 4, Data ByteCount: 19
  Data: 86 A6 4E 0B 6A 64 4A 0A 00 50 01 01 65 00 05 02 01 03 1B 

  Rx Cmd=74, Rsp code=0x00, Device Status=0x50
 Max Num IO Cards=1
 Max Num Channels per IO Card=1
 Max Num Sub-Devices per Channel=101
  Num Devices Detected=5
  Max Num DR Supported=2
  Master Mode for Comm=1
   Retry Count for Sub-Device=3

   Rx Cmd=9, Rsp code=0x00, Device Status=0x50
   Extended Device Status=0
   Slot0 Var Code=246
   Slot0 Var Classification=0
   Slot0 Var Units=251
   Slot0 Var Value=4
   Slot0 Var Status=C0
   Slot1 Var Code=116
  Slot1 Var Classification=209
  Slot1 Var Units=70
 Slot1 Var Value=0

2 Answers:

Answer 0 (score: 2)

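Not sure whether this helps, but with a T-SQL script like the one below you can first split the text into lines and then apply the appropriate filters: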

DECLARE @YourText NVARCHAR(MAX)=
N'    12/02/2015 09:47:44:745 SecureHARTPort version: 1.1.12.0.

    12/02/2015 09:47:44:745 Connecting and initialing Session to 
    67.40.65.181 Port:5094 Tcp
    12/02/2015 09:47:44:745 Tx: Message Header: Ver: 1, MsgType: 0, MsgId: 0 
    Status: 0x00
   TranId: 1, Data ByteCount: 5
   Data: 01 00 09 27 C0 

    12/02/2015 09:47:44:761 Rx: Message Header: Ver: 1, MsgType: 1, MsgId: 0 
   Status: 0x00
  TranId: 1, Data ByteCount: 5
  Data: 01 00 09 27 C0 
  12/02/2015 09:47:44:855 Tx: Message Header: Ver: 1, MsgType: 0, MsgId: 3 
  Status: 0x00
 TranId: 2, Data ByteCount: 5
 Data: 02 80 00 00 82 

 12/02/2015 09:47:44:855 Rx: Message Header: Ver: 1, MsgType: 1, MsgId: 3 
 Status: 0x00
 TranId: 2, Data ByteCount: 29
 Data: 06 80 00 18 00 50 FE 26 4E 05 07 05 02 0E 0C 0B 6A 64 05 04 00 01 50 
 00 26 00 26 84 8E 

 Rx Cmd=0, Rsp code=0x00, Device Status=0x50
 Expansion Code=254
 Expanded Device Type=9806
 # Request Preambles=5
 Universal Comand Revision Level=7
 Transmitter HART Revision Level=5
 Software Revision=2
 Hardware Revision Level / Physical Signaling Code=14
 Flags=0C
 Device ID=748132
 Minimum # Response Preambles=5
 Max # of device variables=4
 Configuration Change Counter=1
 Extended Field Device Status=50
 Manufacturer''s ID=38
 Private Label Distributor=38
 Device Profile=132

 12/02/2015 09:47:44:855 Tx: Message Header: Ver: 1, MsgType: 0, MsgId: 3 
 Status: 0x00
 TranId: 3, Data ByteCount: 9
  Data: 82 A6 4E 0B 6A 64 14 00 7B 

  12/02/2015 09:47:44:870 Rx: Message Header: Ver: 1, MsgType: 1, MsgId: 3 
  Status: 0x00
  TranId: 3, Data ByteCount: 43
  Data: 86 A6 4E 0B 6A 64 14 22 00 50 77 69 68 61 72 74 67 77 00 00 00 00 00 
 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 0C 

 Rx Cmd=20, Rsp code=0x00, Device Status=0x50
 Long Tag=wihartgw

  12/02/2015 09:47:44:870 Tx: Message Header: Ver: 1, MsgType: 0, MsgId: 3 
 Status: 0x00
 TranId: 4, Data ByteCount: 9
 Data: 82 A6 4E 0B 6A 64 4A 00 25 

 12/02/2015 09:47:44:886 Rx: Message Header: Ver: 1, MsgType: 1, MsgId: 3 
 Status: 0x00
 TranId: 4, Data ByteCount: 19
  Data: 86 A6 4E 0B 6A 64 4A 0A 00 50 01 01 65 00 05 02 01 03 1B 

  Rx Cmd=74, Rsp code=0x00, Device Status=0x50
 Max Num IO Cards=1
 Max Num Channels per IO Card=1
 Max Num Sub-Devices per Channel=101
  Num Devices Detected=5
  Max Num DR Supported=2
  Master Mode for Comm=1
   Retry Count for Sub-Device=3

   Rx Cmd=9, Rsp code=0x00, Device Status=0x50
   Extended Device Status=0
   Slot0 Var Code=246
   Slot0 Var Classification=0
   Slot0 Var Units=251
   Slot0 Var Value=4
   Slot0 Var Status=C0
   Slot1 Var Code=116
  Slot1 Var Classification=209
  Slot1 Var Units=70
 Slot1 Var Value=0';

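--The query splits the text into lines at any combination of CHAR(13) and/or CHAR(10)
WITH LineByLine AS
(
    SELECT  ROW_NUMBER() OVER(ORDER BY (SELECT NULL)) AS LineNr --in practice follows document order, but this is not formally guaranteed
           ,LTRIM(RTRIM(x.value(N'(text())[1]',N'nvarchar(max)'))) AS Line
    FROM
    (
        --LF -> CR, collapse CR+CR pairs, turn each CR into the marker \nl,
        --escape XML-special characters via FOR XML PATH(''), then turn each marker into </x><x>
        SELECT CAST(N'<x>' + REPLACE((SELECT REPLACE(REPLACE(REPLACE(@YourText,NCHAR(10),NCHAR(13)),NCHAR(13)+NCHAR(13),NCHAR(13)),NCHAR(13),N'\nl') AS [*] FOR XML PATH('')),N'\nl',N'</x><x>')  + N'</x>'AS XML) AS Casted
    ) AS t
    CROSS APPLY Casted.nodes(N'/x[text()]') AS A(x) --x[text()] skips the empty elements produced by blank lines
)
SELECT LineNr,Line
FROM LineByLine
WHERE CHARINDEX('Device ID=',Line)>0
   OR CHARINDEX('Data:',Line)>0
   OR CHARINDEX('unit',Line)>0; --also matches "Units" under a case-insensitive collation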

The result will be:

LineNr  Line
7   Data: 01 00 09 27 C0
11  Data: 01 00 09 27 C0
15  Data: 02 80 00 00 82
19  Data: 06 80 00 18 00 50 FE 26 4E 05 07 05 02 0E 0C 0B 6A 64 05 04 00 01 50
30  Device ID=748132
41  Data: 82 A6 4E 0B 6A 64 14 00 7B
45  Data: 86 A6 4E 0B 6A 64 14 22 00 50 77 69 68 61 72 74 67 77 00 00 00 00 00
52  Data: 82 A6 4E 0B 6A 64 4A 00 25
56  Data: 86 A6 4E 0B 6A 64 4A 0A 00 50 01 01 65 00 05 02 01 03 1B
69  Slot0 Var Units=251
74  Slot1 Var Units=70

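You did not state your expected output, and the column names you mention (DEVICEID, DATAVALUE, DATAUNITS) do not appear as such in your text, so this is a guess... I hope it helps...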
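To check this split-and-filter logic outside SQL Server, here is a rough Python sketch of the same idea (an illustration only, not part of the original answer). Any run of CR and/or LF counts as one line break, lines are trimmed, blank lines are dropped before numbering, and a line is kept if it contains one of the search strings, compared case-insensitively. The function name `filter_lines` and its defaults are hypothetical.

```python
import re
from typing import Iterable, List, Tuple


def filter_lines(text: str,
                 needles: Iterable[str] = ("Device ID=", "Data:", "unit")) -> List[Tuple[int, str]]:
    """Split text into trimmed, non-blank lines; keep those containing any needle."""
    # Any run of CR and/or LF is one break, so CRLF, bare CR/LF and blank lines collapse
    lines = [part.strip() for part in re.split(r"[\r\n]+", text)]
    # Number only the non-blank lines, like ROW_NUMBER() applied after the x[text()] filter
    numbered = enumerate((line for line in lines if line), start=1)
    # Case-insensitive containment test, similar to CHARINDEX under a CI collation
    lowered = [n.lower() for n in needles]
    return [(nr, line) for nr, line in numbered
            if any(n in line.lower() for n in lowered)]
```

For example, `filter_lines(f.read())` on an open copy of the question's file should list the same kinds of lines as the result table above.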

Answer 1 (score: 0)

You will definitely need a Script Task to handle this.

The Script Task can use a file system object to get a reference to the file and read it line by line, looking for strings such as:

Device ID=xxx
Value=xxx
Units=xxx

In each case, take whatever value xxx has and insert it into the database.
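The line-by-line approach described above can be sketched as follows. This is Python rather than the C# or VB.NET an SSIS Script Task would actually use, and it rests on assumptions the question does not state: DEVICEID is the value after `Device ID=`, DATAVALUE and DATAUNITS are the `SlotN Var Value=` and `SlotN Var Units=` values, and each slot belongs to the most recently seen Device ID. The function name `parse_hart_log` is hypothetical.

```python
import re
from typing import Dict, Iterable, List, Optional, Tuple

# Assumed mapping: "SlotN Var Value=" -> DATAVALUE, "SlotN Var Units=" -> DATAUNITS
SLOT_RE = re.compile(r"^Slot(\d+) Var (Value|Units)=(.*)$")

Row = Tuple[str, int, Optional[str], Optional[str]]  # (device_id, slot, value, units)


def parse_hart_log(lines: Iterable[str]) -> List[Row]:
    """Read the log line by line and collect one row per (device, slot)."""
    current_device: Optional[str] = None
    # Dicts keep insertion order, so rows come out in the order slots first appear
    found: Dict[Tuple[str, int], Dict[str, str]] = {}
    for raw in lines:
        line = raw.strip()  # the log has inconsistent leading whitespace
        if line.startswith("Device ID="):
            current_device = line.split("=", 1)[1].strip()
            continue
        m = SLOT_RE.match(line)
        # Slot lines seen before any Device ID are ignored (assumption)
        if m and current_device is not None:
            fields = found.setdefault((current_device, int(m.group(1))), {})
            fields[m.group(2)] = m.group(3).strip()
    return [(dev, slot, f.get("Value"), f.get("Units"))
            for (dev, slot), f in found.items()]
```

Iterating over an open file yields one line at a time, so `with open(path) as f: rows = parse_hart_log(f)` reads the file line by line (`path` is a placeholder). For the sample in the question this pairs device 748132 with slot 0 (value 4, units 251) and slot 1 (value 0, units 70). Each tuple could then be written to the target table.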