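I am currently working with a SQL Server DB built by a vendor. The DB holds data that comes in from calls made through their system. The main table that stores the data has 7 fields: 1 field is the primary key, then there are 2 foreign keys, a pair of datetime stamps, and finally one large field called "SegmentLog".
The data in this field arrives unstructured. Here is a sample of the data: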
/20160219T154710.554-07/0?S=50&E=3512&CUTC=20160219T155235.662-07&1=100187177120160219&2=0&3=18823&4=user%20queue:icadmin&5=&6=Interact&7=|/20160219T154729.377-07/0?S=50&E=3504&CUTC=20160219T155235.663-07&1=100187177120160219&2=0&3=81592&4=user%20queue:icadmin&5=&6=LocalTransfer&7=%3cDetails%20TransferringUser%3d%22ICadmin%20-%22%20TransferringInteractionId%3d%22100187177120160219%22%20TransferredInteractionId%3d%22100187177120160219%22%20/%3e%0a&8=&9=2|/20160219T154850.970-07/0?S=50&E=3502&CUTC=20160219T155235.663-07&1=100187177120160219&2=0&3=55&4=&5=workgroup%20queue:Central%20Ops%202&6=LocalTransfer&7=%3cDetails%20TransferringUser%3d%22ICadmin%20-%22%20TransferringInteractionId%3d%22100187177120160219%22%20TransferredInteractionId%3d%22100187177120160219%22%20TransferredUser%3d%22Phoenix%20AZ%22%20/%3e%0a|/20160219T154851.025-07/0?S=50&E=3500&CUTC=20160219T155235.664-07&1=100187177120160219&2=0&3=1048&4=&5=&6=Queue&7=%3cDetails%20IVRAppName%3d%22Central%20Ops%202%22%20/%3e%0a|/20160219T154852.073-07/0?S=50&E=3502&CUTC=20160219T155235.664-07&1=100187177120160219&2=0&3=13344&4=&5=workgroup%20queue:Central%20Ops%202&6=Interact&7=|/20160219T154905.417-07/0?S=50&E=3504&CUTC=20160219T155235.664-07&1=100187177120160219&2=0&3=26202&4=user%20queue:icadmin&5=workgroup%20queue:Central%20Ops%202&6=LocalDisconnect&7=&8=&9=5
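What I have been told is that each "SegmentLog" can contain multiple "events", identified by "E=" in the SegmentLog field. Events are separated by the "|" pipe symbol. Each event starts with a datetime stamp from the server, then a SourceID (identified by "S="), and then the EventID (identified by "E=").
After each EventID (a number between 3500 and 3512) there will be attributes numbered 1-9 (identified by "1=", "2=", and so on).
Keep in mind that each SegmentLog may contain multiple events with the same EventID, and not every attribute appears for every EventID (e.g. E=3502 may show only attributes 1-6, while E=3503 may show attributes 1-9).
What is the best way to build this data into a table structure? The tools available to me are building complex queries inside views, or SSIS, of which I have intermediate knowledge.
Edit
I would like to see the data like this, but with all attributes included: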
DateTime                    Sequence  EventID  Attr1                  Attr3
--------                    --------  -------  -----                  -----
/20160219T154710.554-07/0?  S=50      &E=3512  &1=100187177120160219  &3=18823
/20160219T154729.377-07/0?  S=50      &E=3504  &1=100187177120160219  &3=81592
/20160219T154850.970-07/0?  S=50      &E=3502  &1=100187177120160219  &3=55
/20160219T154851.025-07/0?  S=50      &E=3500  &1=100187177120160219  &3=1048
Answer 0 (score: 0)
OK, I think this is what you are trying to accomplish.
To test this, I added your sample row to a SQL Server table with an nvarchar(max) column:
if exists (select * from sysobjects where name='BigLongString' and xtype='U')
    drop table dbo.BigLongString;
go
create table dbo.BigLongString
(
    SegmentLog nvarchar(max)
);
go
insert into dbo.BigLongString (SegmentLog)
values ('/20160219T154710.554-07/0?S=50&E=3512&CUTC=20160219T155235.662-07&1=100187177120160219&2=0&3=18823&4=user%20queue:icadmin&5=&6=Interact&7=|/20160219T154729.377-07/0?S=50&E=3504&CUTC=20160219T155235.663-07&1=100187177120160219&2=0&3=81592&4=user%20queue:icadmin&5=&6=LocalTransfer&7=%3cDetails%20TransferringUser%3d%22ICadmin%20-%22%20TransferringInteractionId%3d%22100187177120160219%22%20TransferredInteractionId%3d%22100187177120160219%22%20/%3e%0a&8=&9=2|/20160219T154850.970-07/0?S=50&E=3502&CUTC=20160219T155235.663-07&1=100187177120160219&2=0&3=55&4=&5=workgroup%20queue:Central%20Ops%202&6=LocalTransfer&7=%3cDetails%20TransferringUser%3d%22ICadmin%20-%22%20TransferringInteractionId%3d%22100187177120160219%22%20TransferredInteractionId%3d%22100187177120160219%22%20TransferredUser%3d%22Phoenix%20AZ%22%20/%3e%0a|/20160219T154851.025-07/0?S=50&E=3500&CUTC=20160219T155235.664-07&1=100187177120160219&2=0&3=1048&4=&5=&6=Queue&7=%3cDetails%20IVRAppName%3d%22Central%20Ops%202%22%20/%3e%0a|/20160219T154852.073-07/0?S=50&E=3502&CUTC=20160219T155235.664-07&1=100187177120160219&2=0&3=13344&4=&5=workgroup%20queue:Central%20Ops%202&6=Interact&7=|/20160219T154905.417-07/0?S=50&E=3504&CUTC=20160219T155235.664-07&1=100187177120160219&2=0&3=26202&4=user%20queue:icadmin&5=workgroup%20queue:Central%20Ops%202&6=LocalDisconnect&7=&8=&9=5')
go
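Then I created an SSIS package to pull this data and parse it. The data flow task is an OLE DB Source feeding a Script Component.
The SQL statement in the OLE DB Source component is: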
select
    SegmentLog
from
    dbo.BigLongString;
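The Script Component is a transformation with an asynchronous output.
If you expand the "Output 0" tree, you can see all of the columns that were added. The Attr* columns are all DT_WSTR 500. I wasn't sure how large the values get, so you may want to change the data type. I made the rest of the columns DT_WSTR 50.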
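Because the DateTime column is carried through as a string, a later step may want a real date/time value instead. The following is a minimal sketch of one way to do that, assuming every stamp has the same layout as the sample row (`20160219T154710.554-07`, that is, date, `T`, time with milliseconds, and a two-digit UTC offset). The `SegmentTimestamp` class and its `TryParse` method are hypothetical names introduced for illustration; they are not part of the Script Component in this answer.

```csharp
using System;
using System.Globalization;

// Hypothetical helper, shown for illustration only.
public static class SegmentTimestamp
{
    // Assumed layout, taken from the sample row: 20160219T154710.554-07
    private const string Format = "yyyyMMdd'T'HHmmss.fffzz";

    // Accepts either the bare stamp or the raw "/20160219T154710.554-07/0?" prefix.
    // Returns false instead of throwing when the text does not match the assumed layout.
    public static bool TryParse(string raw, out DateTimeOffset value)
    {
        value = default(DateTimeOffset);
        if (string.IsNullOrEmpty(raw)) return false;

        // Keep only the text between the first pair of slashes, if slashes are present.
        string stamp = raw;
        int first = raw.IndexOf('/');
        if (first >= 0)
        {
            int second = raw.IndexOf('/', first + 1);
            if (second < 0) return false;
            stamp = raw.Substring(first + 1, second - first - 1);
        }

        return DateTimeOffset.TryParseExact(
            stamp, Format, CultureInfo.InvariantCulture, DateTimeStyles.None, out value);
    }
}
```

If you go this way, the output column would need a date/time data type rather than DT_WSTR, and rows whose stamp fails to parse would need their own handling (for example, leaving the column null).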
Here is the code for the Script Component. Make sure you build it before exiting:
#region Namespaces
using System;
using System.Data;
using Microsoft.SqlServer.Dts.Pipeline.Wrapper;
using Microsoft.SqlServer.Dts.Runtime.Wrapper;
using Microsoft.SqlServer.Dts.Pipeline;
#endregion

[Microsoft.SqlServer.Dts.Pipeline.SSISScriptComponentEntryPointAttribute]
public class ScriptMain : UserComponent
{
    // Raw pipeline buffer, captured in ProcessInput so the blob column can be read by index
    private PipelineBuffer inputBuffer;

    public override void Input0_ProcessInputRow(Input0Buffer Row)
    {
        // Events in a SegmentLog are separated by "|"
        string[] eventSplit = new string[] { "|" };

        // Get the blob length. The column index is hardcoded to 0
        // since we only look at one column in this example
        int blobLen = (int)inputBuffer.GetBlobLength(0);

        // Get the string from the blob; the column index is hardcoded for the same reason
        string webStr = ConvertBlobToString((byte[])inputBuffer.GetBlobData(0, 0, blobLen));

        // One entry per event; each entry starts with the server timestamp
        string[] events = webStr.Split(eventSplit, StringSplitOptions.None);

        // Loop through each event and emit one output row for it
        foreach (string evt in events)
        {
            // Split the event into its "&"-separated fields
            string[] attributes = evt.Split('&');
            Output0Buffer.AddRow();

            // Fields are mapped to columns by position. You can remove the "&" +
            // if you do not need it in the values
            for (int i = 0; i < attributes.Length; i++)
            {
                switch (i)
                {
                    case 0:
                        // The first field holds both the timestamp prefix and the "S=" value
                        Output0Buffer.DateTime = attributes[i].Substring(0, attributes[i].IndexOf('S'));
                        Output0Buffer.Sequence = attributes[i].Substring(attributes[i].IndexOf('S'), attributes[i].Length - attributes[i].IndexOf('S'));
                        break;
                    case 1:
                        Output0Buffer.EventID = "&" + attributes[i];
                        break;
                    case 2:
                        Output0Buffer.CUTC = "&" + attributes[i];
                        break;
                    case 3:
                        Output0Buffer.Attr1 = "&" + attributes[i];
                        break;
                    case 4:
                        Output0Buffer.Attr2 = "&" + attributes[i];
                        break;
                    case 5:
                        Output0Buffer.Attr3 = "&" + attributes[i];
                        break;
                    case 6:
                        Output0Buffer.Attr4 = "&" + attributes[i];
                        break;
                    case 7:
                        Output0Buffer.Attr5 = "&" + attributes[i];
                        break;
                    case 8:
                        Output0Buffer.Attr6 = "&" + attributes[i];
                        break;
                    case 9:
                        Output0Buffer.Attr7 = "&" + attributes[i];
                        break;
                    case 10:
                        Output0Buffer.Attr8 = "&" + attributes[i];
                        break;
                    case 11:
                        Output0Buffer.Attr9 = "&" + attributes[i];
                        break;
                }
            }
        }
    }

    public override void ProcessInput(int InputID, Microsoft.SqlServer.Dts.Pipeline.PipelineBuffer Buffer)
    {
        // Keep a reference to the buffer so Input0_ProcessInputRow can read the blob
        inputBuffer = Buffer;
        base.ProcessInput(InputID, Buffer);
    }

    public string ConvertBlobToString(byte[] webBlob)
    {
        // nvarchar(max) arrives as UTF-16, so decode the bytes as Unicode
        string webStr = System.Text.Encoding.Unicode.GetString(webBlob);
        return webStr;
    }
}
Run the package and you should see the data parsed out as expected in the data viewer.
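One caveat: the Script Component maps fields to columns by position, so it relies on every event listing its fields in the same order with none skipped in the middle. Since the question says that not all attributes appear for every EventID, a lookup by key may be more robust. Here is a minimal sketch of that idea in plain C#, assuming each field has the form `key=value` and that values are percent-encoded as in the sample. `SegmentLogParser`, `Parse`, and the `"Timestamp"` key are hypothetical names introduced for illustration.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper, shown for illustration only.
public static class SegmentLogParser
{
    // Splits one SegmentLog into events. Each event becomes a dictionary keyed by
    // field name ("Timestamp", "S", "E", "CUTC", "1" .. "9"); values are URL-decoded.
    public static List<Dictionary<string, string>> Parse(string segmentLog)
    {
        var events = new List<Dictionary<string, string>>();
        if (string.IsNullOrEmpty(segmentLog)) return events;

        foreach (string rawEvent in segmentLog.Split(new[] { '|' }, StringSplitOptions.RemoveEmptyEntries))
        {
            var fields = new Dictionary<string, string>();

            // Everything before '?' is the "/timestamp/0" prefix; the rest is key=value pairs.
            int q = rawEvent.IndexOf('?');
            string prefix = q >= 0 ? rawEvent.Substring(0, q) : rawEvent;
            string query = q >= 0 ? rawEvent.Substring(q + 1) : string.Empty;

            string[] prefixParts = prefix.Split(new[] { '/' }, StringSplitOptions.RemoveEmptyEntries);
            if (prefixParts.Length > 0) fields["Timestamp"] = prefixParts[0];

            foreach (string pair in query.Split(new[] { '&' }, StringSplitOptions.RemoveEmptyEntries))
            {
                int eq = pair.IndexOf('=');
                if (eq <= 0) continue; // skip anything that is not key=value

                string key = pair.Substring(0, eq);
                // Decodes %20, %3c and so on; an attribute such as "5=" yields an empty string
                fields[key] = Uri.UnescapeDataString(pair.Substring(eq + 1));
            }

            events.Add(fields);
        }

        return events;
    }
}
```

Inside the Script Component, the same idea would mean looking up each key with `TryGetValue` and assigning the matching `Output0Buffer` column only when the key is present, so an event without attributes 8 and 9 simply leaves those columns empty.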