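I'm importing a .txt file with SSIS, which is straightforward for the most part.
The file has a fixed number of columns up to a point, but it ends with a free text/comment field that can repeat to an unknown length, like this: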
"000001","J Smith","Red","Free text here"
"000002","A Ball","Blue","Free text here","but can","continue"
"000003","W White","Green","Free text here","but can","continue","indefinitely"
"000004","J Roley","Red","Free text here"
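Ideally (in SSIS) I'd keep the first three columns as individual columns but merge all the free text columns into one, i.e. concatenate anything that appears after the "Colour" column.
So when I load it into a table and view it in SSMS, it looks like: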
000001 | J Smith | Red | Free text here |
000002 | A Ball | Blue | Free text here but can continue |
000003 | W White | Green | Free text here but can continue indefinitely |
000004 | J Roley | Red | Free text here |
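Answer 0 (score: 0)
I don't see any simple solution. You could try something like the following:
1. Load the complete raw data into a temp table (without any delimiter), so that each line lands in a single column.
Steps:
Set DelayValidation=True on the Data Flow Task
and RetainSameConnection=True (a connection manager property), so the temp table stays available to later tasks.
Refer to this for creating the temp table and using it.
2. Create T-SQL to separate the 3 columns (as shown below):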
with col1 as (
    select [Val],
           substring([Val], 1, charindex(',', [Val]) - 1) as col1,
           len(substring([Val], 1, charindex(',', [Val]))) + 1 as col1Len
    from #temp
),
col2 as (
    select [Val],
           col1,
           substring([Val], col1Len, charindex(',', [Val], col1Len) - col1Len) as col2,
           charindex(',', [Val], col1Len) + 1 as col2Len
    from col1
)
select col1, col2, substring([Val], col2Len, 200) as col3
from col2
T-SQL output:
col1 col2 col3
"000001" "J Smith" "Red","Free text here"
"000002" "A Ball" "Blue","Free text here","but can","continue"
"000003" "W White" "Green","Free text here","but can","continue","indefinitely"
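3. Use the above query in an OLE DB Source in a different Data Flow Task.
Replace the double quotes (") as your requirements dictate.
Answer 1 (score: 0)
This was a fun exercise:
Add a Data Flow.
Add a Script Component (choose Source).
Add 4 columns to the output: ID, Name, Color, FreeText, all of type string.
Edit the script:
Paste the following namespaces at the top: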
using System.Text.RegularExpressions;
using System.Linq;
Paste the following code into CreateNewOutputRows:
string strPath = @"a:\test.txt"; // put your file path in here
var lines = System.IO.File.ReadAllLines(strPath);
foreach (string line in lines)
{
    // Code I stole to read CSV: a field is a run of quoted strings
    // and/or characters that are not the delimiter
    string delimiter = ",";
    Regex rgx = new Regex(String.Format("(\"[^\"]*\"|[^{0}])+", delimiter));
    var cols = rgx.Matches(line)
        .Cast<Match>()
        .Select(m => m.Value.Trim().Trim('"'))
        .Where(v => !string.IsNullOrWhiteSpace(v));
    // create a column counter
    int ctr = 0;
    Output0Buffer.AddRow();
    // Preset FreeText to empty string
    string FreeTextBuilder = String.Empty;
    foreach (string col in cols)
    {
        switch (ctr)
        {
            case 0:
                Output0Buffer.ID = col;
                break;
            case 1:
                Output0Buffer.Name = col;
                break;
            case 2:
                Output0Buffer.Color = col;
                break;
            default:
                // everything after the third column is free text
                FreeTextBuilder += col + " ";
                break;
        }
        ctr++;
    }
    Output0Buffer.FreeText = FreeTextBuilder.Trim();
}
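If you want to check the parsing before wiring it into the package, the same logic can be tried in a plain console program. This is only a rough sketch: the class name FreeTextDemo and the method SplitRow are made up for illustration and are not part of SSIS, and the comma delimiter is hard-coded. It reuses the regex from the script above and joins everything after the third field with spaces.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper for testing outside SSIS; names are illustrative only.
public static class FreeTextDemo
{
    // Same pattern as the Script Component code, with the comma hard-coded.
    private static readonly Regex FieldPattern = new Regex("(\"[^\"]*\"|[^,])+");

    // Returns exactly four values: ID, Name, Color, FreeText.
    public static string[] SplitRow(string line)
    {
        List<string> fields = FieldPattern.Matches(line)
            .Cast<Match>()
            .Select(m => m.Value.Trim().Trim('"'))
            .Where(v => !string.IsNullOrWhiteSpace(v))
            .ToList();

        // Everything after the first three fields becomes one free-text value.
        string freeText = string.Join(" ", fields.Skip(3));

        return new[]
        {
            fields.ElementAtOrDefault(0) ?? string.Empty,
            fields.ElementAtOrDefault(1) ?? string.Empty,
            fields.ElementAtOrDefault(2) ?? string.Empty,
            freeText
        };
    }

    public static void Main()
    {
        string sample = "\"000002\",\"A Ball\",\"Blue\",\"Free text here\",\"but can\",\"continue\"";
        Console.WriteLine(string.Join(" | ", SplitRow(sample)));
        // prints: 000002 | A Ball | Blue | Free text here but can continue
    }
}
```

One thing to watch for: the Where filter drops empty fields, so a row with a blank Name or Color would shift the later values one column to the left. If your file can contain empty fields, that filter probably needs to go.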