Unstructured grouped data from Excel to a SQL table

Time: 2019-01-26 19:56:19

Tags: sql sql-server excel ssis etl


date        code      employee  type     total  actual
12-12-2018  ltsdysa   3026262   general  150    139
                                typical  200    177
                                dummy    120    100
14-12-2018  hskdbegs  3630878   new      300    143
                                typical  100    99
                      4829264   dummy    210    187
                                general  130    89
                                typical  200    200
18-12-2018  tuebdhxj  3637980   old      300    143
                                typical  100    99
                      4833390   general  300    260
                                typical  130    89
                                new      200    200

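Hi all, I want to import Excel data into a SQL table. The sample data is shown above. I need the date, code, and employee values filled in on each of their type rows.

Expected output:

(image of the expected output)

Could you help me solve this with an SSIS task? Essentially, I need to ungroup the first 3 columns.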

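2 answers:

Answer 0: (score: 1)

Here is one way to solve the problem:

1. Create a staging table in the database like this. Keep in mind that it needs an IDENTITY column, because the query in step 3 relies on ID to recover the original row order: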

CREATE TABLE tbl_Excel_stg (
    ID INT IDENTITY(1,1),
    [Date] VARCHAR(50),
    code VARCHAR(50),
    Employee VARCHAR(50),
    [type] VARCHAR(50),
    total VARCHAR(50),
    actual VARCHAR(50)
)

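2. Build an SSIS package that loads the Excel data into the staging table.

3. Run the following query to extract the corrected data (I have included some rows for testing purposes):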

INSERT INTO tbl_Excel_stg ([Date],code,Employee,type,total,actual)
VALUES ('2018-12-12','ltsdysa','3026262','general','150','139');
GO
INSERT INTO tbl_Excel_stg ([Date],code,Employee,type,total,actual)
VALUES ('','','','typical','200','177');
GO
INSERT INTO tbl_Excel_stg ([Date],code,Employee,type,total,actual)
VALUES ('','','','dummy','120','100');
GO
INSERT INTO tbl_Excel_stg ([Date],code,Employee,type,total,actual)
VALUES ('2018-12-14','hskdbegs','3630878','new','300','143');
GO
INSERT INTO tbl_Excel_stg ([Date],code,Employee,type,total,actual)
VALUES ('','','','typical','100','99');
GO
INSERT INTO tbl_Excel_stg ([Date],code,Employee,type,total,actual)
VALUES ('','','4829264','dummy','210','187');
GO
INSERT INTO tbl_Excel_stg ([Date],code,Employee,type,total,actual)
VALUES ('','','','general','130','89');
GO

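-- Fill each blank Date/code/Employee from the nearest earlier row that has a value.
-- Assumes blank cells were loaded as empty strings (''), as in the test rows above.
SELECT [Date] = CASE s0.[Date] WHEN '' THEN s1.[Date] ELSE s0.[Date] END
    , code = CASE s0.code WHEN '' THEN s2.code ELSE s0.code END
    , Employee = CASE s0.Employee WHEN '' THEN s3.Employee ELSE s0.Employee END
    , s0.[type], s0.total, s0.actual
FROM tbl_Excel_stg AS s0
-- ID of the latest earlier row with a non-blank Date
OUTER APPLY (
    SELECT ID = MAX(si.ID) FROM tbl_Excel_stg AS si
    WHERE si.ID < s0.ID AND si.[Date] != ''
) AS sDate
-- ID of the latest earlier row with a non-blank code
OUTER APPLY (
    SELECT ID = MAX(si.ID) FROM tbl_Excel_stg AS si
    WHERE si.ID < s0.ID AND si.code != ''
) AS scode
-- ID of the latest earlier row with a non-blank Employee
OUTER APPLY (
    SELECT ID = MAX(si.ID) FROM tbl_Excel_stg AS si
    WHERE si.ID < s0.ID AND si.Employee != ''
) AS sEmployee
-- One join per column, so each column is filled from its own source row
LEFT JOIN tbl_Excel_stg AS s1 ON s1.ID = sDate.ID
LEFT JOIN tbl_Excel_stg AS s2 ON s2.ID = scode.ID
LEFT JOIN tbl_Excel_stg AS s3 ON s3.ID = sEmployee.ID
ORDER BY s0.ID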

The solution may not be pretty, but it is easy to understand, change, and debug.
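On SQL Server 2012 or later, the same fill-down could probably also be written with window functions instead of OUTER APPLY. This is an alternative technique, not part of the answer as posted, and it is only a sketch: it assumes the same tbl_Excel_stg staging table, and it treats both NULL and empty strings as blank. The CTE name marked and the columns dateGrp, codeGrp, and empGrp are made-up names for illustration.

```sql
-- Sketch only: assumes tbl_Excel_stg as defined above and SQL Server 2012+.
-- marked, dateGrp, codeGrp and empGrp are hypothetical names.
WITH marked AS (
    SELECT ID, [Date], code, Employee, [type], total, actual
        -- ID of the most recent row, up to and including this one,
        -- that carries a non-blank value in the column
        , dateGrp = MAX(CASE WHEN NULLIF([Date], '') IS NOT NULL THEN ID END)
                    OVER (ORDER BY ID ROWS UNBOUNDED PRECEDING)
        , codeGrp = MAX(CASE WHEN NULLIF(code, '') IS NOT NULL THEN ID END)
                    OVER (ORDER BY ID ROWS UNBOUNDED PRECEDING)
        , empGrp  = MAX(CASE WHEN NULLIF(Employee, '') IS NOT NULL THEN ID END)
                    OVER (ORDER BY ID ROWS UNBOUNDED PRECEDING)
    FROM tbl_Excel_stg
)
SELECT
      -- Each group holds exactly one non-blank value, so MAX returns it
      [Date]   = MAX(NULLIF([Date], ''))   OVER (PARTITION BY dateGrp)
    , code     = MAX(NULLIF(code, ''))     OVER (PARTITION BY codeGrp)
    , Employee = MAX(NULLIF(Employee, '')) OVER (PARTITION BY empGrp)
    , [type], total, actual
FROM marked
ORDER BY ID;
```

Rows that come before the first non-blank value in a column get NULL for that column, which matches what the OUTER APPLY version returns when its lookup finds no earlier row.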

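Answer 1: (score: 1)

In SSIS, you can achieve this with a Script Component:

  1. Add a Data Flow Task
  2. In the Data Flow Task, add an Excel Source, a Script Component, and an OLE DB Destination
  3. In the Script Component, add 3 new output columns (outdate, outcode, outemployee) with the same data types as the source columns.
  4. In the Script Component, select Visual Basic as the script language
  5. In the script editor, write the following code: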

    ' Last non-null value seen in each column, carried from row to row
    Private Lastdate As String
    Private Lastemployee As String
    Private Lastcode As String

    Public Overrides Sub Input0_ProcessInputRow(ByVal Row As Input0Buffer)

        ' date: keep the cell value if present, otherwise reuse the last one
        If Not Row.date_IsNull Then
            Row.outdate = Row.date
            Lastdate = Row.date
        Else
            Row.outdate = Lastdate
        End If

        ' employee: same fill-down logic
        If Not Row.employee_IsNull Then
            Row.outemployee = Row.employee
            Lastemployee = Row.employee
        Else
            Row.outemployee = Lastemployee
        End If

        ' code: same fill-down logic
        If Not Row.code_IsNull Then
            Row.outcode = Row.code
            Lastcode = Row.code
        Else
            Row.outcode = Lastcode
        End If

    End Sub
    
  6. Map the newly created columns, along with the type, total, and actual columns, to the corresponding destination columns