CSV to Columns, Join with Row-Based Data, Analyze and Output - Can It Be Done Efficiently?

Date: 2017-12-15 20:01:29

Tags: join sql-server-2008-r2

I have a complex SQL Server problem that I've been trying to solve, but I'm stuck and I'm hoping I can get some help!

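I have two tables of data stored in different formats, and I need to stitch them together to create a specified output. To make matters worse, one of the tables has some key data stored as comma-separated values (I know this is not how data should be stored; have mercy, I didn't design these tables!).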

Students table:

| id |              oldSkill |                             newSkill |
+----+-----------------------+--------------------------------------+
|  1 |                  Word |                Excel,PowerPoint,Word |
|  2 | Excel,PowerPoint,Word |        Excel,Outlook,PowerPoint,Word |
|  3 |       PowerPoint,Word |                Excel,PowerPoint,Word |
|  4 |          Access,Excel | Access,Excel,Outlook,PowerPoint,Word |
|  5 |          Outlook,Word |        Excel,Outlook,PowerPoint,Word |

Skills table:

| id |      skill | assignment |
+----+------------+------------+
|  1 |       Word |          B |
|  1 |       Word |          P |
|  2 |      Excel |          P |
|  2 | PowerPoint |          B |
|  2 | PowerPoint |          P |
|  2 |       Word |          P |
|  3 | PowerPoint |          P |
|  3 |       Word |          P |
|  4 |     Access |          B |
|  4 |      Excel |          B |
|  4 |     Access |          P |
|  4 |      Excel |          P |
|  5 |    Outlook |          P |
|  5 |       Word |          B |

Here's what I've been asked to output:

| id | skill_1 | skill_1_primary | skill_1_backup |    skill_2 | skill_2_primary | skill_2_backup |    skill_3 | skill_3_primary | skill_3_backup |    skill_4 | skill_4_primary | skill_4_backup | skill_5 | skill_5_primary | skill_5_backup |
|----|---------|-----------------|----------------|------------|-----------------|----------------|------------|-----------------|----------------|------------|-----------------|----------------|---------|-----------------|----------------|
|  1 |   Excel |               Y |         (null) | PowerPoint |               Y |         (null) |       Word |               Y |              Y |     (null) |          (null) |         (null) |  (null) |          (null) |         (null) |
|  2 |   Excel |               Y |         (null) |    Outlook |               Y |         (null) | PowerPoint |               Y |              Y |       Word |               Y |         (null) |  (null) |          (null) |         (null) |
|  3 |   Excel |               Y |         (null) | PowerPoint |               Y |         (null) |       Word |               Y |         (null) |     (null) |          (null) |         (null) |  (null) |          (null) |         (null) |
|  4 |  Access |               Y |              Y |      Excel |               Y |              Y |    Outlook |               Y |         (null) | PowerPoint |               Y |         (null) |    Word |               Y |         (null) |
|  5 |   Excel |               Y |         (null) |    Outlook |               Y |         (null) | PowerPoint |               Y |         (null) |       Word |          (null) |              Y |  (null) |          (null) |         (null) |

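To break it down, I need to:

  • Output all the items in the newSkill column of the Students table. These values need to be split into separate columns, each with corresponding flags indicating whether the skill is a primary or a backup skill. Note that the newSkill column contains the oldSkill values.

  • If the skill is an old one, take the flag values from the Skills table, where P is primary and B is backup.

  • If the skill is new, just flag the primary column with a 'Y' value.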
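Taken together, the three rules amount to: walk each student's newSkill list and, for every skill, either look up P/B in Skills (old skill) or simply mark it primary (new skill). Here is a minimal Python sketch of that logic (a model of the rules, not a stored procedure), assuming hypothetical in-memory copies STUDENTS and SKILLS of the sample data and a hypothetical skill_rows helper:

```python
# Hypothetical in-memory copies of the two sample tables from the question.
STUDENTS = {
    1: ("Word", "Excel,PowerPoint,Word"),
    2: ("Excel,PowerPoint,Word", "Excel,Outlook,PowerPoint,Word"),
    3: ("PowerPoint,Word", "Excel,PowerPoint,Word"),
    4: ("Access,Excel", "Access,Excel,Outlook,PowerPoint,Word"),
    5: ("Outlook,Word", "Excel,Outlook,PowerPoint,Word"),
}
SKILLS = [
    (1, "Word", "B"), (1, "Word", "P"),
    (2, "Excel", "P"), (2, "PowerPoint", "B"), (2, "PowerPoint", "P"), (2, "Word", "P"),
    (3, "PowerPoint", "P"), (3, "Word", "P"),
    (4, "Access", "B"), (4, "Excel", "B"), (4, "Access", "P"), (4, "Excel", "P"),
    (5, "Outlook", "P"), (5, "Word", "B"),
]


def skill_rows(students, skills):
    """Return (id, skill, primary, backup) rows following the three rules."""
    # Collect the assignment codes per (id, skill), e.g. {"P", "B"}.
    assigned = {}
    for sid, skill, code in skills:
        assigned.setdefault((sid, skill), set()).add(code)

    rows = []
    for sid, (old_csv, new_csv) in sorted(students.items()):
        old = set(old_csv.split(","))
        # newSkill already contains every oldSkill value, so iterate newSkill only.
        for skill in new_csv.split(","):
            if skill in old:
                codes = assigned.get((sid, skill), set())
                primary = "Y" if "P" in codes else None
                backup = "Y" if "B" in codes else None
            else:
                # Brand-new skill: primary only, no backup flag.
                primary, backup = "Y", None
            rows.append((sid, skill, primary, backup))
    return rows
```

With the sample data this yields 19 rows, e.g. (1, "Word", "Y", "Y") and (5, "Word", None, "Y").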

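I've been trying to look at this from different angles (CTEs, pivots, cursors, etc.), and I've successfully used a UDF to split the CSV column values, but grabbing the data from the rows of the Students table and combining it into the desired format, along with the Skills data, is eluding me.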

I've also set up a SQL Fiddle with my test data for this post: http://sqlfiddle.com/#!6/e8d5a/1/0

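Thanks in advance for any help or guidance... SQL is not one of my strongest skills. I could do this much more easily in another language, but I've been asked to build it as a stored procedure. =P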

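UPDATE: Based on the suggestions posted in the comments, I've gotten pretty far with this; I just need help with the final output. I think it can be done with a PIVOT and dynamic SQL, but I can't work out how to pivot and aggregate the three skill-related columns and get them numbered as specified.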
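The dynamic-SQL idea in the update boils down to generating one MAX(CASE ...) triple per skill slot. A minimal Python sketch of building that column list, assuming the long rows have already been numbered per student into a hypothetical skill_no column exposed by a hypothetical CTE named numbered; in T-SQL the same string would presumably be concatenated into an NVARCHAR variable and run with sp_executesql, with max_skills taken from the largest per-student skill count:

```python
def pivot_select_list(max_skills):
    """Build the SELECT list for a conditional-aggregation pivot:
    one skill / primary / backup triple per numbered skill slot."""
    cols = ["id"]
    for n in range(1, max_skills + 1):
        # [primary] and [backup] are bracketed because both are reserved words in T-SQL.
        cols.append(f"MAX(CASE WHEN skill_no = {n} THEN skill END) AS skill_{n}")
        cols.append(f"MAX(CASE WHEN skill_no = {n} THEN [primary] END) AS skill_{n}_primary")
        cols.append(f"MAX(CASE WHEN skill_no = {n} THEN [backup] END) AS skill_{n}_backup")
    return ",\n       ".join(cols)


def pivot_query(max_skills):
    # "numbered" and "skill_no" are hypothetical names, e.g. a CTE adding
    # ROW_NUMBER() OVER (PARTITION BY id ORDER BY skill) AS skill_no.
    return (
        "SELECT " + pivot_select_list(max_skills) + "\n"
        "FROM numbered\n"
        "GROUP BY id;"
    )


print(pivot_query(2))
```

Because the slot count is only known at run time, this is the part that has to be dynamic; the aggregation itself is the same conditional aggregation the answer below hardcodes for five slots.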


Example on RexTester

I had trouble getting temp tables to work in SQL Fiddle, so I moved my test code to RexTester.

In my actual code, I'm using DelimitedSplit8K to parse the CSV values in the Students table; the test code below splits them with recursive CTEs instead.

-- this pivots the skills table into a single row for each skill
select *
into #skillPiv
from (
    select id, skill, assignment,
           'assignment_' + cast(row_number() over(partition by id, skill order by skill) as varchar(10)) rn
    from skills
) d
pivot (
    max(assignment) for rn in ([assignment_1], [assignment_2])
) piv
order by id;

-- this converts the student's oldSkills from CSV into rows and looks up the corresponding skill assignments in the #skillPiv table
with st(id, skill, oldSkill) as (
    select id,
           LEFT(CAST(oldSkill as varchar(max)), CHARINDEX(',', oldSkill + ',') - 1),
           STUFF(CAST(oldSkill as varchar(max)), 1, CHARINDEX(',', oldSkill + ','), '')
    from students
    union all
    select id,
           LEFT(CAST(oldSkill as varchar(max)), CHARINDEX(',', oldSkill + ',') - 1),
           STUFF(CAST(oldSkill as varchar(max)), 1, CHARINDEX(',', oldSkill + ','), '')
    from st
    where oldSkill > ''
)
select st.id
      ,st.skill
      ,CASE WHEN sp.assignment_1 = 'P' OR sp.assignment_2 = 'P' THEN 'Y' ELSE '' END AS [primary]
      ,CASE WHEN sp.assignment_1 = 'B' OR sp.assignment_2 = 'B' THEN 'Y' ELSE '' END AS [backup]
into #oldSkills
from st
inner join #skillPiv sp
    on st.id = sp.id and st.skill = sp.skill
order by id;

-- convert the newSkills column from CSV to rows and insert our default skill assignment values
with tmp(id, skill, newSkill) as (
    select id,
           LEFT(CAST(newSkill as varchar(max)), CHARINDEX(',', newSkill + ',') - 1),
           STUFF(CAST(newSkill as varchar(max)), 1, CHARINDEX(',', newSkill + ','), '')
    from students
    union all
    select id,
           LEFT(CAST(newSkill as varchar(max)), CHARINDEX(',', newSkill + ',') - 1),
           STUFF(CAST(newSkill as varchar(max)), 1, CHARINDEX(',', newSkill + ','), '')
    from tmp
    where newSkill > ''
)
select id
      ,skill
      ,'Y' as [primary]
      ,'' as [backup]
into #newSkills
from tmp
where skill NOT IN (select skill from #oldSkills where id = tmp.id)
order by id;

-- now combine #oldSkills and #newSkills into one table that has all the values we need
select *
into #studentSkills
from (
    select * from #newSkills
    UNION
    select * from #oldSkills
) as ss;

select * from #studentSkills;
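Each recursion step of the st and tmp CTEs peels one item off the front of the string. The following Python sketch (the function name is hypothetical) mimics that LEFT/CHARINDEX/STUFF arithmetic step by step, including the 1-based CHARINDEX position, to show why the recursion ends once the remainder is empty:

```python
def split_like_recursive_cte(csv):
    """Mimic the recursive CTE: each step takes
    LEFT(s, CHARINDEX(',', s + ',') - 1) as the item and
    STUFF(s, 1, CHARINDEX(',', s + ','), '') as the remainder."""
    items = []
    rest = csv
    while True:
        pos = (rest + ",").index(",") + 1  # CHARINDEX returns a 1-based position
        items.append(rest[:pos - 1])       # LEFT(rest, pos - 1)
        rest = rest[pos:]                  # STUFF(rest, 1, pos, '') drops item + comma
        if not rest:                       # the recursive member's WHERE rest > '' fails
            break
    return items


print(split_like_recursive_cte("Excel,PowerPoint,Word"))
```

One practical limit worth knowing: SQL Server stops a recursive CTE after 100 recursion levels by default, so a single CSV string with more than about 100 items would need OPTION (MAXRECURSION n) on the statement.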

The code above produces the following final table, which I now just need to pivot into the required output:

| id |      skill | primary | backup |
|----|------------|---------|--------|
|  1 |      Excel |       Y | (null) |
|  1 | PowerPoint |       Y | (null) |
|  1 |       Word |       Y |      Y |
|  2 |      Excel |       Y | (null) |
|  2 |    Outlook |       Y | (null) |
|  2 | PowerPoint |       Y |      Y |
|  2 |       Word |       Y | (null) |
|  3 |      Excel |       Y | (null) |
|  3 | PowerPoint |       Y | (null) |
|  3 |       Word |       Y | (null) |
|  4 |     Access |       Y |      Y |
|  4 |      Excel |       Y |      Y |
|  4 |    Outlook |       Y | (null) |
|  4 | PowerPoint |       Y | (null) |
|  4 |       Word |       Y | (null) |
|  5 |      Excel |       Y | (null) |
|  5 |    Outlook |       Y | (null) |
|  5 | PowerPoint |       Y | (null) |
|  5 |       Word |  (null) |      Y |
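The missing step is essentially: number each student's rows (alphabetically, to match the requested output) and spread row n into the skill_n, skill_n_primary and skill_n_backup columns. A Python sketch of that reshaping, over a hypothetical LONG_ROWS copy of students 1 and 5 from the table above and a hypothetical pivot_wide helper; in SQL the numbering would correspond to ROW_NUMBER() OVER (PARTITION BY id ORDER BY skill):

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical copy of the long (id, skill, primary, backup) rows for students 1 and 5.
LONG_ROWS = [
    (1, "Excel", "Y", None), (1, "PowerPoint", "Y", None), (1, "Word", "Y", "Y"),
    (5, "Excel", "Y", None), (5, "Outlook", "Y", None),
    (5, "PowerPoint", "Y", None), (5, "Word", None, "Y"),
]


def pivot_wide(rows, slots=5):
    """Number each student's skills alphabetically and spread them into
    skill_n / skill_n_primary / skill_n_backup columns."""
    wide = []
    # Sort by (id, skill) only, so None flags are never compared.
    for sid, group in groupby(sorted(rows, key=itemgetter(0, 1)), key=itemgetter(0)):
        record = {"id": sid}
        for n in range(1, slots + 1):  # unused slots stay None, i.e. NULL
            record[f"skill_{n}"] = None
            record[f"skill_{n}_primary"] = None
            record[f"skill_{n}_backup"] = None
        for n, (_, skill, primary, backup) in enumerate(group, start=1):
            record[f"skill_{n}"] = skill
            record[f"skill_{n}_primary"] = primary
            record[f"skill_{n}_backup"] = backup
        wide.append(record)
    return wide
```

A student with more skills than slots would simply get extra skill_6... keys here, which mirrors why the SQL version needs either a fixed maximum or a dynamically built column list.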

I'd appreciate any help. Thanks!

1 Answer:

Answer (score: 3)

This design is really, really, really bad :-D

However, if you have to stick with it, you could try this:

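Attention: I'm relying on your statement

Note that the newSkill column contains the oldSkill values

which I take to mean "there is no old skill that isn't included in the new skills!"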

The solution is fully inline and set-based:

DECLARE @students TABLE(id INT,oldSkill VARCHAR(100),newSkill VARCHAR(100));
INSERT INTO @students VALUES
 (1,'Word','Excel,PowerPoint,Word')
,(2,'Excel,PowerPoint,Word','Excel,Outlook,PowerPoint,Word')
,(3,'PowerPoint,Word','Excel,PowerPoint,Word')
,(4,'Access,Excel','Access,Excel,Outlook,PowerPoint,Word')
,(5,'Outlook,Word','Excel,Outlook,PowerPoint,Word');

DECLARE @skills TABLE(id INT, skill VARCHAR(100),assignment VARCHAR(1));
INSERT INTO @skills VALUES
 (1,'Word','B')
,(1,'Word','P')
,(2,'Excel','P')
,(2,'PowerPoint','B')
,(2,'PowerPoint','P')
,(2,'Word','P')
,(3,'PowerPoint','P')
,(3,'Word','P')
,(4,'Access','B')
,(4,'Excel','B')
,(4,'Access','P')
,(4,'Excel','P')
,(5,'Outlook','P')
,(5,'Word','B');

-- The first CTE uses the XML trick to split the comma-separated values

WITH Step1 AS
(
    SELECT id
          ,A.*     
    FROM @students AS s
    OUTER APPLY(
                 SELECT CAST('<x>' + REPLACE(s.oldSkill,',','</x><x>') + '</x>' AS XML) AS OldSkillXml
                       ,CAST('<x>' + REPLACE(s.newSkill,',','</x><x>') + '</x>' AS XML) AS NewSkillXml
                ) AS A
)

-- The second CTE gets the list of old skills and their flags

,OldSkills AS
(
    SELECT ROW_NUMBER() OVER(PARTITION BY Step1.id ORDER BY (SELECT NULL)) AS OldSkillOrder
          ,Step1.id
          ,os.value('text()[1]','varchar(100)') AS Skill
          ,CASE WHEN (SELECT assignment FROM @skills AS s WHERE s.id=Step1.id AND s.skill=os.value('text()[1]','varchar(100)') AND s.assignment='P') IS NOT NULL THEN 'Y' END AS IsPrimary
          ,CASE WHEN (SELECT assignment FROM @skills AS s WHERE s.id=Step1.id AND s.skill=os.value('text()[1]','varchar(100)') AND s.assignment='B') IS NOT NULL THEN 'Y' END AS IsBackup
    FROM Step1 
    OUTER APPLY Step1.OldSkillXml.nodes('x') AS A(os)
)

-- This CTE gets the list of new skills, all flagged as IsPrimary = 'Y'

,NewSkills AS
(
    SELECT ROW_NUMBER() OVER(PARTITION BY Step1.id ORDER BY (SELECT NULL)) AS NewSkillOrder
          ,Step1.id
          ,ns.value('text()[1]','varchar(100)') AS Skill
          ,'Y' AS IsPrimary
          ,NULL AS IsBackup
    FROM Step1 
    OUTER APPLY Step1.NewSkillXml.nodes('x') AS A(ns)
)

-- The intermediate list is the result before pivoting

,IntermediateList AS
(
    SELECT ns.id
          ,ns.Skill
          ,ns.IsPrimary
          ,os.IsBackup
          ,ns.NewSkillOrder
    FROM NewSkills AS ns
    FULL OUTER JOIN OldSkills AS os ON os.id=ns.id AND os.Skill=ns.Skill 
)

-- Here I use "conditional aggregation" (the old-fashioned pivot), which is great when you need to PIVOT multiple columns

SELECT id

      ,MAX(CASE WHEN NewSkillOrder = 1 THEN Skill END) AS skill_1
      ,MAX(CASE WHEN NewSkillOrder = 1 THEN IsPrimary END) AS skill_1_primary
      ,MAX(CASE WHEN NewSkillOrder = 1 THEN IsBackup END) AS skill_1_backup

      ,MAX(CASE WHEN NewSkillOrder = 2 THEN Skill END) AS skill_2
      ,MAX(CASE WHEN NewSkillOrder = 2 THEN IsPrimary END) AS skill_2_primary
      ,MAX(CASE WHEN NewSkillOrder = 2 THEN IsBackup END) AS skill_2_backup

      ,MAX(CASE WHEN NewSkillOrder = 3 THEN Skill END) AS skill_3
      ,MAX(CASE WHEN NewSkillOrder = 3 THEN IsPrimary END) AS skill_3_primary
      ,MAX(CASE WHEN NewSkillOrder = 3 THEN IsBackup END) AS skill_3_backup

      ,MAX(CASE WHEN NewSkillOrder = 4 THEN Skill END) AS skill_4
      ,MAX(CASE WHEN NewSkillOrder = 4 THEN IsPrimary END) AS skill_4_primary
      ,MAX(CASE WHEN NewSkillOrder = 4 THEN IsBackup END) AS skill_4_backup

      ,MAX(CASE WHEN NewSkillOrder = 5 THEN Skill END) AS skill_5
      ,MAX(CASE WHEN NewSkillOrder = 5 THEN IsPrimary END) AS skill_5_primary
      ,MAX(CASE WHEN NewSkillOrder = 5 THEN IsBackup END) AS skill_5_backup
FROM IntermediateList AS il
GROUP BY id; 
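The XML trick in Step1 turns "Excel,Outlook,Word" into <x>Excel</x><x>Outlook</x><x>Word</x> and then reads one row per <x> node. A small Python model of the same idea, using the standard xml.etree.ElementTree module as a stand-in for SQL Server's XML type (the function name is hypothetical):

```python
import xml.etree.ElementTree as ET


def split_via_xml(csv):
    """Mirror CAST('<x>' + REPLACE(csv, ',', '</x><x>') + '</x>' AS XML)
    followed by .nodes('x') and .value('text()[1]', ...)."""
    fragment = "<x>" + csv.replace(",", "</x><x>") + "</x>"
    # SQL Server's XML type accepts several top-level <x> elements;
    # ElementTree needs a single root, so wrap the fragment in one.
    root = ET.fromstring("<root>" + fragment + "</root>")
    # An empty <x></x> has no text node: None here, NULL in T-SQL.
    return [node.text for node in root.findall("x")]


print(split_via_xml("Excel,Outlook,Word"))
```

The same caveat applies in both places: a value containing & or < makes the string invalid XML, so the CAST (or fromstring) fails. Such values would have to be escaped first, or split with a non-XML splitter such as the DelimitedSplit8K function mentioned in the question.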

Result

+----+---------+-----------------+----------------+------------+-----------------+----------------+------------+-----------------+----------------+------------+-----------------+----------------+---------+-----------------+----------------+
| id | skill_1 | skill_1_primary | skill_1_backup | skill_2    | skill_2_primary | skill_2_backup | skill_3    | skill_3_primary | skill_3_backup | skill_4    | skill_4_primary | skill_4_backup | skill_5 | skill_5_primary | skill_5_backup |
+----+---------+-----------------+----------------+------------+-----------------+----------------+------------+-----------------+----------------+------------+-----------------+----------------+---------+-----------------+----------------+
| 1  | Excel   | Y               | NULL           | PowerPoint | Y               | NULL           | Word       | Y               | Y              | NULL       | NULL            | NULL           | NULL    | NULL            | NULL           |
+----+---------+-----------------+----------------+------------+-----------------+----------------+------------+-----------------+----------------+------------+-----------------+----------------+---------+-----------------+----------------+
| 2  | Excel   | Y               | NULL           | Outlook    | Y               | NULL           | PowerPoint | Y               | Y              | Word       | Y               | NULL           | NULL    | NULL            | NULL           |
+----+---------+-----------------+----------------+------------+-----------------+----------------+------------+-----------------+----------------+------------+-----------------+----------------+---------+-----------------+----------------+
| 3  | Excel   | Y               | NULL           | PowerPoint | Y               | NULL           | Word       | Y               | NULL           | NULL       | NULL            | NULL           | NULL    | NULL            | NULL           |
+----+---------+-----------------+----------------+------------+-----------------+----------------+------------+-----------------+----------------+------------+-----------------+----------------+---------+-----------------+----------------+
| 4  | Access  | Y               | Y              | Excel      | Y               | Y              | Outlook    | Y               | NULL           | PowerPoint | Y               | NULL           | Word    | Y               | NULL           |
+----+---------+-----------------+----------------+------------+-----------------+----------------+------------+-----------------+----------------+------------+-----------------+----------------+---------+-----------------+----------------+
| 5  | Excel   | Y               | NULL           | Outlook    | Y               | NULL           | PowerPoint | Y               | NULL           | Word       | Y               | Y              | NULL    | NULL            | NULL           |
+----+---------+-----------------+----------------+------------+-----------------+----------------+------------+-----------------+----------------+------------+-----------------+----------------+---------+-----------------+----------------+

Attention
There is one difference: your student 5 has NULL / Y for the skill "Word", and I don't understand why this skill, since it is included in the "new skills", should not be "primary".
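The difference comes from IntermediateList taking the primary flag from NewSkills (always 'Y'), whereas the question's rules take both flags from the Skills table for old skills; student 5's "Word" is an old skill with only a B assignment. A hypothetical Python helper contrasting the two rules:

```python
def flags(skill, old_skills, assignments, rule):
    """Return (primary, backup) for one newSkill entry.

    rule="answer":   primary is always 'Y' (ns.IsPrimary); backup comes from
                     the old-skill lookup (os.IsBackup), as in IntermediateList.
    rule="question": old skills take both flags from the Skills table;
                     only genuinely new skills get primary = 'Y'.
    """
    is_old = skill in old_skills
    backup = "Y" if is_old and "B" in assignments else None
    if rule == "answer":
        return "Y", backup
    primary = ("Y" if "P" in assignments else None) if is_old else "Y"
    return primary, backup


# Student 5: oldSkill = "Outlook,Word"; Skills has only (5, Word, B) for Word.
print(flags("Word", {"Outlook", "Word"}, {"B"}, "answer"))
print(flags("Word", {"Outlook", "Word"}, {"B"}, "question"))
```

Under the question's rule, a T-SQL equivalent would presumably replace ns.IsPrimary in IntermediateList with CASE WHEN os.id IS NULL THEN 'Y' ELSE os.IsPrimary END; that variant is untested here.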