Splitting a table in SQL Server

Time: 2017-10-07 07:02:21

Tags: sql sql-server database split data-integrity

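I have a table named Student that stores all the basic information about students and the trainings they attended (the table has more than 15 columns and more than 5000 records). A sample portion of the table looks like this: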

St_id  St_Name  St_University  SoftSkillTraining  StartDate   EndDate     ComputerTraining  StartDate   EndDate 
---------------------------------------------------------------------------------------------------------------
 1      x        x             True               12/02/2017  12/03/2017  False             -           -       
 2      y        x             True               25/05/2016  25/06/2016  True              01/08/2017          

However, the table is not normalized, and I need to split the Student table into three specific tables (in a many-to-many relationship):

  1. The Student table holds the basic information about students, such as:

    St_id    St_Name     St_University     St_Faculty
    --------------------------------------------------
    1        X           Some University    Law
    2        y           Some University    IT

  2. The Training table stores the training name, start date, and end date columns. It should look like this:

    TrainingId TrainingName  StartDate   EndDate     TrainingLocation
    -----------------------------------------------------------------
    1         SoftSkill      12/02/2017  12/03/2017  Some Location
    2         SoftSkill      25/02/2016  25/06/2016  Some Location
    3         CMOA           01/08/2017  01/09/2017  some location

  3. An intersection table that stores the training participants, holding only the primary keys of the Student and Training tables as foreign keys, like this:

    st_id      training_id
    -----------------------
    1           1
    2           2
    2           1

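How can I transfer the data from the Student table to the Training table using a stored procedure, given that, as you can see, data from different columns of the Student table should appear as rows in the Training table?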

2 answers:

Answer 0 (score: 0)

One way to accomplish the task:

-- Source table: the original, denormalized data (sample columns only).
create table Students (
    St_id int primary key,
    St_Name varchar(5),
    St_University varchar(5),
    SoftSkillTraining varchar(5),
    ST_StartDate varchar(10),
    ST_EndDate varchar(10),
    ComputerTraining varchar(5),
    CT_StartDate varchar(10),
    CT_EndDate varchar(10)
);

insert into Students (St_id, St_Name, St_University, SoftSkillTraining, ST_StartDate, ST_EndDate, ComputerTraining, CT_StartDate, CT_EndDate)
values (1, 'x', 'x', 'True', '12/02/2017', '12/03/2017', 'False', NULL, NULL);

insert into Students (St_id, St_Name, St_University, SoftSkillTraining, ST_StartDate, ST_EndDate, ComputerTraining, CT_StartDate, CT_EndDate)
values (2, 'y', 'x', 'True', '25/05/2016', '25/06/2016', 'True', '01/08/2017', NULL);

-- Target table 1: one row per student, basic information only.
create table Student (
    St_id int primary key,
    St_Name varchar(5),
    St_University varchar(5)
);

insert into Student (St_id, St_Name, St_University)
select distinct St_id, St_Name, St_University from Students;

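-- Target table 2: one row per training attended.
-- The foreign key points at the new Student table rather than the old Students table.
create table Training (
    Training_Id int identity(1,1) primary key,
    Student_Id int foreign key references Student(St_id),
    Training_Name varchar(20),
    StartDate varchar(10),
    EndDate varchar(10)
);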

-- Sample rows entered by hand; dates stay dd/mm/yyyy strings, as in the source table.
insert into Training (Student_Id, Training_Name, StartDate, EndDate)
values (1, 'SoftSkillTraining', '12/02/2017', '12/03/2017');

insert into Training (Student_Id, Training_Name, StartDate, EndDate)
values (2, 'SoftSkillTraining', '25/05/2016', '25/06/2016');

insert into Training (Student_Id, Training_Name, StartDate, EndDate)
values (2, 'ComputerTraining', '01/08/2017', NULL);

-- Target table 3: links students to trainings.
create table Intersection (
    Intersection_Id int identity(1,1) primary key,
    Student_id int foreign key references Student(St_id),
    Training_Id int foreign key references Training(Training_Id)
);

insert into Intersection (Student_id, Training_Id)
select s.St_id, t.Training_Id
from Student s
join Training t on s.St_id = t.Student_Id;


go
-- View listing each participant with the training attended.
create view Participants
as
select s.St_Name as Participant, t.Training_Name
from Intersection i
join Student s on i.Student_id = s.St_id
join Training t on i.Training_Id = t.Training_Id
go
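
The three hand-written inserts into Training above cover only the two sample rows. For the full table, a set-based load could derive the same rows from Students directly. The following is only a sketch, assuming the Students and Training tables exactly as defined in this answer; it would replace the hard-coded inserts rather than run in addition to them.

```sql
-- Sketch: one SELECT per training column group, stacked with UNION ALL.
-- Assumes the Students and Training tables defined earlier in this answer.
insert into Training (Student_Id, Training_Name, StartDate, EndDate)
select St_id, 'SoftSkillTraining', ST_StartDate, ST_EndDate
from Students
where SoftSkillTraining = 'True'
union all
select St_id, 'ComputerTraining', CT_StartDate, CT_EndDate
from Students
where ComputerTraining = 'True';
```

Each additional training column group in the real table would need one more SELECT branch of the same shape.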

Answer 1 (score: 0)

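You have quite a few tasks to carry out, but normalizing that table is the right thing to do. In the sample of your old table I notice that [StartDate] & [EndDate] are repeated. That is not possible in SQL Server: all column names within a table must be unique. I hope this is just a small glitch in the sample, because it matters a great deal.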

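Below I use a method that "unpivots" the student rows into multiple shorter rows, which is an interim step toward the goal. The method uses CROSS APPLY with VALUES. Note that you need to prepare the VALUES section manually, but you could get the list of fields from a query against the information schema (that query is not provided here).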
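
As a starting point for that field list, a query along the following lines could be used. It is a minimal sketch that assumes the source table is named Student and lives in the current database; adjust the filter (and add TABLE_SCHEMA) for the real table.

```sql
-- List the columns of the source table in their defined order,
-- so the column groups for the VALUES section can be assembled by hand.
SELECT COLUMN_NAME, ORDINAL_POSITION, DATA_TYPE
FROM INFORMATION_SCHEMA.COLUMNS
WHERE TABLE_NAME = 'Student'   -- assumed table name
ORDER BY ORDINAL_POSITION;
```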

SQL Fiddle

See this working model

MS SQL Server 2014 Schema Setup:

CREATE TABLE Student
    ([St_id] int, [St_Name] varchar(1), [St_University] varchar(1)
     , [SoftSkillTraining] varchar(4), [StartDate1] datetime, [EndDate1] datetime
     , [ComputerTraining] varchar(5), [StartDate2] datetime, [EndDate2] datetime)
;

INSERT INTO Student
    ([St_id], [St_Name], [St_University]
     , [SoftSkillTraining], [StartDate1], [EndDate1]
     , [ComputerTraining], [StartDate2], [EndDate2])
VALUES
    (1, 'x', 'x', 'True', '2017-02-12 00:00:00', '2017-03-12 00:00:00', 'False', NULL, NULL),
    (2, 'y', 'x', 'True', '2016-05-25 00:00:00', '2016-06-25 00:00:00', 'True', '2017-08-01', NULL)
;

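This is the most important query: it "unpivots" the source data into multiple rows.

Note how an ID is assigned to each training course, and that the column groups (e.g. [SoftSkillTraining], [StartDate1], [EndDate1]) must be specified row by row in the values area. Each row there produces a new row of output, so the "layout" of the values area essentially determines the final output. In this area you need to gather all the column names carefully and arrange them accurately.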

select
    St_id, ca.TrainingId, ca.TrainingName, ca.isEnrolled, ca.StartDate, ca.EndDate
    into training_setup
from Student
cross apply (
  values
     (1, 'SoftSkillTraining', [SoftSkillTraining], [StartDate1], [EndDate1])
    ,(2, 'ComputerTraining', [ComputerTraining], [StartDate2], [EndDate2])
  ) ca (TrainingId,TrainingName,isEnrolled, StartDate,EndDate)
where ca.isEnrolled = 'True'
;

Query 2:

select
*
from training_setup

Results:

| St_id | TrainingId |      TrainingName | isEnrolled |            StartDate |              EndDate |
|-------|------------|-------------------|------------|----------------------|----------------------|
|     1 |          1 | SoftSkillTraining |       True | 2017-02-12T00:00:00Z | 2017-03-12T00:00:00Z |
|     2 |          1 | SoftSkillTraining |       True | 2016-05-25T00:00:00Z | 2016-06-25T00:00:00Z |
|     2 |          2 |  ComputerTraining |       True | 2017-08-01T00:00:00Z |               (null) |

Query 3:

-- this can be the basis for table [Training]
select distinct TrainingId,TrainingName, StartDate,EndDate
from training_setup

Results:

| TrainingId |      TrainingName |            StartDate |              EndDate |
|------------|-------------------|----------------------|----------------------|
|          1 | SoftSkillTraining | 2016-05-25T00:00:00Z | 2016-06-25T00:00:00Z |
|          1 | SoftSkillTraining | 2017-02-12T00:00:00Z | 2017-03-12T00:00:00Z |
|          2 |  ComputerTraining | 2017-08-01T00:00:00Z |               (null) |

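Note that I have reservations about the consistency of this data: one course has different start/end dates. I don't have a simple solution for that. You may need to clean the data to minimize the discrepancies, and/or you may need an extra step that matches on the ID we used in the cross apply together with the start/end date pair, so that you arrive at a better version of training_id by updating the training_setup staging table before proceeding.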
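
One possible shape for that extra step is sketched below. It is not part of the working model: the column name SessionId is hypothetical, and the sketch assumes that every distinct combination of TrainingId, StartDate, and EndDate should count as its own training session.

```sql
-- Hypothetical extra column on the staging table.
ALTER TABLE training_setup ADD SessionId int NULL;
GO

-- Rows sharing the same course ID and date pair receive the same number;
-- NULL dates are grouped together by the ranking.
WITH ranked AS (
    SELECT SessionId,
           DENSE_RANK() OVER (ORDER BY TrainingId, StartDate, EndDate) AS NewId
    FROM training_setup
)
UPDATE ranked
SET SessionId = NewId;
```

Whether dates that differ by only a day or two should be merged into one session is a data-cleaning decision that this sketch does not make.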

Query 4:

-- this can be the basis for table [Student_Training]
select St_id, TrainingId
from training_setup

Results:

| St_id | TrainingId |
|-------|------------|
|     1 |          1 |
|     2 |          1 |
|     2 |          2 |
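
To turn the staging table into the final tables, one option is to let an identity column assign the training keys and then join back to the staging rows. This is a sketch only: the table definitions, column sizes, and constraints below are assumptions and not part of the working model.

```sql
-- Assumed definition of the final Training table.
CREATE TABLE Training (
    TrainingId   int IDENTITY(1,1) PRIMARY KEY,
    TrainingName varchar(50) NOT NULL,
    StartDate    datetime NULL,
    EndDate      datetime NULL
);

-- One row per distinct (name, start, end) combination.
INSERT INTO Training (TrainingName, StartDate, EndDate)
SELECT DISTINCT TrainingName, StartDate, EndDate
FROM training_setup;

-- Assumed definition of the intersection table.
CREATE TABLE Student_Training (
    St_id      int NOT NULL,
    TrainingId int NOT NULL REFERENCES Training (TrainingId),
    PRIMARY KEY (St_id, TrainingId)
);

-- Match each staging row to its Training row; NULL dates are compared explicitly
-- because NULL = NULL is not true in SQL.
INSERT INTO Student_Training (St_id, TrainingId)
SELECT s.St_id, t.TrainingId
FROM training_setup AS s
JOIN Training AS t
  ON  t.TrainingName = s.TrainingName
  AND (t.StartDate = s.StartDate OR (t.StartDate IS NULL AND s.StartDate IS NULL))
  AND (t.EndDate = s.EndDate OR (t.EndDate IS NULL AND s.EndDate IS NULL));
```

With this approach the TrainingId values from the cross apply are no longer used as keys, which sidesteps the duplicate IDs visible in the Query 3 results.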