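I am trying to normalize my data, which was imported from an Excel worksheet. The file I am pulling the data from has a bunch of columns such as sibling1_name, sibling1_age, sibling1_affected, and so on, for up to 4 siblings, 4 children, 4 relatives, etc. I want to load all of it into a new table with Name, Age, Affected, and Relationship columns. I found a way to insert the first sibling correctly (see below), but I am not sure how to add the other siblings. Any suggestions?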
INSERT INTO Family
(ID,
Name,
Age,
Affected,
Relationship)
SELECT ExcelPatients.id,
ExcelPatients.sibling1_name AS Name,
ExcelPatients.sibling1_age AS Age,
ExcelPatients.sibling1_affected AS Affected,
"Sibling"
FROM ExcelPatients
WHERE (( ( ExcelPatients.Sibling1_name ) IS NOT NULL ))
AND ExcelPatients.id NOT IN (SELECT DISTINCT ID AND Name
FROM Family);
Answer 0 (score: 1)
INSERT INTO Family
(ID,
Name,
Age,
Affected,
Relationship)
SELECT ExcelPatients.id, ExcelPatients.sibling1_name AS Name,
ExcelPatients.sibling1_age AS Age,
ExcelPatients.sibling1_affected AS Affected, "Sibling"
FROM ExcelPatients
WHERE (((ExcelPatients.Sibling1_name) Is Not Null))
AND NOT EXISTS (SELECT DISTINCT ID FROM Family where family.id = ExcelPatients.id and Family.name = ExcelPatients.sibling1_name)
UNION
SELECT ExcelPatients.id, ExcelPatients.sibling2_name AS Name,
ExcelPatients.sibling2_age AS Age,
ExcelPatients.sibling2_affected AS Affected, "Sibling"
FROM ExcelPatients
WHERE (((ExcelPatients.Sibling2_name) Is Not Null))
AND NOT EXISTS (SELECT DISTINCT ID FROM Family where family.id = ExcelPatients.id and Family.name = ExcelPatients.sibling2_name)
UNION
SELECT ExcelPatients.id, ExcelPatients.sibling3_name AS Name,
ExcelPatients.sibling3_age AS Age,
ExcelPatients.sibling3_affected AS Affected, "Sibling"
FROM ExcelPatients
WHERE (((ExcelPatients.Sibling3_name) Is Not Null))
AND NOT EXISTS (SELECT DISTINCT ID FROM Family where family.id = ExcelPatients.id and Family.name = ExcelPatients.sibling3_name)
UNION
SELECT ExcelPatients.id, ExcelPatients.sibling4_name AS Name,
ExcelPatients.sibling4_age AS Age,
ExcelPatients.sibling4_affected AS Affected, "Sibling"
FROM ExcelPatients
WHERE (((ExcelPatients.Sibling4_name) Is Not Null))
AND NOT EXISTS (SELECT DISTINCT ID FROM Family where family.id = ExcelPatients.id and Family.name = ExcelPatients.sibling4_name)
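Without seeing the data, I can't tell whether UNION ALL or UNION is the right choice. If a name can appear in only one of the 4 sibling columns, use UNION ALL; if it may be repeated across columns, use UNION. Since you are cleaning up data that came from another source, UNION is probably the safer choice, although it is slower. NOT EXISTS tends to be the fastest comparison in SQL Server, which is why I chose it.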
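To make the UNION versus UNION ALL difference concrete, here is a minimal sketch that reuses the ExcelPatients columns from the question. It assumes, purely for illustration, that some patient row has the same person entered in both the sibling1_* and sibling2_* columns. The `--` comments are standard SQL; remove them if your query editor does not accept comments.

```sql
-- Assumption for illustration: one ExcelPatients row has identical values
-- in sibling1_name/sibling1_age and sibling2_name/sibling2_age.

-- UNION ALL concatenates the two result sets as-is,
-- so the duplicated sibling is returned twice.
SELECT id, sibling1_name AS Name, sibling1_age AS Age
FROM ExcelPatients
WHERE sibling1_name IS NOT NULL
UNION ALL
SELECT id, sibling2_name, sibling2_age
FROM ExcelPatients
WHERE sibling2_name IS NOT NULL;

-- UNION removes rows that are identical in every selected column,
-- so the duplicated sibling is returned once.
SELECT id, sibling1_name AS Name, sibling1_age AS Age
FROM ExcelPatients
WHERE sibling1_name IS NOT NULL
UNION
SELECT id, sibling2_name, sibling2_age
FROM ExcelPatients
WHERE sibling2_name IS NOT NULL;
```

Note that UNION only collapses rows that match in all selected columns. If the same name appears twice with a different age or affected value, both rows survive either way and need to be reconciled by hand.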