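My goal is to group the rows by id and to create a column each for date and data.
For context, the dataset holds the results of experiments that participants took part in; because of fasting restrictions and similar constraints, some tests could not be run on the same day. The database I am using is SQL Server.
Below are my dataset and the desired output.
Sample dataset: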
create table Sample
(
Id int,
LAB_DATE date,
A_CRE_1 varchar(100),
B_GLUH_1 varchar(100),
C_LDL_1 varchar(100),
D_TG_1 varchar(100),
E_CHOL_1 varchar(100),
F_HDL_1 varchar(100),
G_CRPH_1 varchar(100),
H_HBA1C_1 varchar(100),
I_GLU120_1 varchar(100),
J_GLUF_1 varchar(100),
K_HCR_1 varchar(100)
)
insert into Sample(Id, LAB_DATE,A_CRE_1, B_GLUH_1,C_LDL_1,E_CHOL_1,F_HDL_1,H_HBA1C_1,K_HCR_1)
values (01, '2017-11-21', '74', '6.4', '2.04', '4.17', '1.64', '6.1', '2.54')
insert into sample (Id, LAB_DATE, I_GLU120_1)
values (01, '2017-11-22','8.8')
insert into sample (Id, LAB_DATE, D_TG_1)
values (01, '2017-11-23','0.56')
insert into sample (Id,LAB_DATE,A_CRE_1,B_GLUH_1,C_LDL_1,D_TG_1,E_CHOL_1,F_HDL_1,K_HCR_1)
values (2,'2018-10-02','57','8.91','2.43','1.28','3.99','1.25','3.19')
insert into sample (Id,LAB_DATE,H_HBA1C_1)
values (2,'2018-10-03','8.6')
insert into sample (Id,LAB_DATE,J_GLUF_1)
values (2,'2018-10-04','7.8')
insert into sample (Id,LAB_DATE,A_CRE_1,B_GLUH_1,C_LDL_1,D_TG_1,E_CHOL_1,F_HDL_1,G_CRPH_1,H_HBA1C_1,K_HCR_1)
values (3,'2016-10-01','100','6.13','3.28','0.94','5.07','1.19','0.27','5.8','4.26')
Desired output:
ID|LAB_DATE|A_CRE_1|B_GLUH_1|C_LDL_1|Date_TG_1|D_TG_1|E_CHOL_1|F_HDL_1|G_CRPH_1|Date_HBA1C_1|H_HBA1C_1|Date_GLU120_1|I_GLU120_1|Date_GLUF_1|J_GLUF_1|K_HCR_1
1|2017-11-21|74|6.4|2.04|2017-11-23|0.56|4.17|1.64|||6.1|2017-11-22|8.8|||2.54
2|2018-10-02|57|8.91|2.43||1.28|3.99|1.25||2018-10-03|8.6|||2018-10-04|7.8|3.19
3|2016-10-01|100|6.13|3.28||0.94|5.07|1.19|0.27||5.8|||||4.26
Answer 0 (score: 1)
Here is one solution (it cannot handle multiple rows with the same ID / sample type; you have not said how those should be handled):
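-- Each subquery keeps only the rows where its test was actually recorded.
-- Assumption: A_CRE_1 is populated on every main-panel row.
select * from
(select Id, LAB_DATE,A_CRE_1, B_GLUH_1,C_LDL_1,E_CHOL_1,F_HDL_1,H_HBA1C_1,K_HCR_1 from sample where A_CRE_1 is not null) s1
INNER JOIN
(select Id, LAB_DATE as glu120date, I_GLU120_1 from sample where I_GLU120_1 is not null) s2
ON s1.id = s2.id
INNER JOIN
(select Id, LAB_DATE as dtgdate, D_TG_1 from sample where D_TG_1 is not null) s3
ON s1.id = s3.id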
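Hopefully this pattern gives you the idea; if you have other sample types with their own dates, split them out of s1 into their own subqueries in the same way (for example, e_chol_1 as s4, k_hcr_1 as s5, and so on). Note that if any sample type is missing, the whole row disappears from the result. If that is not what you want, and NULL is acceptable for missing samples, use LEFT JOIN instead of INNER JOIN.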
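To make the LEFT JOIN variant concrete, here is a minimal sketch. It is not part of the original answer: `@Demo` is a hypothetical cut-down copy of the question's table, and treating `A_CRE_1` as the marker of the main-panel row is an assumption.

```sql
-- Hypothetical cut-down copy of the question's table, for illustration only.
DECLARE @Demo AS table (
    Id int,
    LAB_DATE date,
    A_CRE_1 varchar(100),
    D_TG_1 varchar(100),
    I_GLU120_1 varchar(100));

INSERT INTO @Demo (Id, LAB_DATE, A_CRE_1, D_TG_1, I_GLU120_1) VALUES
    (1, '2017-11-21', '74', NULL,   NULL),
    (1, '2017-11-22', NULL, NULL,   '8.8'),
    (1, '2017-11-23', NULL, '0.56', NULL),
    (2, '2018-10-02', '57', '1.28', NULL);   -- id 2 has no GLU120 result

-- s1 is the main panel (assumed to be the rows where A_CRE_1 is present);
-- s2 and s3 each carry one separately dated test.
SELECT s1.Id, s1.LAB_DATE, s1.A_CRE_1,
       s2.glu120date, s2.I_GLU120_1,
       s3.dtgdate, s3.D_TG_1
FROM (SELECT Id, LAB_DATE, A_CRE_1
      FROM @Demo
      WHERE A_CRE_1 IS NOT NULL) AS s1
LEFT JOIN (SELECT Id, LAB_DATE AS glu120date, I_GLU120_1
           FROM @Demo
           WHERE I_GLU120_1 IS NOT NULL) AS s2
       ON s1.Id = s2.Id
LEFT JOIN (SELECT Id, LAB_DATE AS dtgdate, D_TG_1
           FROM @Demo
           WHERE D_TG_1 IS NOT NULL) AS s3
       ON s1.Id = s3.Id
ORDER BY s1.Id;
```

With this data, id 2 stays in the result with NULL in glu120date and I_GLU120_1; with INNER JOIN it would be dropped.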
If patient 01 has multiple samples and you only want the most recent one, the pattern becomes:
select * from
(select Id, LAB_DATE,A_CRE_1, B_GLUH_1,C_LDL_1,E_CHOL_1,F_HDL_1,H_HBA1C_1,K_HCR_1,
row_number() over(partition by id order by lab_date desc) rn
from sample
where A_CRE_1 is not null) s1 -- assumption: A_CRE_1 marks the main-panel rows
INNER JOIN
(select Id, LAB_DATE as glu120date, I_GLU120_1,
row_number() over(partition by id order by lab_date desc) rn
from sample
where I_GLU120_1 is not null) s2 -- number only the rows that hold this test
ON s1.id = s2.id and s1.rn = s2.rn
WHERE
s1.rn = 1
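Note the addition of row_number() over(partition by id order by lab_date desc) rn: it builds an incrementing counter in descending date order (latest record = 1, the one before = 2, ...) that restarts at 1 for each distinct id. We also join on it and then require rn = 1, so that only the latest record of each sample type is selected.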
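A small sketch of how that counter behaves, using a hypothetical two-column table variable `@Demo` rather than the full table from the question:

```sql
-- Hypothetical data: two samples for id 1, one for id 2.
DECLARE @Demo AS table (Id int, LAB_DATE date);

INSERT INTO @Demo (Id, LAB_DATE) VALUES
    (1, '2017-11-21'),
    (1, '2017-11-22'),
    (2, '2018-10-02');

-- rn restarts at 1 for every Id and counts from the newest date backwards.
SELECT Id,
       LAB_DATE,
       ROW_NUMBER() OVER (PARTITION BY Id ORDER BY LAB_DATE DESC) AS rn
FROM @Demo
ORDER BY Id, rn;
-- Id 1: 2017-11-22 gets rn = 1, 2017-11-21 gets rn = 2
-- Id 2: 2018-10-02 gets rn = 1
```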
Answer 1 (score: 1)
As @Ben suggested, you can group by id and take the MIN of every column, as shown below.
DECLARE @Sample as table (
Id int,
LAB_DATE date,
A_CRE_1 varchar(100),
B_GLUH_1 varchar(100),
C_LDL_1 varchar(100),
D_TG_1 varchar(100),
E_CHOL_1 varchar(100),
F_HDL_1 varchar(100),
G_CRPH_1 varchar(100),
H_HBA1C_1 varchar(100),
I_GLU120_1 varchar(100),
J_GLUF_1 varchar(100),
K_HCR_1 varchar(100))
insert into @Sample(Id, LAB_DATE,A_CRE_1,
B_GLUH_1,C_LDL_1,E_CHOL_1,F_HDL_1,H_HBA1C_1,K_HCR_1)
values (01,'2017-11-21','74','6.4','2.04','4.17','1.64','6.1','2.54')
insert into @Sample (Id, LAB_DATE, I_GLU120_1)
values (01, '2017-11-22','8.8')
insert into @Sample (Id, LAB_DATE, D_TG_1)
values (01, '2017-11-23','0.56')
SELECT s.Id
, MIN(s.LAB_DATE) AS LAB_DATE
, MIN(s.A_CRE_1) AS A_CRE_1
, MIN(s.B_GLUH_1) AS B_GLUH_1
, MIN(s.C_LDL_1) AS C_LDL_1
, MIN(s.D_TG_1) AS D_TG_1
, MIN(s.E_CHOL_1) AS E_CHOL_1
, MIN(s.F_HDL_1) AS F_HDL_1
, MIN(s.G_CRPH_1) AS G_CRPH_1
, MIN(s.H_HBA1C_1) AS H_HBA1C_1
, MIN(s.I_GLU120_1) AS I_GLU120_1
, MIN(s.J_GLUF_1) AS J_GLUF_1
, MIN(s.K_HCR_1) AS K_HCR_1
FROM @Sample AS s
GROUP BY s.Id
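The query above collapses each id to one row but keeps only a single date. One way to also get the per-test date columns from the question's desired output (Date_TG_1, Date_GLU120_1) is conditional aggregation. This is a sketch and not part of the original answer; `@Demo` is a hypothetical cut-down table, and using `A_CRE_1` to pick the main LAB_DATE is an assumption.

```sql
-- Hypothetical cut-down table with the three rows for id 1.
DECLARE @Demo AS table (
    Id int,
    LAB_DATE date,
    A_CRE_1 varchar(100),
    D_TG_1 varchar(100),
    I_GLU120_1 varchar(100));

INSERT INTO @Demo (Id, LAB_DATE, A_CRE_1, D_TG_1, I_GLU120_1) VALUES
    (1, '2017-11-21', '74', NULL,   NULL),
    (1, '2017-11-22', NULL, NULL,   '8.8'),
    (1, '2017-11-23', NULL, '0.56', NULL);

-- CASE returns LAB_DATE only on rows where the test has a value, so MIN
-- picks the date of that test and ignores the NULLs from the other rows.
SELECT s.Id
     , MIN(CASE WHEN s.A_CRE_1 IS NOT NULL THEN s.LAB_DATE END) AS LAB_DATE
     , MIN(s.A_CRE_1) AS A_CRE_1
     , MIN(CASE WHEN s.D_TG_1 IS NOT NULL THEN s.LAB_DATE END) AS Date_TG_1
     , MIN(s.D_TG_1) AS D_TG_1
     , MIN(CASE WHEN s.I_GLU120_1 IS NOT NULL THEN s.LAB_DATE END) AS Date_GLU120_1
     , MIN(s.I_GLU120_1) AS I_GLU120_1
FROM @Demo AS s
GROUP BY s.Id;
```

Unlike the desired output in the question, this fills the date column even when the test was taken on the main LAB_DATE; blanking it in that case would need an extra comparison against LAB_DATE.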
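You could also look at the SQL Server STUFF function. The following link may help: https://www.mssqltips.com/sqlservertip/2914/rolling-up-multiple-rows-into-a-single-row-and-column-for-sql-server-data/
Answer 2 (score: 0)
Following my comment about presenting the raw data, here is what I think you should do (based on the query from your comment):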
SELECT
ID,
MAX(CASE WHEN TestID='1' THEN Results END) [Test_1],
MAX(CASE WHEN TestID='2' THEN Results END) [Test_2],
MAX(CASE WHEN TestID='1' THEN Result_Date_Time END) Test12Date,
MAX(CASE WHEN TestID='3' THEN Results END) [Test_3],
MAX(CASE WHEN TestID='3' THEN Result_Date_Time END) Test3Date
FROM [tbBloodSample]
GROUP BY ID
ORDER BY ID
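Note: if TestID is an integer, don't use strings like '1' in the query; use integers. In the CASE you don't need ELSE NULL: if no WHEN matches, the result defaults to NULL.
This is a query pattern. Tests 1 and 2 are always done on the same day, which is why I pivot their date only once. Test 3 may be done later or on the same day, so the dates in Test12Date and Test3Date may be the same or may differ.
Convert the strings to dates after pivoting, to reduce the number of conversions.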
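A sketch of converting after the pivot, under stated assumptions: the column types of `@tbBloodSample` (integer TestID, dates stored as ISO `yyyy-mm-dd` strings) and the sample rows are hypothetical, since the real table definition is not shown here.

```sql
-- Hypothetical stand-in for tbBloodSample; the column types are assumptions.
DECLARE @tbBloodSample AS table (
    ID int,
    TestID int,
    Results varchar(100),
    Result_Date_Time varchar(30));   -- assumed to hold ISO date strings

INSERT INTO @tbBloodSample (ID, TestID, Results, Result_Date_Time) VALUES
    (1, 1, '74',  '2017-11-21'),
    (1, 2, '6.4', '2017-11-21'),
    (1, 3, '8.8', '2017-11-22');

-- Pivot on the raw strings first, then convert once per output row
-- instead of once per input row.
WITH pivoted AS (
    SELECT ID,
           MAX(CASE WHEN TestID = 1 THEN Results END) AS Test_1,
           MAX(CASE WHEN TestID = 2 THEN Results END) AS Test_2,
           MAX(CASE WHEN TestID = 1 THEN Result_Date_Time END) AS Test12Date,
           MAX(CASE WHEN TestID = 3 THEN Results END) AS Test_3,
           MAX(CASE WHEN TestID = 3 THEN Result_Date_Time END) AS Test3Date
    FROM @tbBloodSample
    GROUP BY ID
)
SELECT ID,
       Test_1,
       Test_2,
       TRY_CONVERT(date, Test12Date) AS Test12Date,  -- NULL if not a valid date
       Test_3,
       TRY_CONVERT(date, Test3Date) AS Test3Date
FROM pivoted
ORDER BY ID;
```

MAX compares the date strings as text here, which matches chronological order only for formats such as ISO `yyyy-mm-dd`. If several rows per ID and test can carry different dates in another format, converting before aggregating would be safer.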