我有以下数据表
Animal Immunization_Date
Cat 1/18/2017
Cat 1/27/2017
Cat 5/7/2017
Cat 5/12/2017
Dog 1/1/2017
Dog 1/5/2017
Dog 1/7/2017
Dog 3/25/2017
Dog 4/18/2017
我正在尝试根据动物的10天间隔创建排名,这将产生以下结果。 (查找动物的第一个日期,然后在该日期的10天内的任何日期分配一组1
。然后为未分配给1
的动物选择下一个日期并指定它2
然后将2
分配给该日期后10天内的任何日期等...)
Animal Immunization_Date 10_Day_Group_Rank
Cat 1/18/2017 1
Cat 1/27/2017 1
Cat 5/7/2017 2
Cat 5/12/2017 2
Dog 1/1/2017 1
Dog 1/5/2017 1
Dog 1/7/2017 1
Dog 3/25/2017 2
Dog 4/18/2017 3
我一直在尝试以下代码,但我似乎无法让10天小组工作。
Select
dt.Animal,
dt.Immunization_Date,
sum(dt.10_day_Group) over(partition dt.Animal order by dt.Immunization_Date rows unbounded preceding) as 10_day_Group --creates a running total that is also the group
from
(
Select
Animal,
Immunization_Date,
case when min(Immunization_Date) over (partition by Animal order by Immunization_Date) <=10 then 1 else 0 end as 10_Day_Group --Create intervals of 10 days
from Table_A
) as dt
我不确定如何组织10天的分组。
case when min(Immunization_Date) over (partition by Animal order by Immunization_Date) <=10 then 1 else 0 end as 10_Day_Group
我可以在Excel中执行以下操作。我知道excel和SQL是不同的,但我希望如果能在Excel中完成某些事情,可以在Excel中完成它。
Excel数据表如下所示(表格从单元格A1
开始)。 (注意Animal
需要排序,并且需要对Immunization_Date
进行排序才能使Excel公式正常工作)
Animal Immunization_Date Dummy_1 10_Day_Group
Cat 1/18/2017 1/18/2017 1
Cat 1/27/2017 1/18/2017 1
Cat 5/7/2017 5/7/2017 2
Cat 5/12/2017 5/7/2017 2
Dog 1/1/2017 1/1/2017 1
Dog 1/5/2017 1/1/2017 1
Dog 1/7/2017 1/1/2017 1
Dog 3/25/2017 3/25/2017 2
Dog 4/18/2017 4/18/2017 3
Dummy_1
的公式如下
IFERROR(IF(AND(A2=A1,B2-C1<=10),C1,B2),B2)
10_Day_Group
的公式如下
IFERROR(IF(AND(C2=C1,A2=A1),D1,IF(AND(A2=A1,C2<>C1),D1+1,1)),1)
答案 0 :(得分:1)
近似......
SELECT
animal,
immunization_date,
DENSE_RANK() OVER (PARTITION BY animal
ORDER BY base_date,
CAST(immunization_date - base_date AS INT) / 10
)
AS group_id
FROM
(
SELECT
animal,
immunization_date,
MAX(
CASE WHEN immunization_date < lagged_immunization_date + 10
THEN NULL
ELSE immunization_date
END
)
OVER (PARTITION BY animal
ORDER BY immunization_date
ROWS UNBOUNDED PRECEDING
)
AS base_date
FROM
(
SELECT
animal,
immunization_date,
LAG(immunization_date) OVER (PARTITION BY animal
ORDER BY immunization_date
)
AS lagged_immunization_date
FROM
yourData
)
lagged_dates
)
base_dated
SQLFiddle没有TeraData,但上述代码应该在TeraData 和 SQL Server中运行... http://sqlfiddle.com/#!18/68260/1
答案 1 :(得分:1)
@MatBailie的递归答案相当不错,但是当每只动物的行数增加时性能会变差。
当第一个CTE可以在易失性表中实现时,它将降低资源使用量(因为Teradata的优化器没有实现这个结果,该死的):
CREATE VOLATILE TABLE boundaries AS
(
SELECT
i.*, -- need to add the alias
(
SELECT MIN(immunization_date)
FROM immunizations
WHERE animal = i.animal
AND immunization_date >= i.immunization_date + 10
)
AS next_boundary_date
FROM
immunizations i
)
WITH DATA
UNIQUE PRIMARY INDEX(animal, immunization_date)
ON COMMIT PRESERVE ROWS;
但是当你可以使用临时表时,你也可以使用简单的递归:
CREATE VOLATILE TABLE vt AS
(
SELECT
animal,
immunization_date,
Row_Number() -- add row number to simplify recursive processing
Over (PARTITION BY animal
ORDER BY immunization_date) AS rn
FROM immunizations AS i
)
WITH DATA
UNIQUE PRIMARY INDEX(animal, rn)
ON COMMIT PRESERVE ROWS;
WITH RECURSIVE cte AS
(
SELECT
animal, immunization_date, rn,
immunization_date+10 AS end_date, -- define the end of the range
1 AS grp -- SMALLINT = limited to 127 group, CAST to a larger INT for more groups
FROM vt
WHERE rn = 1 -- oldest row
UNION ALL
SELECT
vt.animal, vt.immunization_date, vt.rn,
-- check if the current row's date is within the 10 day range
-- otherwise increase the group number and define the new range end
CASE WHEN vt.immunization_date < end_date THEN cte.end_date ELSE vt.immunization_date +10 END,
CASE WHEN vt.immunization_date < end_date THEN cte.grp ELSE cte.grp+1 END
FROM cte
JOIN vt
ON vt.animal = cte.animal
AND vt.rn = cte.rn+1
)
SELECT *
FROM cte
ORDER BY 1,2
答案 2 :(得分:0)
我们可以利用Teradata的PERIOD数据类型及其相关功能来帮助解决这个问题,而不会太复杂。
这很接近。不完全,但很接近:
WITH ta_period AS
(
SELECT
PERIOD(immunization_date - INTERVAL '10' DAY, immunization_date) AS periodbucket,
ROW_NUMBER() OVER (PARTITION BY animal ORDER BY immunization_date) AS animal_row,
table_a.animal,
table_a.immunization_date
FROM table_a
)
,cal_buckets AS
(
SELECT calendar_dateFROM Sys_Calendar."CALENDAR" cal
WHERE calendar_date >= (SELECT MIN(immunization_date) FROM table_a)
AND calendar_date <= (SELECT MAX(immunization_date) FROM table_a)
)
SELECT
TA.animal,
TA.immunization_date,
cal_buckets.bucket,
DENSE_RANK() OVER (PARTITION BY ta.animal ORDER BY ta_normal.periodbucket, cal_buckets.bucket ) AS ten_day_bucket
FROM
(
SELECT NORMALIZE
ta_period.animal,
ta_period.periodbucket P_INTERSECT ta_period.periodbucket AS periodbucket
FROM ta_period LEFT OUTER JOIN ta_period ta_period2
ON ta_period.periodbucket CONTAINS ta_period2.immunization_date
AND ta_period.animal = ta_period2.animal
AND ta_period.animal_row <> ta_period2.animal_row
) ta_normal
INNER JOIN ta_period ta ON
ta_normal.animal = ta.animal
AND ta_normal.periodbucket P_INTERSECT ta.periodbucket IS NOT NULL
INNER JOIN cal_buckets ON
ta.immunization_date = cal_buckets.calendar_date;