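I have the following table, which records details about drivers and riders. For each day (datetime), there is one driver and zero or more riders. If there are multiple riders, the data for each additional rider (rider name and rider age) is captured in a new row with the same datetime. This may not be the right way to structure the data, but it is mainly a consequence of the number of riders varying per driver per datetime.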
id | datetime | driver | age | riders | rider_name | rider_age
---|------------|--------|------|--------|------------|---
1 | 03/03/2009 | joe | 24 | 0 | |
2 | 04/03/2009 | john | 39 | 1 | juliet | 30
3 | 05/03/2009 | borat | 32 | 2 | jane | 45
4 | 05/03/2009 | | | | mike | 18
5 | 06/03/2009 | john | 39 | 3 | duke | 42
6 | 06/03/2009 | | | | jose | 33
7 | 06/03/2009 | | | | kyle | 24
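For each datetime value, I need the driver, the driver's age, the number of riders, the name of the youngest rider, and the number of riders whose age is within +/- 10 years of the driver's age.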
datetime | driver | age | riders | youngest_rider | riders_within_ten_years_of_driver
------------|--------|------|--------|--------------|---
03/03/2009 | joe | 24 | 0 | | 0 # no rider
04/03/2009 | john | 39 | 1 | juliet | 1 # juliet
05/03/2009 | borat | 32 | 2 | mike | 0 # no rider
06/03/2009 | john | 39 | 3 | kyle | 2 # duke, jose
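Answer 0 (score: 2)
This is a really bad data structure: the driver name is blank on the rider-only rows, so you have no key to aggregate on. A more normalized structure would be better, but sometimes we are stuck with a particular format.
You need to get the id of the driver record for each row. For this, use a correlated subquery: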
select r.*,
       (select max(r2.id)
        from riders r2
        where r2.id <= r.id and r2.driver is not null
       ) as driver_id
from riders r;
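To see what the correlated subquery produces, here is a minimal sketch that loads the sample rows into an in-memory SQLite database through Python's `sqlite3` module. It assumes the table is named `riders` (as in the query above) and that the blank cells are stored as NULL; the column types are guesses.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Assumed schema: column names come from the question, types are a guess.
conn.execute("""
    create table riders (
        id integer primary key,
        datetime text,
        driver text,
        age integer,
        riders integer,
        rider_name text,
        rider_age integer
    )
""")
# Blank cells in the question's table are assumed to be NULL (None here).
sample_rows = [
    (1, "03/03/2009", "joe", 24, 0, None, None),
    (2, "04/03/2009", "john", 39, 1, "juliet", 30),
    (3, "05/03/2009", "borat", 32, 2, "jane", 45),
    (4, "05/03/2009", None, None, None, "mike", 18),
    (5, "06/03/2009", "john", 39, 3, "duke", 42),
    (6, "06/03/2009", None, None, None, "jose", 33),
    (7, "06/03/2009", None, None, None, "kyle", 24),
]
conn.executemany("insert into riders values (?, ?, ?, ?, ?, ?, ?)", sample_rows)

# For each row, find the closest preceding row (or the row itself)
# that carries a driver name.
driver_ids = conn.execute("""
    select r.id,
           (select max(r2.id)
            from riders r2
            where r2.id <= r.id and r2.driver is not null
           ) as driver_id
    from riders r
    order by r.id
""").fetchall()

print(driver_ids)
# [(1, 1), (2, 2), (3, 3), (4, 3), (5, 5), (6, 5), (7, 5)]
```

Rows 4, 6 and 7 are rider-only rows, and each one is mapped back to the id of the driver row above it. Note that this relies on the ids being assigned in the order the rows were captured.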
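We then build on this, using a join to pull in the driver's information and conditional aggregation to summarize each group. This covers everything except the rider with the minimum age: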
select r.datetime,
       max(d.driver) as driver,
       max(d.age) as age,
       max(d.riders) as riders,
       -- compare against the driver row's age: age is blank on rider-only rows
       sum(case when abs(r.rider_age - d.age) <= 10 then 1 else 0 end) as riders_within_10_years
from (select r.*,
             (select max(r2.id)
              from riders r2
              where r2.id <= r.id and r2.driver is not null
             ) as driver_id
      from riders r
     ) r join
     riders d
     on d.id = r.driver_id
group by r.datetime, r.driver_id;
Finding the rider with the minimum age is tricky with this data structure. One solution uses a CTE:
with r as (
      select r.*,
             (select max(r2.id)
              from riders r2
              where r2.id <= r.id and r2.driver is not null
             ) as driver_id
      from riders r
     )
select r.datetime,
       max(d.driver) as driver,
       max(d.age) as age,
       max(d.riders) as riders,
       -- compare against the driver row's age: age is blank on rider-only rows
       sum(case when abs(r.rider_age - d.age) <= 10 then 1 else 0 end) as riders_within_10_years,
       (select r2.rider_name
        from r r2
        where r2.driver_id = r.driver_id
        order by r2.rider_age asc  -- ascending, so the youngest rider comes first
        limit 1
       ) as minimum_age_rider
from r join
     riders d
     on d.id = r.driver_id
group by r.datetime, r.driver_id;
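The CTE approach can be checked against the expected output from the question. The sketch below is one way to do that, again assuming a table named `riders` with blank cells stored as NULL. It is a variant rather than a verbatim copy: the CTE is renamed `tagged` (a hypothetical name) for readability, the driver's age is read from the driver row through a join, and the youngest rider is picked by sorting ages in ascending order.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    create table riders (
        id integer primary key, datetime text, driver text, age integer,
        riders integer, rider_name text, rider_age integer
    )
""")
# Sample rows from the question; blank cells are assumed to be NULL.
conn.executemany(
    "insert into riders values (?, ?, ?, ?, ?, ?, ?)",
    [
        (1, "03/03/2009", "joe", 24, 0, None, None),
        (2, "04/03/2009", "john", 39, 1, "juliet", 30),
        (3, "05/03/2009", "borat", 32, 2, "jane", 45),
        (4, "05/03/2009", None, None, None, "mike", 18),
        (5, "06/03/2009", "john", 39, 3, "duke", 42),
        (6, "06/03/2009", None, None, None, "jose", 33),
        (7, "06/03/2009", None, None, None, "kyle", 24),
    ],
)

query = """
    with tagged as (
        -- attach the id of the owning driver row to every row
        select r.*,
               (select max(r2.id)
                from riders r2
                where r2.id <= r.id and r2.driver is not null
               ) as driver_id
        from riders r
    )
    select t.datetime,
           max(d.driver) as driver,
           max(d.age) as age,
           max(d.riders) as riders,
           (select t2.rider_name
            from tagged t2
            where t2.driver_id = t.driver_id
            order by t2.rider_age asc
            limit 1
           ) as youngest_rider,
           sum(case when abs(t.rider_age - d.age) <= 10 then 1 else 0 end)
               as riders_within_10_years
    from tagged t
    join riders d on d.id = t.driver_id
    group by t.datetime, t.driver_id
    order by t.driver_id
"""
result = conn.execute(query).fetchall()
for row in result:
    print(row)
# ('03/03/2009', 'joe', 24, 0, None, 0)
# ('04/03/2009', 'john', 39, 1, 'juliet', 1)
# ('05/03/2009', 'borat', 32, 2, 'mike', 0)
# ('06/03/2009', 'john', 39, 3, 'kyle', 2)
```

The join to the driver row matters because `age` is blank on rider-only rows, so comparing `rider_age` with the `age` on the same row would miss every rider after the first.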
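This is much harder than it needs to be because (1) the data structure is poor and (2) SQLite is not particularly powerful (in particular, versions before 3.25.0 do not support window functions).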
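SQLite 3.25.0 and later do support window functions, which shortens the query considerably. The following is a sketch only: it assumes a table named `riders`, blank cells stored as NULL, and exactly one driver per datetime (as the question states), so the driver's age can be spread across the rider-only rows with `max(age) over (partition by datetime)`. The version guard is there because older SQLite builds would reject the query.

```python
import sqlite3

# Window functions were added in SQLite 3.25.0.
HAS_WINDOW_FUNCTIONS = sqlite3.sqlite_version_info >= (3, 25, 0)

conn = sqlite3.connect(":memory:")
conn.execute("""
    create table riders (
        id integer primary key, datetime text, driver text, age integer,
        riders integer, rider_name text, rider_age integer
    )
""")
# Sample rows from the question; blank cells are assumed to be NULL.
conn.executemany(
    "insert into riders values (?, ?, ?, ?, ?, ?, ?)",
    [
        (1, "03/03/2009", "joe", 24, 0, None, None),
        (2, "04/03/2009", "john", 39, 1, "juliet", 30),
        (3, "05/03/2009", "borat", 32, 2, "jane", 45),
        (4, "05/03/2009", None, None, None, "mike", 18),
        (5, "06/03/2009", "john", 39, 3, "duke", 42),
        (6, "06/03/2009", None, None, None, "jose", 33),
        (7, "06/03/2009", None, None, None, "kyle", 24),
    ],
)

window_query = """
    select datetime,
           max(driver) as driver,
           max(driver_age) as age,
           max(riders) as riders,
           max(youngest_rider) as youngest_rider,
           sum(case when abs(rider_age - driver_age) <= 10 then 1 else 0 end)
               as riders_within_10_years
    from (select r.*,
                 -- copy the driver's age onto every row of the same datetime
                 max(age) over (partition by datetime) as driver_age,
                 -- first rider name when the rows are sorted by rider age
                 first_value(rider_name) over (
                     partition by datetime order by rider_age
                 ) as youngest_rider
          from riders r)
    group by datetime
    order by min(id)
"""

window_result = None
if HAS_WINDOW_FUNCTIONS:
    window_result = conn.execute(window_query).fetchall()
    for row in window_result:
        print(row)
else:
    print("This SQLite build predates window functions:", sqlite3.sqlite_version)
```

On a recent SQLite build this returns the four rows from the question's expected output. If a datetime could carry more than one driver, partitioning by datetime alone would no longer be enough and the `driver_id` trick above would still be needed.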
Answer 1 (score: 0)
If you provide the data as insert statements, I can check whether this query works.
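-- assumes one driver per datetime and NULLs in the blank cells
select datetime, max(driver) driver, max(age) age, max(riders) riders
      ,max(youngest_rider) youngest_rider
      ,count(case when rider_age between driver_age - 10 and driver_age + 10
                  then 1
             end
            ) count_riders_in_age_grp
from (select t.*
            ,first_value(rider_name) over (partition by datetime order by rider_age, rider_name) youngest_rider
            ,max(age) over (partition by datetime) driver_age
      from my_table t
     ) t
group by datetime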
Answer 2 (score: 0)
This is a poor database structure, but I assume it is a homework question. In any case, this should work:
SELECT M.[DateTime],
       MAX(M.driver) AS [Driver],
       MAX(M.age) AS [Age],
       MAX(M.riders) AS [Riders],
       t.rider_name AS [Youngest Rider],
       SUM(CASE WHEN M.rider_age BETWEEN d.age - 10 AND d.age + 10
                THEN 1 ELSE 0 END) AS [Riders within Ten Years of Driver]
FROM my_table M
-- OUTER APPLY keeps drivers that have no riders
OUTER APPLY
(
    SELECT rider_name
    FROM my_table
    WHERE DateTime = M.DateTime
      AND rider_age = (SELECT MIN(rider_age) FROM my_table WHERE DateTime = M.DateTime)
) t
-- the driver's age for this DateTime (blank on rider-only rows)
CROSS APPLY
(
    SELECT MAX(age) AS age
    FROM my_table
    WHERE DateTime = M.DateTime
) d
GROUP BY M.DateTime, t.rider_name
Answer 3 (score: 0)
SELECT
    a.datetime
    ,max(a.driver) as driver
    ,max(a.age) as age
    ,max(a.riders) as riders
    ,max(a.youngest_rider) as youngest_rider
    ,count(b.id) as riders_within_ten_years_of_driver
FROM
    -- compute the youngest rider per datetime before aggregating
    (SELECT
        t.*
        ,first_value(rider_name) OVER
            (PARTITION BY datetime
             ORDER BY rider_age
             rows unbounded preceding)
         as youngest_rider
     FROM
        my_table t) a
LEFT JOIN
    my_table b
ON
    a.datetime = b.datetime
    AND a.age - b.rider_age between -10 AND 10
GROUP BY
    a.datetime
This is a mess. It would be much simpler if you had separate tables for drivers, riders, and rides.
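As a rough illustration of that suggestion, here is a sketch of what a normalized layout might look like, run against in-memory SQLite from Python. The table and column names (`drivers`, `rides`, `ride_riders`, and their columns) are hypothetical, and the sketch assumes that the two "john" rows in the original data are the same 39-year-old driver.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical normalized schema: one row per driver, per ride, per rider on a ride.
conn.executescript("""
    create table drivers (id integer primary key, name text, age integer);
    create table rides (
        id integer primary key,
        ride_date text,
        driver_id integer references drivers(id)
    );
    create table ride_riders (
        id integer primary key,
        ride_id integer references rides(id),
        name text,
        age integer
    );
""")
conn.executemany("insert into drivers values (?, ?, ?)",
                 [(1, "joe", 24), (2, "john", 39), (3, "borat", 32)])
conn.executemany("insert into rides values (?, ?, ?)",
                 [(1, "03/03/2009", 1), (2, "04/03/2009", 2),
                  (3, "05/03/2009", 3), (4, "06/03/2009", 2)])
conn.executemany("insert into ride_riders values (?, ?, ?, ?)",
                 [(1, 2, "juliet", 30), (2, 3, "jane", 45), (3, 3, "mike", 18),
                  (4, 4, "duke", 42), (5, 4, "jose", 33), (6, 4, "kyle", 24)])

normalized_query = """
    select rd.ride_date,
           d.name as driver,
           d.age,
           count(r.id) as riders,
           (select r2.name
            from ride_riders r2
            where r2.ride_id = rd.id
            order by r2.age
            limit 1
           ) as youngest_rider,
           count(case when abs(r.age - d.age) <= 10 then 1 end)
               as riders_within_ten_years_of_driver
    from rides rd
    join drivers d on d.id = rd.driver_id
    left join ride_riders r on r.ride_id = rd.id  -- keeps rides with no riders
    group by rd.id, rd.ride_date, d.name, d.age
    order by rd.id
"""
normalized_result = conn.execute(normalized_query).fetchall()
for row in normalized_result:
    print(row)
# ('03/03/2009', 'joe', 24, 0, None, 0)
# ('04/03/2009', 'john', 39, 1, 'juliet', 1)
# ('05/03/2009', 'borat', 32, 2, 'mike', 0)
# ('06/03/2009', 'john', 39, 3, 'kyle', 2)
```

With this layout every rider row carries an explicit `ride_id`, so there is no need to infer which driver a row belongs to, and the rider count no longer has to be stored redundantly.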