I have an SQL table with 4 columns:
id - varchar(50)
g1 - varchar(50)
g2 - varchar(50)
datetime - timestamp
I have this query:
SELECT g1,
COUNT(DISTINCT id),
SUM(COUNT(DISTINCT id)) OVER () AS total,
(CAST(COUNT(DISTINCT id) AS float) / SUM(COUNT(DISTINCT id)) OVER ()) AS share
FROM my_table
WHERE g2 = 'start'
GROUP BY 1
order by share desc
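This query is meant to answer: what is the distribution of g1 values among users?
Each id may have multiple records in the table. I want to consider only the earliest one, where "earliest" means the minimum datetime value.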
Sample data:
id  g1  g2     datetime
x1  a   start  2016-01-19 21:01:22
x1  c   start  2016-01-19 21:01:21
x2  b   start  2016-01-19 09:03:42
x1  a   start  2016-01-18 13:56:45
Counting every record gives:
g1  count  total  share
a   2      4      0.5
b   1      4      0.25
c   1      4      0.25
We have 4 records, but I only want to consider two of them:
x2  b   start  2016-01-19 09:03:42
x1  a   start  2016-01-18 13:56:45
These are the earliest records for each id.
Expected result:
g1  count  total  share
a   1      2      0.5
b   1      2      0.5
How can I consider only the earliest record for each id in the GROUP BY?
Answer 0 (score: 2)
I don't know which DBMS you are using, so here is a standard ANSI approach:
SELECT T1.g1,
       COUNT(DISTINCT T1.id),   -- id must be qualified: it exists in both T1 and T2
       SUM(COUNT(DISTINCT T1.id)) OVER () AS total,
       (CAST(COUNT(DISTINCT T1.id) AS float) / SUM(COUNT(DISTINCT T1.id)) OVER ()) AS share
FROM my_table T1
INNER JOIN
    -- earliest datetime per id
    (SELECT id, MIN(datetime) AS mindt
     FROM my_table
     GROUP BY id
    ) T2 ON T1.datetime = T2.mindt AND T1.id = T2.id
        AND T1.g2 = 'start'
GROUP BY T1.g1   -- positional GROUP BY 1 is not accepted by every DBMS
ORDER BY share DESC
If you have a large table and datetime is not indexed, this may be slow.
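To check the join step against the sample data from the question, here is a minimal sketch using Python's built-in sqlite3 module. SQLite, the TEXT column types, and the index name idx_my_table_id_datetime are my assumptions, not part of the answer above; the composite index on (id, datetime) is only a guess at what might help the per-id MIN lookup on a large table.

```python
import sqlite3

# In-memory SQLite database standing in for the real DBMS (an assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE my_table (id TEXT, g1 TEXT, g2 TEXT, datetime TEXT)")
conn.executemany(
    "INSERT INTO my_table VALUES (?, ?, ?, ?)",
    [
        ("x1", "a", "start", "2016-01-19 21:01:22"),
        ("x1", "c", "start", "2016-01-19 21:01:21"),
        ("x2", "b", "start", "2016-01-19 09:03:42"),
        ("x1", "a", "start", "2016-01-18 13:56:45"),
    ],
)

# Hypothetical index name; the answer only says datetime should be indexed.
conn.execute("CREATE INDEX idx_my_table_id_datetime ON my_table (id, datetime)")

# Join each row to the earliest datetime of its id, keeping only the matches.
earliest_rows = conn.execute(
    """
    SELECT T1.id, T1.g1, T1.datetime
    FROM my_table T1
    INNER JOIN (SELECT id, MIN(datetime) AS mindt
                FROM my_table
                GROUP BY id) T2
            ON T1.datetime = T2.mindt AND T1.id = T2.id
    WHERE T1.g2 = 'start'
    ORDER BY T1.id
    """
).fetchall()

for row in earliest_rows:
    print(row)
```

Only the two earliest rows (one per id) survive the join, which is what the grouping then counts.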
Answer 1 (score: 2)
Here is a solution that should work in SQL Server, as well as in any database that supports CTEs:
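WITH cte AS
(
    SELECT t1.g1,
           COUNT(*) AS cnt
    FROM yourTable t1
    INNER JOIN
    (
        -- earliest datetime per id
        SELECT id, MIN(datetime) AS datetime
        FROM yourTable
        GROUP BY id
    ) t2
        ON t1.id = t2.id AND
           t1.datetime = t2.datetime
    WHERE t1.g2 = 'start'   -- filter taken from the question's query
    GROUP BY t1.g1          -- one row per g1 value
)
SELECT t.g1,
       t.cnt,
       (SELECT SUM(cnt) FROM cte) AS total,   -- sum of the per-group counts, not the number of groups
       CAST(t.cnt AS float) / (SELECT SUM(cnt) FROM cte) AS share   -- cast avoids integer division
FROM cte t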
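As a rough check of the CTE idea on the question's sample data, the sketch below runs a two-step CTE through Python's sqlite3 module. It is not the answer's exact query: SQLite, the table name my_table (from the question), and the CTE names earliest and counts are my assumptions, and the total comes from a scalar subquery so that no window function is needed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE my_table (id TEXT, g1 TEXT, g2 TEXT, datetime TEXT)")
conn.executemany(
    "INSERT INTO my_table VALUES (?, ?, ?, ?)",
    [
        ("x1", "a", "start", "2016-01-19 21:01:22"),
        ("x1", "c", "start", "2016-01-19 21:01:21"),
        ("x2", "b", "start", "2016-01-19 09:03:42"),
        ("x1", "a", "start", "2016-01-18 13:56:45"),
    ],
)

CTE_QUERY = """
WITH earliest AS (
    -- keep only each id's earliest row
    SELECT t1.id, t1.g1
    FROM my_table t1
    INNER JOIN (SELECT id, MIN(datetime) AS mindt
                FROM my_table
                GROUP BY id) t2
            ON t1.id = t2.id AND t1.datetime = t2.mindt
    WHERE t1.g2 = 'start'
),
counts AS (
    -- one row per g1 value
    SELECT g1, COUNT(DISTINCT id) AS cnt
    FROM earliest
    GROUP BY g1
)
SELECT g1,
       cnt,
       (SELECT SUM(cnt) FROM counts) AS total,
       CAST(cnt AS REAL) / (SELECT SUM(cnt) FROM counts) AS share
FROM counts
ORDER BY share DESC, g1
"""

cte_rows = conn.execute(CTE_QUERY).fetchall()
for row in cte_rows:
    print(row)
```

On the four sample rows this gives one row each for a and b with total 2 and share 0.5, matching the expected result in the question.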
Answer 2 (score: 2)
Try the following query.
;WITH cte_1
as (SELECT id, MIN(datetime) AS [Date]
FROM YourTable
GROUP BY id
)
SELECT yt.g1,
COUNT(DISTINCT yt.id) [Count],
SUM(COUNT(DISTINCT yt.id)) OVER () AS total,
(CAST(COUNT(DISTINCT yt.id) AS float) / SUM(COUNT(DISTINCT yt.id)) OVER ()) AS share
FROM cte_1 c
JOIN YourTable yt
ON yt.[datetime]=c.[Date] AND yt.id=c.id
and yt.g2 = 'start'
GROUP BY yt.g1
ORDER BY share DESC
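Answer 3 (score: 1)
You are querying all the data in my_table, although you only want the earliest date for each id. I assume id is the primary key of the table.
I suggest you define a view (or an inline view) that returns only the earliest date for each id, and run your query against that view instead of my_table.
The view can be defined like this, and contains only the earliest date for each id: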
select * from my_table a
where a.datetime = (select min(z.datetime) from my_table z where a.id = z.id) and a.g2 = 'start'
You can define it as a view or use it directly, as follows:
SELECT g1,
       COUNT(DISTINCT id),
       SUM(COUNT(DISTINCT id)) OVER () AS total,
       (CAST(COUNT(DISTINCT id) AS float) / SUM(COUNT(DISTINCT id)) OVER ()) AS share
FROM (SELECT a.id, a.g1, a.g2, a.datetime
      FROM my_table a
      WHERE a.datetime = (SELECT MIN(z.datetime) FROM my_table z WHERE a.id = z.id)
        AND a.g2 = 'start'
     ) earliest   -- most DBMSs require an alias on a derived table
GROUP BY 1
ORDER BY share DESC
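For the "define it as a view" variant, here is a small sketch that creates the view in an in-memory SQLite database through Python's sqlite3 module and queries it on the question's sample data. SQLite and the view name earliest_start are my own assumptions; the share is computed with a scalar subquery over the view rather than a window function, just to keep the sketch simple.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE my_table (id TEXT, g1 TEXT, g2 TEXT, datetime TEXT)")
conn.executemany(
    "INSERT INTO my_table VALUES (?, ?, ?, ?)",
    [
        ("x1", "a", "start", "2016-01-19 21:01:22"),
        ("x1", "c", "start", "2016-01-19 21:01:21"),
        ("x2", "b", "start", "2016-01-19 09:03:42"),
        ("x1", "a", "start", "2016-01-18 13:56:45"),
    ],
)

# Hypothetical view name; the body is the correlated subquery from the answer.
conn.execute(
    """
    CREATE VIEW earliest_start AS
    SELECT a.id, a.g1, a.g2, a.datetime
    FROM my_table a
    WHERE a.datetime = (SELECT MIN(z.datetime) FROM my_table z WHERE z.id = a.id)
      AND a.g2 = 'start'
    """
)

# Group the view's rows by g1 and divide by the number of distinct ids in the view.
view_rows = conn.execute(
    """
    SELECT g1,
           COUNT(DISTINCT id) AS cnt,
           CAST(COUNT(DISTINCT id) AS REAL)
               / (SELECT COUNT(DISTINCT id) FROM earliest_start) AS share
    FROM earliest_start
    GROUP BY g1
    ORDER BY share DESC, g1
    """
).fetchall()

for row in view_rows:
    print(row)
```

With the view in place, the outer query no longer needs to know how "earliest" is decided, so that rule can be changed in one spot.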