A way to select the first value in a group in SQL Server?

Time: 2013-12-14 07:03:41

Tags: sql sql-server

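I am trying to find the first and last values within a group, along the lines of First([Open]), Max([High]), Min([Low]), Last([Close]).

Below is one of the queries (it currently lacks the logic for the Open/Close columns). The dataset is very large (more than 150 million records), so query performance may become an issue.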

Select 'AUDCHF' AS CURRENCY,
    Datepart(year, Datekey) AS [YEAR],  
    Datepart(month, Datekey) AS [MONTH], 
    Datepart(day, Datekey) AS [DAY], 
    Case When Datepart(hour, Datekey) BETWEEN 0 AND 11 Then 'AM' Else 'PM' End AS [12 Hour], 
    Case 
        When Datepart(hour, Datekey) BETWEEN 0 AND 3 Then '1st 4 Hours'
        When Datepart(hour, Datekey) BETWEEN 4 AND 7 Then '2nd 4 Hours'
        When Datepart(hour, Datekey) BETWEEN 8 AND 11 Then '3rd 4 Hours'
        When Datepart(hour, Datekey) BETWEEN 12 AND 15 Then '4th 4 Hours'
        When Datepart(hour, Datekey) BETWEEN 16 AND 19 Then '5th 4 Hours'
        Else '6th 4 Hours'
    End AS [4 Hours], 
    Datepart(hour, Datekey) AS [HOUR], 
    max(High) AS HIGH, 
    min(Low) AS LOW
From AUDCHF
    Group by Datepart(year, Datekey), Datepart(month, Datekey), Datepart(day, Datekey), 
        Case When Datepart(hour, Datekey) BETWEEN 0 AND 11 Then 'AM' Else 'PM' End,
        Case 
            When Datepart(hour, Datekey) BETWEEN 0 AND 3 Then '1st 4 Hours'
            When Datepart(hour, Datekey) BETWEEN 4 AND 7 Then '2nd 4 Hours'
            When Datepart(hour, Datekey) BETWEEN 8 AND 11 Then '3rd 4 Hours'
            When Datepart(hour, Datekey) BETWEEN 12 AND 15 Then '4th 4 Hours'
            When Datepart(hour, Datekey) BETWEEN 16 AND 19 Then '5th 4 Hours'
            Else '6th 4 Hours'
        End, 
        Datepart(hour, Datekey)
    Order by Datepart(year, Datekey),  Datepart(month, Datekey), Datepart(day, Datekey), 
        Case When Datepart(hour, Datekey) BETWEEN 0 AND 11 Then 'AM' Else 'PM' End,
        Case 
            When Datepart(hour, Datekey) BETWEEN 0 AND 3 Then '1st 4 Hours'
            When Datepart(hour, Datekey) BETWEEN 4 AND 7 Then '2nd 4 Hours'
            When Datepart(hour, Datekey) BETWEEN 8 AND 11 Then '3rd 4 Hours'
            When Datepart(hour, Datekey) BETWEEN 12 AND 15 Then '4th 4 Hours'
            When Datepart(hour, Datekey) BETWEEN 16 AND 19 Then '5th 4 Hours'
            Else '6th 4 Hours'
        End, 
        Datepart(hour, Datekey)

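2 answers:

Answer 0 (score: 1)

ORDER BY can use the aliases defined in the SELECT expression list, because it is evaluated after the SELECT part (this is not the case for GROUP BY).

In your query, the ORDER BY clause can be: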

Order by [YEAR],  [MONTH], [DAY], [4 Hours],[HOUR]
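
To illustrate the alias rule, here is a minimal sketch on made-up data. The table variable @Ticks and its rows are assumptions for the example, not part of the original schema, and the multi-row VALUES syntax needs SQL Server 2008 or later.

```sql
-- Hypothetical sample data; @Ticks and its values are assumptions.
DECLARE @Ticks TABLE (Datekey datetime, High decimal(9,5));

INSERT INTO @Ticks (Datekey, High) VALUES
    ('2013-12-13T09:15:00', 0.79810),
    ('2013-12-13T09:45:00', 0.79850),
    ('2013-12-13T10:05:00', 0.79790);

SELECT Datepart(hour, Datekey) AS [HOUR],
       max(High)               AS [HIGH]
FROM @Ticks
GROUP BY Datepart(hour, Datekey)   -- the alias [HOUR] is not visible here
ORDER BY [HOUR];                   -- the alias is visible here
```

Replacing the GROUP BY expression with [HOUR] should fail with an "invalid column name" error, whereas the ORDER BY accepts the alias.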

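Since you group by year/month/day/4 hours/hour, I think you can drop the 4 Hours part: it is fully determined by the hour.

I would use window functions and an outer SELECT with GROUP BY to remove the duplicates.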

select [YEAR], [MONTH], [DAY], [HOUR], [12 Hour], [4 Hours],
    max([HIGH]) as HIGH, min([LOW]) as LOW,
    max([Open]) as [Open], max([Close]) as [Close]
from (
    select 
        Datepart(year, Datekey) AS [YEAR],  
        Datepart(month, Datekey) AS [MONTH], 
        Datepart(day, Datekey) AS [DAY], 
        Datepart(hour, Datekey) AS [HOUR], 
        Case When Datepart(hour, Datekey) BETWEEN 0 AND 11 Then 'AM' Else 'PM' End AS [12 Hour], 
        Case 
            When Datepart(hour, Datekey) BETWEEN 0 AND 3 Then '1st 4 Hours'
            When Datepart(hour, Datekey) BETWEEN 4 AND 7 Then '2nd 4 Hours'
            When Datepart(hour, Datekey) BETWEEN 8 AND 11 Then '3rd 4 Hours'
            When Datepart(hour, Datekey) BETWEEN 12 AND 15 Then '4th 4 Hours'
            When Datepart(hour, Datekey) BETWEEN 16 AND 19 Then '5th 4 Hours'
            Else '6th 4 Hours'
        End AS [4 Hours], 
        max(High) over( 
            partition by
                Datepart(year, Datekey) ,  
                Datepart(month, Datekey) , 
                Datepart(day, Datekey), 
                Datepart(hour, Datekey) 
            ) as [HIGH], 
        min(Low) over( 
            partition by
                Datepart(year, Datekey) ,  
                Datepart(month, Datekey), 
                Datepart(day, Datekey), 
                Datepart(hour, Datekey) 
            ) as [LOW],
        first_value([Open]) over( 
            partition by
                Datepart(year, Datekey) ,  
                Datepart(month, Datekey), 
                Datepart(day, Datekey), 
                Datepart(hour, Datekey) 
            order by Datekey   -- order by time so the first row is the opening tick
            ) as [Open],
        last_value([Close]) over( 
            partition by
                Datepart(year, Datekey) ,  
                Datepart(month, Datekey), 
                Datepart(day, Datekey), 
                Datepart(hour, Datekey) 
            order by Datekey   -- order by time so the last row is the closing tick
            -- explicit frame: the default frame ends at the current row
            rows between unbounded preceding and unbounded following
            ) as [Close]


    from AUDCHF ) T
group by [YEAR], [MONTH], [DAY], [HOUR], [12 Hour], [4 Hours]

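The outer max(HIGH), min(LOW), etc. are there only to keep GROUP BY happy: the values have already been computed in the inner select, so the outer aggregates are not really meaningful. (I don't know what Open and Close are, so I simply took the first and last values over the same partition.)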
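
One detail worth illustrating: FIRST_VALUE and LAST_VALUE only yield an opening and closing value when the window's ORDER BY actually orders the rows in time, and LAST_VALUE also needs an explicit frame, because the default frame stops at the current row. A minimal sketch, assuming SQL Server 2012 or later; the table variable @Ticks and its rows are made up for the example:

```sql
-- Hypothetical sample data; @Ticks and its values are assumptions.
DECLARE @Ticks TABLE (Datekey datetime, [Open] decimal(9,5), [Close] decimal(9,5));

INSERT INTO @Ticks (Datekey, [Open], [Close]) VALUES
    ('2013-12-13T09:15:00', 0.79800, 0.79810),
    ('2013-12-13T09:45:00', 0.79820, 0.79850),
    ('2013-12-13T10:05:00', 0.79840, 0.79790);

SELECT DISTINCT
    -- truncate the timestamp to the start of its hour
    DATEADD(hour, DATEDIFF(hour, 0, Datekey), 0) AS HourStart,
    FIRST_VALUE([Open]) OVER (
        PARTITION BY DATEDIFF(hour, 0, Datekey)
        ORDER BY Datekey
        ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
    ) AS [Open],
    LAST_VALUE([Close]) OVER (
        PARTITION BY DATEDIFF(hour, 0, Datekey)
        ORDER BY Datekey
        ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
    ) AS [Close]
FROM @Ticks
ORDER BY HourStart;
```

On this sample the 09:00 hour comes back with Open 0.79800 and Close 0.79850, and the 10:00 hour with Open 0.79840 and Close 0.79790. DISTINCT plays the role of the outer GROUP BY here, since every row of a partition carries the same window results.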

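If this query has to run on a large table, and since there is no WHERE clause to reduce the number of rows selected, I would create an index on Datekey that includes the High and Low columns (and the other columns not yet in your query: Close, etc.) to avoid a full table scan. It will result in a full index scan, which is likely to be much faster: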

create  nonclustered index IxAudchf on AUDCHF(Datekey) include( [High], [Low], [Open], [Close]) ;

For background on SQL window functions, see the presentations linked from the original answer.

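Note: FIRST_VALUE and LAST_VALUE are available only from SQL Server 2012 onwards, not in 2008.

If you are running SQL Server 2005 or 2008, the following should be equivalent (though probably less efficient). I took Low and Close from the last row; I am not sure that is what you want, so if I misunderstood, change it to follow your logic.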

; WITH 
WAUDCHF1 as
(   select 
        row_number() over( 
            partition by
                Datepart(year, Datekey), Datepart(month, Datekey) , 
                Datepart(day, Datekey), Datepart(hour, Datekey) 
            order by Datekey   -- time order: row 1 is the first tick, max(rownum) the last
            ) as [Rownum], 
        Datepart(year, Datekey) AS [YEAR],  
        Datepart(month, Datekey) AS [MONTH], 
        Datepart(day, Datekey) AS [DAY], 
        Datepart(hour, Datekey) AS [HOUR], 
        Case When Datepart(hour, Datekey) BETWEEN 0 AND 11 Then 'AM' Else 'PM' End AS [12 Hour], 
        Case 
            When Datepart(hour, Datekey) BETWEEN 0 AND 3 Then '1st 4 Hours'
            When Datepart(hour, Datekey) BETWEEN 4 AND 7 Then '2nd 4 Hours'
            When Datepart(hour, Datekey) BETWEEN 8 AND 11 Then '3rd 4 Hours'
            When Datepart(hour, Datekey) BETWEEN 12 AND 15 Then '4th 4 Hours'
            When Datepart(hour, Datekey) BETWEEN 16 AND 19 Then '5th 4 Hours'
            Else '6th 4 Hours'
        End AS [4 Hours], 
        max(High) over( 
            partition by
                Datepart(year, Datekey) , Datepart(month, Datekey) , 
                Datepart(day, Datekey), Datepart(hour, Datekey) 
            ) as [HIGH], 
        min(Low) over( 
            partition by
                Datepart(year, Datekey) ,  Datepart(month, Datekey), 
                Datepart(day, Datekey), Datepart(hour, Datekey) 
            ) as [LOW],
        [Open],
        [Close]
    from AUDCHF ),
LASTROWNUM as (
    select [YEAR], [MONTH], [DAY], [HOUR], max(rownum) as [Rownum]
    from WAUDCHF1
    group by [YEAR], [MONTH], [DAY], [HOUR], [12 Hour], [4 Hours]
     )
select W1.[YEAR], W1.[MONTH], W1.[DAY], W1.[HOUR], 
    max(W1.[High]) as [High], min(W2.[Low]) as [Low], 
    max(W1.[Open]) as [Open], max(w2.[Close]) as [Close]
from LASTROWNUM M
inner join WAUDCHF1 W1 on M.[YEAR] = W1.[YEAR]
            and  M.[MONTH]= W1.[MONTH]
            and  M.[DAY] =  W1.[DAY]  
            and  M.[HOUR] = W1.[HOUR]           
inner join WAUDCHF1 W2 on W2.[YEAR] = M.[YEAR]
            and  W2.[MONTH]= M.[MONTH]
            and  W2.[DAY] =  M.[DAY]  
            and  W2.[HOUR] = M.[HOUR] 

            and  W2.Rownum = M.Rownum
Where W1.Rownum = 1 
group by W1.[YEAR], W1.[MONTH], W1.[DAY], W1.[HOUR], w1.[12 Hour], W1.[4 Hours]
order by W1.[YEAR], W1.[MONTH], W1.[DAY], W1.[HOUR], w1.[12 Hour], W1.[4 Hours]
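
A more compact variant of the same idea for versions without FIRST_VALUE/LAST_VALUE is to number the rows of each hour in both directions and pick row 1 of each numbering with a conditional aggregate. This is only a sketch: the table variable @Ticks and its rows are assumptions, and it presumes Datekey is unique within an hour (ties would make the pick arbitrary).

```sql
-- Hypothetical sample data; @Ticks and its values are assumptions.
DECLARE @Ticks TABLE (
    Datekey datetime,
    [Open]  decimal(9,5),
    High    decimal(9,5),
    Low     decimal(9,5),
    [Close] decimal(9,5)
);

INSERT INTO @Ticks (Datekey, [Open], High, Low, [Close])
SELECT '2013-12-13T09:15:00', 0.79800, 0.79830, 0.79780, 0.79810 UNION ALL
SELECT '2013-12-13T09:45:00', 0.79820, 0.79860, 0.79800, 0.79850 UNION ALL
SELECT '2013-12-13T10:05:00', 0.79840, 0.79850, 0.79770, 0.79790;

WITH Numbered AS (
    SELECT DATEADD(hour, DATEDIFF(hour, 0, Datekey), 0) AS HourStart,
           [Open], High, Low, [Close],
           -- 1 = earliest tick of the hour
           ROW_NUMBER() OVER (PARTITION BY DATEDIFF(hour, 0, Datekey)
                              ORDER BY Datekey ASC)  AS RnFirst,
           -- 1 = latest tick of the hour
           ROW_NUMBER() OVER (PARTITION BY DATEDIFF(hour, 0, Datekey)
                              ORDER BY Datekey DESC) AS RnLast
    FROM @Ticks
)
SELECT HourStart,
       MAX(CASE WHEN RnFirst = 1 THEN [Open]  END) AS [Open],
       MAX(High)                                   AS High,
       MIN(Low)                                    AS Low,
       MAX(CASE WHEN RnLast  = 1 THEN [Close] END) AS [Close]
FROM Numbered
GROUP BY HourStart
ORDER BY HourStart;
```

For the sample rows this gives 0.79800 / 0.79860 / 0.79780 / 0.79850 for the 09:00 hour and 0.79840 / 0.79850 / 0.79770 / 0.79790 for the 10:00 hour (Open / High / Low / Close). The CASE expressions yield NULL for the other rows, which MAX ignores, so a single pass with one GROUP BY replaces the two self-joins.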

Answer 1 (score: 0)

The query:

SELECT 'AUDCHF' AS CURRENCY,
    Datepart(year, Datekey) AS [YEAR], Datepart(month, Datekey) AS [MONTH], 
    Datepart(day, Datekey) AS [DAY], [12 Hour], [4 Hours],
    Datepart(hour, Datekey) AS [HOUR], High AS HIGH, Low AS LOW,
    (SELECT High FROM Rate AS R WHERE R.Datekey = (SELECT MIN(Datekey) 
            FROM Rate WHERE DATEADD(hour, DATEDIFF(hour, 0, Rate.Datekey), 0) =
                AUDCHF.Datekey AND Rate.Base = 'AUD' AND Rate.Target = 'CHF') 
            AND R.Base = 'AUD' AND R.Target = 'CHF') AS [Open],
    (SELECT Low FROM Rate AS R WHERE R.Datekey = (SELECT MAX(Datekey) 
            FROM Rate WHERE DATEADD(hour, DATEDIFF(hour, 0, Rate.Datekey), 0) =
                AUDCHF.Datekey AND Rate.Base = 'AUD' AND Rate.Target = 'CHF') 
            AND R.Base = 'AUD' AND R.Target = 'CHF') AS [Close]
FROM AUDCHF, Segment
    WHERE Segment.Hour = Datekey
    ORDER BY Datepart(year, Datekey),  Datepart(month, Datekey), 
        Datepart(day, Datekey), Datepart(hour, Datekey);

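will return the results you expect. I also extracted the CASE statements into a supporting table, as you can see in the SQLFiddle. The fiddle also shows the query result on some test data. It uses the answer from "T-SQL datetime rounded to nearest minute and nearest hours with using functions" to truncate the time to hours.

Basically, the view AUDCHF truncates Datekey and performs the grouping. The query then joins it with the Segment table to pull in the constant strings and computes the opening and closing values. Those need to be in subqueries because they are not related to the aggregation.

Of course, you need indexes on the tables to maintain performance. If you do not keep other data in the main table, or if you create a custom index, most of the data should be cached.

Since the data is all historical, you could also prepare a materialized view for fast reference.

The currency-pair handling is only partial and would be better handled in a top-level view, to avoid the repeated constants. It shows how the rates could be merged into a single table to make adding new rate pairs simpler.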
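
The hour truncation mentioned above can be sketched on a single value. The variable @d is just an illustration; the 4-hour and 12-hour variants are an extension of the same expression and are not taken from the linked answer.

```sql
DECLARE @d datetime;
SET @d = '2013-12-14T07:03:41';

SELECT
    -- whole hours since the base date (0 = 1900-01-01), added back to that base
    DATEADD(hour, DATEDIFF(hour, 0, @d), 0)             AS HourStart,        -- 2013-12-14 07:00:00.000
    -- integer division rounds the hour count down to a multiple of 4
    DATEADD(hour, (DATEDIFF(hour, 0, @d) / 4) * 4, 0)   AS FourHourStart,    -- 2013-12-14 04:00:00.000
    -- same idea with a 12-hour step (AM/PM halves)
    DATEADD(hour, (DATEDIFF(hour, 0, @d) / 12) * 12, 0) AS TwelveHourStart;  -- 2013-12-14 00:00:00.000
```

Grouping by one of these expressions gives a single datetime key per bucket instead of separate year, month, day and hour columns.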
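
As far as I know, SQL Server's indexed views do not allow MIN or MAX, so one way to get a similar effect is to precompute the hourly rows into an ordinary table and index it. A rough sketch under that assumption, using temporary tables so it can be tried in isolation; #Ticks, #HourlySummary, the index name and the sample rows are all hypothetical:

```sql
-- Hypothetical sample data; table names and values are assumptions.
CREATE TABLE #Ticks (Datekey datetime NOT NULL, High decimal(9,5), Low decimal(9,5));

INSERT INTO #Ticks (Datekey, High, Low) VALUES
    ('2013-12-13T09:15:00', 0.79830, 0.79780),
    ('2013-12-13T09:45:00', 0.79860, 0.79800),
    ('2013-12-13T10:05:00', 0.79850, 0.79770);

-- Precompute one row per hour into a summary table.
SELECT DATEADD(hour, DATEDIFF(hour, 0, Datekey), 0) AS HourStart,
       MAX(High) AS High,
       MIN(Low)  AS Low
INTO #HourlySummary
FROM #Ticks
GROUP BY DATEADD(hour, DATEDIFF(hour, 0, Datekey), 0);

-- Index the summary so lookups by hour do not scan it.
CREATE UNIQUE CLUSTERED INDEX IxHourlySummary ON #HourlySummary (HourStart);

SELECT HourStart, High, Low
FROM #HourlySummary
ORDER BY HourStart;

DROP TABLE #HourlySummary;
DROP TABLE #Ticks;
```

With historical data the summary only needs rebuilding (or appending to) when new ticks are loaded, so the 150-million-row table is aggregated once rather than on every query.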