Select top x for each id

Time: 2015-02-04 04:42:30

Tags: sql sql-server

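I'm a bit rusty with SQL.

Suppose I have a table tblMachineLogs with the columns MachineLogID, MachineID and LogTime (date+time).

The table is filled with logs from 10 machines, MachineID 1 to 10, and it contains a lot of rows.

I want to select, for example, the last 5 log events, but for each machine.

Thanks in advance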

5 answers:

Answer 0 (score: 6)

A window function can help you find the last 5 log events in each group (MachineID):

SELECT MachineLogID,
        MachineID,
        LogTime
FROM   (SELECT Row_number()OVER(partition BY MachineID ORDER BY LogTime DESC) Rn,
                MachineLogID,
                MachineID,
                LogTime
        FROM   tblMachineLogs) a
WHERE  rn <= 5 
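
If two log rows for the same machine can share the same LogTime, ROW_NUMBER picks among the ties arbitrarily, so the 5 rows returned may differ between runs. Below is a minimal sketch of one way to make the result deterministic. It assumes that MachineLogID is unique and is acceptable as a tie-breaker; that assumption is not stated in the question.

```sql
-- Sketch only: MachineLogID as a tie-breaker is an assumption.
SELECT MachineLogID,
       MachineID,
       LogTime
FROM   (SELECT ROW_NUMBER() OVER (PARTITION BY MachineID
                                  ORDER BY LogTime DESC, MachineLogID DESC) AS rn,
               MachineLogID,
               MachineID,
               LogTime
        FROM   tblMachineLogs) AS a
WHERE  rn <= 5
-- The outer ORDER BY only fixes the display order of the result.
ORDER  BY MachineID, LogTime DESC, MachineLogID DESC;
```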

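Answer 1 (score: 2)

A solution for SQL Server. I tested it on SQL Server 2008.

Imagine that MachineLogs has millions or billions of rows and there is an index on (MachineID, LogTime DESC). The solution with ROW_NUMBER would scan the whole table (or only the index, but it would still be a full scan). If the index is on (MachineID, LogTime ASC), it would also do an extra expensive sort.

On the other hand, if we have a small table Machines with 10 rows, one for each MachineID, then it is possible to write a query that does 10 seeks on the index instead of scanning the whole big table.

I'll create a big table MachineLogs with 1 million rows and a small table Machines with 10 rows, and test both solutions.

Machines will have 10 rows: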

CREATE TABLE [dbo].[Machines](
    [ID] [int] NOT NULL,
CONSTRAINT [PK_Machines] PRIMARY KEY CLUSTERED 
(
    [ID] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]

INSERT INTO [dbo].[Machines]
([ID])
VALUES
(1),(2),(3),(4),(5),(6),(7),(8),(9),(10)
;

The big table, with an index on ([MachineID] ASC, [LogTime] DESC):

CREATE TABLE [dbo].[MachineLogs](
    [ID] [int] IDENTITY(1,1) NOT NULL,
    [MachineID] [int] NOT NULL,
    [LogTime] [datetime] NOT NULL,
 CONSTRAINT [PK_MachineLogs] PRIMARY KEY CLUSTERED 
(
    [ID] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]

GO

CREATE NONCLUSTERED INDEX [IX_MachineID_LogTime] ON [dbo].[MachineLogs]
(
    [MachineID] ASC,
    [LogTime] DESC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
GO

ALTER TABLE [dbo].[MachineLogs]  WITH CHECK ADD  CONSTRAINT [FK_MachineLogs_Machines] FOREIGN KEY([MachineID])
REFERENCES [dbo].[Machines] ([ID])
GO

ALTER TABLE [dbo].[MachineLogs] CHECK CONSTRAINT [FK_MachineLogs_Machines]
GO

Generate 1M rows:

WITH
CTE_Times
AS
(
    -- generate 100,000 rows with random datetimes between 2001-01-01 and ~2004-03-01 (100,000,000 seconds)
    SELECT TOP(100000)
        DATEADD(second, 100000000 * (CAST(CRYPT_GEN_RANDOM(4) as int) / 4294967295.0 + 0.5), '20010101') AS LogTime
    FROM
        sys.all_objects AS X1
        CROSS JOIN sys.all_objects AS X2
)
-- generate 1M rows
INSERT INTO dbo.MachineLogs
    (MachineID
    ,LogTime)
SELECT
    dbo.Machines.ID
    ,CTE_Times.LogTime
FROM
    dbo.Machines
    CROSS JOIN CTE_Times
;

Solution with ROW_NUMBER

WITH
CTE_rn
AS
(
    SELECT
        ROW_NUMBER() OVER (PARTITION BY MachineID ORDER BY LogTime DESC) AS rn
        ,ID
        ,MachineID
        ,LogTime
    FROM MachineLogs
)
SELECT
    ID
    ,MachineID
    ,LogTime
FROM CTE_rn
WHERE rn <= 5
;

Solution with CROSS APPLY

SELECT
    CA.ID
    ,CA.MachineID
    ,CA.LogTime
FROM
    Machines
    CROSS APPLY
    (
        SELECT TOP(5)
            MachineLogs.ID
            ,MachineLogs.MachineID
            ,MachineLogs.LogTime
        FROM MachineLogs
        WHERE
            MachineLogs.MachineID = Machines.ID
        ORDER BY LogTime DESC
    ) AS CA
;

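Execution plans

(image: execution plans of the two queries)

You can see that the solution with ROW_NUMBER does an index scan, while the solution with CROSS APPLY does index seeks.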

IO statistics

SET STATISTICS IO ON;

Solution with ROW_NUMBER:

(50 row(s) affected)
Table 'MachineLogs'. Scan count 1, logical reads 2365, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

Solution with CROSS APPLY:

(50 row(s) affected)
Table 'MachineLogs'. Scan count 10, logical reads 30, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Machines'. Scan count 1, logical reads 2, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
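
If there is no Machines table, the same seek-per-machine pattern can probably still be used by listing the IDs inline. The following is a minimal sketch, not something I measured. It assumes the machine IDs are known to be exactly 1 to 10, as in the question, and the aliases M and L are only illustrative. A table value constructor in the FROM clause needs SQL Server 2008 or later.

```sql
-- Sketch only: the hard-coded ID list 1..10 is an assumption taken from the question.
SELECT
    CA.ID
    ,CA.MachineID
    ,CA.LogTime
FROM
    (VALUES (1),(2),(3),(4),(5),(6),(7),(8),(9),(10)) AS M(ID)
    CROSS APPLY
    (
        -- one TOP(5) lookup per listed machine ID
        SELECT TOP(5)
            L.ID
            ,L.MachineID
            ,L.LogTime
        FROM dbo.MachineLogs AS L
        WHERE
            L.MachineID = M.ID
        ORDER BY L.LogTime DESC
    ) AS CA
;
```

Deriving the list with SELECT DISTINCT MachineID FROM MachineLogs would also work, but that subquery has to read the whole index, which gives back most of the IO saved by the seeks.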

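Answer 2 (score: 1)

Write a query for each machine that selects its top 5 rows ordered by log time descending (to get the last 5 rows), then union them all together. Below is an example for two machines; just fill in the missing 8.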

--drop table #tmp
SELECT  *
into #tmp
FROM    
(
select 1 as MachineLogID, 1 as MachineID , GETDATE() - 0.1 LogTime
    UNION
select 2 as MachineLogID, 1 as MachineID , GETDATE()- 0.2 LogTime
    UNION
select 3 as MachineLogID, 1 as MachineID , GETDATE()- 0.3 LogTime
    UNION
select 4 as MachineLogID, 1 as MachineID , GETDATE()- 0.4 LogTime
    UNION
select 5 as MachineLogID, 1 as MachineID , GETDATE()- 0.5 LogTime
    UNION
select 6 as MachineLogID, 1 as MachineID , GETDATE() - 0.6 LogTime
    UNION
select 7 as MachineLogID, 2 as MachineID , GETDATE()- 0.7 LogTime
    UNION
select 8 as MachineLogID, 2 as MachineID , GETDATE() - 0.8 LogTime
    UNION
select 9 as MachineLogID, 2 as MachineID , GETDATE() - 0.9 LogTime
    UNION
select 10 as MachineLogID, 2 as MachineID , GETDATE() - 0.10 LogTime
    UNION
select 11 as MachineLogID, 2 as MachineID , GETDATE() - 0.11 LogTime
    UNION
select 12 as MachineLogID, 2 as MachineID , GETDATE() - 0.12 LogTime
) a

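-- ORDER BY is not allowed directly before UNION in SQL Server,
-- so each TOP 5 query is wrapped in its own derived table.
SELECT  *
FROM    
(
    SELECT  *
    FROM  (SELECT top 5 * FROM #tmp where machineId = 1 order by LogTime desc) m1
        union
    SELECT  *
    FROM  (SELECT top 5 * FROM #tmp where machineId = 2 order by LogTime desc) m2
) a
order by a.machineId , a.LogTime desc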

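Answer 3 (score: 0)

For simplicity, I would run a separate query for each machine.

If you are using MySQL:

SELECT MachineLogID, MachineID, LogTime FROM tblMachineLogs WHERE MachineID='str_machineid' ORDER BY LogTime DESC LIMIT 5;

This returns the last 5 event log entries for the machine whose ID is given by str_machineid. If the machine ID is a numeric field (and it should be), remove the quotes.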
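
If the MySQL server is version 8.0 or later, window functions are available, so all machines can be covered in one query instead of one query per machine. This is a minimal sketch under that version assumption. The alias ranked is only illustrative.

```sql
-- Sketch only: requires MySQL 8.0+ (window function support).
SELECT MachineLogID, MachineID, LogTime
FROM (
    SELECT MachineLogID, MachineID, LogTime,
           ROW_NUMBER() OVER (PARTITION BY MachineID ORDER BY LogTime DESC) AS rn
    FROM tblMachineLogs
) AS ranked
WHERE rn <= 5
ORDER BY MachineID, LogTime DESC;
```

On older MySQL versions this will not parse, and the per-machine LIMIT 5 query above remains the simple option.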

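Answer 4 (score: 0)

-- TOP without ORDER BY returns arbitrary rows, so each branch orders by LogTime
-- inside a derived table (ORDER BY cannot appear directly before UNION ALL).
Select * from (Select top 5 * from yourTable where machineId = 1 order by LogTime desc) m1
Union all
Select * from (Select top 5 * from yourTable where machineId = 2 order by LogTime desc) m2
Union all
.
.
.
.
Select * from (Select top 5 * from yourTable where machineId = 10 order by LogTime desc) m10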