Selecting from a subset based on values within the same subset?

Time: 2011-07-26 20:38:53

Tags: sql sql-server sql-server-2008 select

I have created a table like this:

CREATE TABLE #TEMP(RecordDate datetime, First VARCHAR(255), Last VARCHAR(255), Value int)

INSERT INTO #TEMP VALUES('2011-03-01 00:00:00.000','john','smith','10')
INSERT INTO #TEMP VALUES('2011-03-01 00:00:00.000','john','adams','60')
INSERT INTO #TEMP VALUES('2011-03-01 00:00:00.000','john','resig','90')
INSERT INTO #TEMP VALUES('2011-03-01 00:00:00.000','john','balte','95')

INSERT INTO #TEMP VALUES('2011-03-01 01:00:00.000','john','smith','98')
INSERT INTO #TEMP VALUES('2011-03-01 01:00:00.000','john','adams','67')
INSERT INTO #TEMP VALUES('2011-03-01 01:00:00.000','john','resig','24')
INSERT INTO #TEMP VALUES('2011-03-01 01:00:00.000','john','balte','20')

SELECT * FROM #TEMP

DROP TABLE #TEMP

It now contains the following records:

RecordDate              First   Last    Value
2011-03-01 00:00:00.000 john    smith   10
2011-03-01 00:00:00.000 john    adams   60
2011-03-01 00:00:00.000 john    resig   90
2011-03-01 00:00:00.000 john    balte   95
2011-03-01 01:00:00.000 john    smith   98
2011-03-01 01:00:00.000 john    adams   67
2011-03-01 01:00:00.000 john    resig   24
2011-03-01 01:00:00.000 john    balte   20

I am trying to get a table like the following:

RecordDate                first    Good     Bad
2011-03-01 00:00:00.000   john     3        1
2011-03-01 01:00:00.000   john     2        2

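The way I compute Good and Bad is: for a particular date, take the MAX of Value over all the people with the first name john, then apply it as a filter to the original data set for that date and first name. Only values greater than 0.5*MAXValue are considered Good; all other values count as Bad.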

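In the result table, the first row has 3 Good values because the maximum for the first date is 95 and only 60, 90 and 95 are greater than 0.5*95, hence the result is (Good,Bad) = (3,1). In the second row, similarly, the maximum is 98 and only 98 and 67 are greater than 0.5*98, so it is (2,2).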
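To make the rule concrete, here is a minimal sketch that lists the per-group maximum and the cutoff derived from it. It assumes the #TEMP table from above still exists (that is, it runs before the DROP TABLE); the column alias Threshold is introduced here for illustration only.

```sql
-- One row per (RecordDate, First): the group maximum and the
-- Good/Bad cutoff, which is half of that maximum.
SELECT RecordDate,
       First,
       MAX(Value)       AS MaxValue,
       0.5 * MAX(Value) AS Threshold   -- hypothetical alias, not in the original post
FROM #TEMP
GROUP BY RecordDate, First
ORDER BY RecordDate;
-- With the sample data: MaxValue 95 / Threshold 47.5 for 00:00,
-- and MaxValue 98 / Threshold 49.0 for 01:00.
```

Any row whose Value is above its group's Threshold is Good; the rest are Bad.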

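My table is fairly large, with nearly 300 million records, and I can't figure out where to start in order to do this efficiently. Any suggestions on what an efficient approach would be?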

My current (working but expensive) approach is as follows:

[query not preserved in this copy]

2 Answers:

Answer 0 (score: 3)

Here you go:

select 
   t.RecordDate,
   COUNT(case 
           when t.Value > MV.MaxValue * 0.5 then 1
           else null
         end) Good,
   COUNT(case 
           when t.Value <= MV.MaxValue * 0.5 then 1
           else null
         end) Bad
from #Temp t inner join
(select RecordDate, MAX(Value) MaxValue
 from #Temp Group By RecordDate) MV on t.RecordDate = MV.RecordDate
Group by t.RecordDate

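The trick is to create a derived table that holds the maximum value for each record date, and then INNER JOIN it to the table itself. Once the maximums have been resolved, each row has direct access to the maximum for its own date.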
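To see what the join gives you before any counting happens, the following sketch lists every row next to the maximum for its date. It assumes the question's #TEMP table is still populated; the Label column is a hypothetical addition for illustration and is not part of the query above.

```sql
-- Each original row paired with the maximum for its RecordDate,
-- plus a label showing which side of the 0.5 * max cutoff it falls on.
SELECT t.RecordDate,
       t.First,
       t.Last,
       t.Value,
       MV.MaxValue,
       CASE WHEN t.Value > MV.MaxValue * 0.5
            THEN 'Good' ELSE 'Bad' END AS Label   -- hypothetical column
FROM #TEMP t
INNER JOIN (SELECT RecordDate, MAX(Value) AS MaxValue
            FROM #TEMP
            GROUP BY RecordDate) MV
        ON t.RecordDate = MV.RecordDate
ORDER BY t.RecordDate, t.Value;
```

Counting the Good and Bad labels per RecordDate is exactly what the two COUNT(CASE ...) expressions in the aggregated query do.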

Update

I see you updated your question to include the first name in the results. Never fear, here is the solution:

select 
   t.RecordDate,
   t.First,
   COUNT(case 
           when t.Value > MV.MaxValue * 0.5 then 1
           else null
         end) Good,
   COUNT(case 
           when t.Value <= MV.MaxValue * 0.5 then 1
           else null
         end) Bad
from #Temp t inner join
(select RecordDate, First, MAX(Value) MaxValue
 from #Temp Group By RecordDate, First) MV 
   on (t.RecordDate = MV.RecordDate and t.First = MV.First)
Group by t.RecordDate, t.First

答案 1 :(得分:1)

A nested query that references the outer query can cause a lot of repeated work. This computes the MAX for every first name and date just once:

SELECT RecordDate, First, MAX(Value) AS MaxVal FROM #TEMP GROUP BY RecordDate, First

Now join that back to the original data:

SELECT A.RecordDate, A.First,
       SUM(CASE WHEN A.Value > B.MaxVal*0.5 THEN 1 ELSE 0 END) AS GOOD,
       SUM(CASE WHEN A.Value > B.MaxVal*0.5 THEN 0 ELSE 1 END) AS BAD
FROM #TEMP A INNER JOIN
     (SELECT RecordDate, First, MAX(Value) AS MaxVal
      FROM #TEMP GROUP BY RecordDate, First) B
         ON (A.RecordDate = B.RecordDate AND A.First = B.First)
GROUP BY A.RecordDate, A.First
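As a variation on the join (not taken from either answer), the per-group maximum can also be attached to each row with a windowed aggregate, MAX(...) OVER (PARTITION BY ...), which SQL Server has supported since 2005. This is only a sketch under the assumption that the question's #TEMP table is in place; whether it beats the join on 300 million rows is something to check against the actual execution plans rather than take from this example.

```sql
-- Attach the group maximum to every row with a window function,
-- then count rows above and at-or-below half of that maximum.
SELECT RecordDate,
       First,
       COUNT(CASE WHEN Value >  MaxValue * 0.5 THEN 1 END) AS Good,
       COUNT(CASE WHEN Value <= MaxValue * 0.5 THEN 1 END) AS Bad
FROM (SELECT RecordDate,
             First,
             Value,
             MAX(Value) OVER (PARTITION BY RecordDate, First) AS MaxValue
      FROM #TEMP) AS t
GROUP BY RecordDate, First
ORDER BY RecordDate;
-- With the sample data this returns (Good, Bad) = (3, 1) for 00:00
-- and (2, 2) for 01:00, matching the table requested in the question.
```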