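I have a SQL table that stores daily stock prices. New records are inserted every day after the market closes. I want to find the stocks whose price has risen on consecutive days.
The table has many columns, but this is the relevant subset: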
quoteid   stockid   closeprice   createdate
--------------------------------------------------
1         1         1            01/01/2012
2         2         10           01/01/2012
3         3         15           01/01/2012
4         1         2            01/02/2012
5         2         11           01/02/2012
6         3         13           01/02/2012
7         1         5            01/03/2012
8         2         13           01/03/2012
9         3         17           01/03/2012
10        1         7            01/04/2012
11        2         14           01/04/2012
12        3         18           01/04/2012
13        1         9            01/05/2012
14        2         11           01/05/2012
15        3         10           01/05/2012
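The quoteid column is the primary key.
In the table, the closing price of stock ID 1 rises every day. Stock ID 3 fluctuates a lot, and the price of stock ID 2 falls on the last day.
I am looking for a result like this: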
stockid Consecutive Count (CC)
----------------------------------
1 5
2 4
It would be even better if the output also included the dates of the consecutive streak:
stockid Consecutive Count (CC) StartDate EndDate
---------------------------------------------------------------
1 5 01/01/2012 01/05/2012
2 4 01/01/2012 01/04/2012
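StartDate is when the price started rising, and EndDate is when the bull run actually ended.
I don't think this is an easy problem. I have looked at other posts here that deal with this kind of consecutive scenario, but they don't fit my needs. If you know of any post similar to mine, please let me know.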
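Answer 0 (score: 7)
In any case, it helps to think in terms of an increasing row number per stock (the actual quoteid values aren't really useful here). Counting the days captured in this table is the simplest approach; if you want something else (only business days, ignoring weekends/holidays, and so on), it gets more involved, and you would probably need a calendar file. If you don't already have one, you'll want an index over [stockid, createdate].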
WITH StockRow AS (SELECT stockId, closePrice, createdDate,
ROW_NUMBER() OVER(PARTITION BY stockId
ORDER BY createdDate) rn
FROM Quote),
RunGroup AS (SELECT Base.stockId, Base.createdDate,
MAX(Restart.rn) OVER(PARTITION BY Base.stockId
ORDER BY Base.createdDate) groupingId
FROM StockRow Base
LEFT JOIN StockRow Restart
ON Restart.stockId = Base.stockId
AND Restart.rn = Base.rn - 1
AND Restart.closePrice > Base.closePrice)
SELECT stockId,
COUNT(*) AS consecutiveCount,
MIN(createdDate) AS startDate, MAX(createdDate) AS endDate
FROM RunGroup
GROUP BY stockId, groupingId
HAVING COUNT(*) >= 3
ORDER BY stockId, startDate
This yields the following results from the provided data:
Increasing_Run
stockId consecutiveCount startDate endDate
===================================================
1 5 2012-01-01 2012-01-05
2 4 2012-01-01 2012-01-04
3 3 2012-01-02 2012-01-04
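SQL Fiddle Example (the fiddle also includes an example with multiple runs)
This analysis ignores all gaps in the dates and correctly picks up every run, including the next positive run once it starts.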
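The fiddle's schema is not reproduced in this post. To try the query locally, a setup along these lines should work; it is only a sketch. The column names follow the query above, the data types and the index name IX_Quote_Stock_Date are assumptions, and the rows are the sample data from the question.

```sql
-- Assumed schema: column names match the query above, types are guesses.
CREATE TABLE Quote (
    quoteId     INT           NOT NULL PRIMARY KEY,
    stockId     INT           NOT NULL,
    closePrice  DECIMAL(10,2) NOT NULL,
    createdDate DATE          NOT NULL
);

-- The index suggested earlier; the name is hypothetical.
CREATE INDEX IX_Quote_Stock_Date ON Quote (stockId, createdDate);

-- Sample rows from the question (dates written as YYYY-MM-DD).
INSERT INTO Quote (quoteId, stockId, closePrice, createdDate) VALUES
( 1, 1,  1, '2012-01-01'),
( 2, 2, 10, '2012-01-01'),
( 3, 3, 15, '2012-01-01'),
( 4, 1,  2, '2012-01-02'),
( 5, 2, 11, '2012-01-02'),
( 6, 3, 13, '2012-01-02'),
( 7, 1,  5, '2012-01-03'),
( 8, 2, 13, '2012-01-03'),
( 9, 3, 17, '2012-01-03'),
(10, 1,  7, '2012-01-04'),
(11, 2, 14, '2012-01-04'),
(12, 3, 18, '2012-01-04'),
(13, 1,  9, '2012-01-05'),
(14, 2, 11, '2012-01-05'),
(15, 3, 10, '2012-01-05');
```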
So what's going on here?
StockRow AS (SELECT stockId, closePrice, createdDate,
ROW_NUMBER() OVER(PARTITION BY stockId
ORDER BY createdDate) rn
FROM Quote)
This CTE serves one purpose: we need a way to find the next/previous row, so first we number each stock's rows in date order...
RunGroup AS (SELECT Base.stockId, Base.createdDate,
MAX(Restart.rn) OVER(PARTITION BY Base.stockId
ORDER BY Base.createdDate) groupingId
FROM StockRow Base
LEFT JOIN StockRow Restart
ON Restart.stockId = Base.stockId
AND Restart.rn = Base.rn - 1
AND Restart.closePrice > Base.closePrice)
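...and then join them based on that index. If you end up on a platform that has LAG() / LEAD(), using those will almost certainly be the better option. There is one critical thing here: a row only matches when it is out of sequence (its price is less than the previous row's). Otherwise the value ends up being null (with LAG(), you would need something like CASE afterwards to get the same effect). You get a temporary set that looks like this (shown here for stockId 3):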
B.rn  B.closePrice  B.createdDate  R.rn  R.closePrice  R.createdDate  groupingId
1     15            2012-01-01     -     -             -              -
2     13            2012-01-02     1     15            2012-01-01     1
3     17            2012-01-03     -     -             -              1
4     18            2012-01-04     -     -             -              1
5     10            2012-01-05     4     18            2012-01-04     4
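...so Restart only has values when the previous row's price was greater than the "current" row's. MAX() is used as a window function to carry forward the greatest value seen so far; because MAX() skips nulls (in effect treating null as lowest), every following row retains that row index until another mismatch occurs (which gives a new value). At this point we essentially have the intermediate result of a gaps-and-islands query, ready for the final aggregation.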
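On a platform that has LAG() (SQL Server 2012 and later, for example), the self-join can be replaced by a restart flag plus a running total. The following is a sketch of that variant, not the query from the fiddle: the CTE name Flagged and the column isRestart are made up for illustration, and groupingId here is a running count of restarts rather than a row number.

```sql
WITH Flagged AS (
    -- isRestart = 1 when the price dropped relative to the previous row
    -- for the same stock; the first row per stock has no previous row,
    -- so LAG() returns NULL and the CASE falls through to 0.
    SELECT stockId, closePrice, createdDate,
           CASE WHEN closePrice < LAG(closePrice)
                                  OVER (PARTITION BY stockId
                                        ORDER BY createdDate)
                THEN 1 ELSE 0 END AS isRestart
    FROM Quote),
RunGroup AS (
    -- A running total of restarts gives every run its own group number.
    SELECT stockId, createdDate,
           SUM(isRestart) OVER (PARTITION BY stockId
                                ORDER BY createdDate
                                ROWS UNBOUNDED PRECEDING) AS groupingId
    FROM Flagged)
SELECT stockId,
       COUNT(*) AS consecutiveCount,
       MIN(createdDate) AS startDate, MAX(createdDate) AS endDate
FROM RunGroup
GROUP BY stockId, groupingId
HAVING COUNT(*) >= 3
ORDER BY stockId, startDate;
```

As with the original join condition, an unchanged closing price does not restart a run; change the comparison to <= if a flat day should end one.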
SELECT stockId,
COUNT(*) AS consecutiveCount,
MIN(createdDate) AS startDate, MAX(createdDate) AS endDate
FROM RunGroup
GROUP BY stockId, groupingId
HAVING COUNT(*) >= 3
ORDER BY stockId, startDate
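The final part of the query gets the start and end dates of each run and counts the number of entries between those dates. If the date calculation needed anything more complicated, it would probably have to happen at this point. The GROUP BY shows one of the few legitimate cases of grouping by a column (groupingId) that is not included in the SELECT clause. The HAVING clause eliminates runs that are "too short".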
Answer 1 (score: 1)
I would try a CTE, roughly as follows:
with increase (stockid, startdate, enddate, cc) as
(
-- anchor: each pair of consecutive days on which the price rose
select d2.stockid, d1.createdate as startdate, d2.createdate as enddate, 1
from quote d1, quote d2
where d1.stockid = d2.stockid
and d2.closeprice > d1.closeprice
and dateadd(day, 1, d1.createdate) = d2.createdate
union all
-- recursive step: extend an existing run one day further back
select d2.stockid, d1.createdate as startdate, cend.enddate as enddate, cend.cc + 1
from quote d1, quote d2, increase cend
where d1.stockid = d2.stockid and d2.stockid = cend.stockid
and d2.closeprice > d1.closeprice
and d2.createdate = cend.startdate
and dateadd(day, 1, d1.createdate) = d2.createdate
)
select o.stockid, o.cc, o.startdate, o.enddate
from increase o where cc = (select max(cc) from increase i where i.stockid = o.stockid and i.enddate = o.enddate)
This assumes there are no gaps. The criterion dateadd(day, 1, d1.createdate) = d2.createdate has to be replaced by something else that indicates whether d2 is the "next" day after d1.
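One possible replacement for that criterion is a per-stock row number, so that "next" means the next row in date order rather than the next calendar day. This is a sketch under that assumption, not a tested rewrite: the CTE name numbered and the columns rn and startrn are introduced here for illustration, and the table and column names follow the question.

```sql
WITH numbered AS (
    -- Number each stock's rows by date so that adjacent rows differ by 1,
    -- regardless of weekends or other gaps in the dates.
    SELECT stockid, closeprice, createdate,
           ROW_NUMBER() OVER (PARTITION BY stockid
                              ORDER BY createdate) AS rn
    FROM quote),
increase (stockid, startdate, enddate, startrn, cc) AS (
    -- anchor: each pair of adjacent rows where the price rose
    SELECT d2.stockid, d1.createdate, d2.createdate, d1.rn, 1
    FROM numbered d1
    JOIN numbered d2
      ON d2.stockid = d1.stockid
     AND d2.rn = d1.rn + 1
    WHERE d2.closeprice > d1.closeprice
    UNION ALL
    -- recursive step: extend a run one row further back
    SELECT cend.stockid, d1.createdate, cend.enddate, d1.rn, cend.cc + 1
    FROM increase cend
    JOIN numbered d1
      ON d1.stockid = cend.stockid
     AND d1.rn = cend.startrn - 1
    JOIN numbered d2
      ON d2.stockid = cend.stockid
     AND d2.rn = cend.startrn
    WHERE d2.closeprice > d1.closeprice
)
SELECT o.stockid, o.cc, o.startdate, o.enddate
FROM increase o
WHERE o.cc = (SELECT MAX(i.cc)
              FROM increase i
              WHERE i.stockid = o.stockid
                AND i.enddate = o.enddate);
```

The final SELECT is unchanged from the query above, so cc still counts day-over-day rises rather than days. In SQL Server the default recursion limit is 100 levels, which matters only for very long runs.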
Answer 2 (score: 0)
This is the final SQL I ended up with for my needs. Testing shows it works fine. I am using the CC approach from @Oran.
WITH StockRow (stockId, [close], createdDate, rowNum)
as
(
SELECT stockId, [close], createdDate,
ROW_NUMBER() OVER(PARTITION BY stockId ORDER BY createdDate)
FROM dbo.Quote
where createddate >= '01/01/2012' --Beginning of this year
),
RunStart (stockId, [close], createdDate, runId) as (
SELECT a.stockId, a.[close], a.createdDate,
ROW_NUMBER() OVER(PARTITION BY a.stockId ORDER BY a.createdDate)
FROM StockRow as a
LEFT JOIN StockRow as b
ON b.stockId = a.stockId
AND b.rowNum = a.rowNum - 1
AND b.[close] < a.[close]
WHERE b.stockId IS NULL)
,
RunEnd (stockId, [close], createdDate, runId) as (
SELECT a.stockId, a.[close], a.createdDate,
ROW_NUMBER() OVER(PARTITION BY a.stockId ORDER BY a.createdDate)
FROM StockRow as a
LEFT JOIN StockRow as b
ON b.stockId = a.stockId
AND b.rowNum = a.rowNum + 1
AND b.[close] > a.[close]
WHERE b.stockId IS NULL)
SELECT a.stockId, s.companyname, s.Symbol,
a.createdDate as startdate, b.createdDate as enddate,
(select count(r.createdDate) from dbo.quote r where r.stockid = b.stockid and r.createdDate between a.createdDate and b.createdDate) as BullRunDuration
FROM RunStart as a JOIN RunEnd as b
ON b.stockId = a.stockId
join dbo.stock as s
on a.stockid = s.stockid
AND b.runId = a.runId
AND b.[close] > a.[close]
and (select count(r.createdDate) from dbo.quote r where r.stockid = b.stockid and
r.createdDate between a.createdDate and b.createdDate) > 2 -- trying to avoid clutter
order by 6 desc, a.stockid