我在Netezza有一张看起来像这样的表
Date Stock Return
2015-01-01 A xxx
2015-01-02 A xxx
2015-01-03 A 0
2015-01-04 A 0
2015-01-05 A xxx
2015-01-06 A xxx
2015-01-07 A xxx
2015-01-08 A xxx
2015-01-09 A xxx
2015-01-10 A 0
2015-01-11 A 0
2015-01-12 A xxx
2015-01-13 A xxx
2015-01-14 A xxx
2015-01-15 A xxx
2015-01-16 A xxx
2015-01-17 A 0
2015-01-18 A 0
2015-01-19 A xxx
2015-01-20 A xxx
数据代表各种股票和日期的股票回报。我需要做的是按给定的间隔和该间隔的日期对数据进行分组。另一个困难是周末(0)必须打折(忽略公共假期)。并且第一个间隔的开始日期应该是任意日期。
例如,我的出局应该看起来像这样
Interval Q01 Q02 Q03 Q04 Q05
1 xxx xxx xxx xxx xxx
2 xxx xxx xxx xxx xxx
3 xxx xxx xxx xxx xxx
4 xxx xxx xxx xxx xxx
此输出表示长度为5个工作日的间隔,平均回报为结果,以上述原始数据表示, 开始日期1月1日,第1个间隔包括1/2/5/6/7(3个和4个周末被忽略)Q01将是第1个,Q02是第2个,Q03是第5个等等。第二个间隔从8/9开始/ 12/13/14
我尝试失败的是使用
CEIL(CAST(EXTRACT(DOY FROM DATE) AS FLOAT) / CAST (10 AS FLOAT)) AS interval
EXTRACT(DAY FROM DATE) % 10 AS DAYinInterval
我也尝试过使用滚动计数器和可变的开始日期,将我的DOY设置为零,这样的s.th
CEIL(CAST(EXTRACT(DOY FROM DATE) - EXTRACT(DOY FROM 'start-date' AS FLOAT) / CAST (10 AS FLOAT)) AS Interval
最接近我期望的一件事就是这个 SUM(Number)OVER(按日期排序按日期排序ASC第10行)AS计数器
不幸的是,它从1到10,然后是11s,它应该从1再到10。
我很想知道如何以优雅的方式实现这一点。感谢
答案 0 :(得分:1)
我不完全确定我理解这个问题,但我 想 我可能会这样做,所以我会在这个问题上采取行动一些窗口化的聚合和子查询。
这里是样本数据,在工作日插入一些随机的非零数据。
DATE | STOCK | RETURN
------------+-------+--------
2015-01-01 | A | 16
2015-01-02 | A | 80
2015-01-03 | A | 0
2015-01-04 | A | 0
2015-01-05 | A | 60
2015-01-06 | A | 25
2015-01-07 | A | 12
2015-01-08 | A | 1
2015-01-09 | A | 81
2015-01-10 | A | 0
2015-01-11 | A | 0
2015-01-12 | A | 35
2015-01-13 | A | 20
2015-01-14 | A | 69
2015-01-15 | A | 72
2015-01-16 | A | 89
2015-01-17 | A | 0
2015-01-18 | A | 0
2015-01-19 | A | 100
2015-01-20 | A | 67
(20 rows)
这是我对它的看法,带有嵌入式评论。
select avg(return),
date_period,
day_period
from (
-- use row_number to generate a sequential value for each DOW,
-- with a WHERE to filter out the weekends
select date,
stock,
return,
date_period ,
row_number() over (partition by date_period order by date asc) day_period
from (
-- bin out the entries by date_period using the first_value of the entire set as the starting point
-- modulo 7
select date,
stock,
return,
date + (first_value(date) over (order by date asc) - date) % 7 date_period
from stocks
where date >= '2015-01-01'
-- setting the starting period date here
)
foo
where extract (dow from date) not in (1,7)
)
foo
group by date_period, day_period
order by date_period asc;
结果:
AVG | DATE_PERIOD | DAY_PERIOD
------------+-------------+------------
16.000000 | 2015-01-01 | 1
80.000000 | 2015-01-01 | 2
60.000000 | 2015-01-01 | 3
25.000000 | 2015-01-01 | 4
12.000000 | 2015-01-01 | 5
1.000000 | 2015-01-08 | 1
81.000000 | 2015-01-08 | 2
35.000000 | 2015-01-08 | 3
20.000000 | 2015-01-08 | 4
69.000000 | 2015-01-08 | 5
72.000000 | 2015-01-15 | 1
89.000000 | 2015-01-15 | 2
100.000000 | 2015-01-15 | 3
67.000000 | 2015-01-15 | 4
(14 rows)
将开始日期更改为' 2015-01-03'看它是否适当调整:
...
from stocks
where date >= '2015-01-03'
...
结果:
AVG | DATE_PERIOD | DAY_PERIOD
------------+-------------+------------
60.000000 | 2015-01-03 | 1
25.000000 | 2015-01-03 | 2
12.000000 | 2015-01-03 | 3
1.000000 | 2015-01-03 | 4
81.000000 | 2015-01-03 | 5
35.000000 | 2015-01-10 | 1
20.000000 | 2015-01-10 | 2
69.000000 | 2015-01-10 | 3
72.000000 | 2015-01-10 | 4
89.000000 | 2015-01-10 | 5
100.000000 | 2015-01-17 | 1
67.000000 | 2015-01-17 | 2
(12 rows)