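"Running product" aggregate/window function in PostgreSQL?

Time: 2017-07-21 22:14:55

Tags: sql postgresql aggregate

I am trying to normalize end-of-day stock prices in PostgreSQL.

Let's say I have a stock table defined as follows: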

create table eod (
  date date not null,
  stock_id int not null,
  eod_split decimal(16,8) not null,  -- named to match the sample data header below
  close decimal(12,6) not null,
  constraint pk_eod primary key (date, stock_id)
);

The data in this table might look like this:

"date","stock_id","eod_split","close"
"2014-06-13",14010920,"1.00000000","182.560000"
"2014-06-13",14010911,"1.00000000","91.280000"
"2014-06-13",14010923,"1.00000000","41.230000"
"2014-06-12",14010911,"1.00000000","92.290000"
"2014-06-12",14010920,"1.00000000","181.220000"
"2014-06-12",14010923,"1.00000000","40.580000"
"2014-06-11",14010920,"1.00000000","182.250000"
"2014-06-11",14010911,"1.00000000","93.860000"
"2014-06-11",14010923,"1.00000000","40.860000"
"2014-06-10",14010911,"1.00000000","94.250000"
"2014-06-10",14010923,"1.00000000","41.110000"
"2014-06-10",14010920,"1.00000000","184.290000"
"2014-06-09",14010920,"1.00000000","186.220000"
"2014-06-09",14010911,"7.00000000","93.700000"
"2014-06-09",14010923,"1.00000000","41.270000"
"2014-06-06",14010923,"1.00000000","41.480000"
"2014-06-06",14010911,"1.00000000","645.570000"
"2014-06-06",14010920,"1.00000000","186.370000"
"2014-06-05",14010920,"1.00000000","185.980000"
"2014-06-05",14010911,"1.00000000","647.350000"
"2014-06-05",14010923,"1.00000000","41.210000"
... 
"2005-03-04",14010920,"1.00000000","92.370000"
"2005-03-04",14010911,"1.00000000","42.810000"
"2005-03-04",14010923,"1.00000000","25.170000"
"2005-03-03",14010923,"1.00000000","25.170000"
"2005-03-03",14010911,"1.00000000","41.790000"
"2005-03-03",14010920,"1.00000000","92.410000"
"2005-03-02",14010920,"1.00000000","92.920000"
"2005-03-02",14010923,"1.00000000","25.260000"
"2005-03-02",14010911,"1.00000000","44.121000"
"2005-03-01",14010920,"1.00000000","93.300000"
"2005-03-01",14010923,"1.00000000","25.280000"
"2005-03-01",14010911,"1.00000000","44.500000"
"2005-02-28",14010923,"1.00000000","25.160000"
"2005-02-28",14010911,"2.00000000","44.860000"
"2005-02-28",14010920,"1.00000000","92.580000"
"2005-02-25",14010923,"1.00000000","25.250000"
"2005-02-25",14010920,"1.00000000","92.800000"
"2005-02-25",14010911,"1.00000000","88.990000"
"2005-02-24",14010923,"1.00000000","25.370000"
"2005-02-24",14010920,"1.00000000","92.640000"
"2005-02-24",14010911,"1.00000000","88.930000"
"2005-02-23",14010923,"1.00000000","25.200000"
"2005-02-23",14010911,"1.00000000","88.230000"
"2005-02-23",14010920,"1.00000000","92.100000"
...
"2003-02-24",14010920,"1.00000000","78.560000"
"2003-02-24",14010911,"1.00000000","14.740000"
"2003-02-24",14010923,"1.00000000","24.070000"
"2003-02-21",14010920,"1.00000000","79.950000"
"2003-02-21",14010923,"1.00000000","24.630000"
"2003-02-21",14010911,"1.00000000","15.000000"
"2003-02-20",14010911,"1.00000000","14.770000"
"2003-02-20",14010920,"1.00000000","79.150000"
"2003-02-20",14010923,"1.00000000","24.140000"
"2003-02-19",14010920,"1.00000000","79.510000"
"2003-02-19",14010911,"1.00000000","14.850000"
"2003-02-19",14010923,"1.00000000","24.530000"
"2003-02-18",14010923,"2.00000000","24.960000"
"2003-02-18",14010911,"1.00000000","15.270000"
"2003-02-18",14010920,"1.00000000","79.330000"
"2003-02-14",14010911,"1.00000000","14.670000"
"2003-02-14",14010920,"1.00000000","77.450000"
"2003-02-14",14010923,"1.00000000","48.300000"
"2003-02-13",14010920,"1.00000000","75.860000"
"2003-02-13",14010911,"1.00000000","14.540000"
"2003-02-13",14010923,"1.00000000","46.990000"

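Note the "split" column (eod_split in the sample data). When a split value other than 1 is recorded, it basically means that the stock's shares were divided by that factor. IOW, when the split is 2.0, the number of shares outstanding doubles, but the value of each share is halved from that point on. If the stock was worth $100 per share, it is now worth $50 per share.

If you chart the raw numbers, this sort of thing gets really ugly. A sudden cliff appears even though the overall value of the company has not changed significantly... and when you have multiple splits, you end up with a chart that does not correctly reflect the company's trend, often by a wide margin. In the example above, with a 2:1 split, your closing prices would look like 100, 100, 100, 50, 50, 50.

I want to use this table to create a "normalized" price in a reasonably efficient way (there is a sizable number of records). Continuing the example, this would show the stock price as 50, 50, 50, 50, 50, 50. If there are multiple splits, the data should still be consistent and smooth, if we ignore actual changes in market value.

My thought is that if I could create a CTE with a "running product" of the split values, going back in time, I could define the date ranges for each stock and the modifier that should be applied to the closing price, then join that back to the eod table and select the adjusted close for each stock.

...The problem is, I can't work out how to do that other than with a whole bunch of temp tables and a multi-step process. I'm not aware of any built-in feature that would make this easier.

Can someone show me how to generate the normalized data?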

2 Answers:

Answer 0: (score: 4)

You don't need a CTE. You just need a cumulative product. Postgres doesn't have one built in. But arithmetic to the rescue!

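-- Running product of the split factors per stock, computed as exp(sum(ln(x))).
-- ln() requires eod_split > 0, and the result may carry a tiny rounding error.
select eod.*,
       exp(sum(ln(eod_split)) over (partition by stock_id order by date)) as cume_split,
       (close *
        exp(sum(ln(eod_split)) over (partition by stock_id order by date))
       ) as normalized_price
from eod;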
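
Note that the query above accumulates splits from the earliest date forward, so for the 2:1 example in the question it would turn 100, 100, 100, 50, 50, 50 into 100 throughout rather than 50. If the goal is to leave the most recent prices untouched and scale older ones down, one possible variant is sketched below. It is only a sketch: the `toy` CTE, its dates, and the `adjusted_close` alias are made up for illustration, and it assumes every split factor is greater than zero.

```sql
-- Hypothetical toy data reproducing the 100,100,100,50,50,50 example;
-- the 2:1 split is recorded on the first day that trades at the new price.
with toy(date, stock_id, eod_split, close) as (
  values
    (date '2020-01-01', 1, 1.0, 100.0),
    (date '2020-01-02', 1, 1.0, 100.0),
    (date '2020-01-03', 1, 1.0, 100.0),
    (date '2020-01-06', 1, 2.0,  50.0),
    (date '2020-01-07', 1, 1.0,  50.0),
    (date '2020-01-08', 1, 1.0,  50.0)
)
select date,
       stock_id,
       close,
       -- Divide by the product of all splits recorded on strictly later dates.
       -- The frame is empty for the latest row, so sum() returns NULL there;
       -- coalesce(..., 0) makes the divisor exp(0) = 1.
       close / exp(coalesce(
                 sum(ln(eod_split)) over (
                   partition by stock_id
                   order by date
                   rows between 1 following and unbounded following),
                 0)) as adjusted_close
from toy
order by stock_id, date;
```

On this toy data every adjusted_close should come out at approximately 50. Replacing `toy` with `eod` applies the same idea to the real table; for example, the 645.57 close on 2014-06-06 would be divided by the 7:1 split recorded on 2014-06-09, giving roughly 92.22.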

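Answer 1: (score: 1)

Funny, I was looking for this solution and found that a colleague had already asked about it. Here is the basic algebra behind this clever solution: https://blog.prepscholar.com/natural-log-rules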
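
For reference, the identity that the exp/sum/ln trick relies on can be written as follows. The symbols $s_i$ (the split factors for one stock, in date order) are notation introduced here for illustration, and the identity only holds when every $s_i$ is strictly positive so that the logarithm is defined:

```latex
\prod_{i=1}^{n} s_i \;=\; \exp\!\left( \sum_{i=1}^{n} \ln s_i \right), \qquad s_i > 0
```

It follows from $\ln(ab) = \ln a + \ln b$ together with $\exp(\ln x) = x$. In the query, the windowed `sum(ln(eod_split))` supplies the running sum, and `exp` turns it back into a running product.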