I have a SQL table of stock closing prices, for example:
Date AAOI ABIL ACIA ACIW ... ZG ZIXI
2000-01-03 NaN NaN NaN 8.94 ... NaN 37.19
2000-01-04 NaN NaN NaN 8.33 ... NaN 36.50
2000-01-05 NaN NaN NaN 8.06 ... NaN 37.28
2000-01-06 NaN NaN NaN 7.98 ... NaN 35.25
2000-01-07 NaN NaN NaN 7.81 ... NaN 38.00
Is it possible to append columns holding the ratio of each pair of stocks to this table? I.e.
Date AAOI ABIL ACIA ACIW ... ZG ZIXI AAOI/ABIL AAOI/ACIA ... AAOI/ZIXI ABIL/AAOI ...
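Or would it be possible to create a new table instead?
I have thousands of columns of stock data, so the number of ratio columns would run into the tens of thousands or more.
Is it possible to create these columns automatically with some kind of loop? This is my first SQLite project and I'm not sure how to proceed.
Any other information or suggestions would be greatly appreciated. Thanks!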
Answer 0 (score: 1)
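Do you really need tens of thousands of ratios at once? Not that you could have that many columns anyway: the default limit is 2000 columns per table, and it can be raised to at most 32767. See Limits In SQLite - Maximum Number Of Columns.
However, a table can have many thousands of rows, so you may want to consider one row per stock code per date instead.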
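A quick way to see this limit in practice, sketched with Python's built-in sqlite3 module (the table name wide and the column names c0, c1, ... are made up for illustration): it tries to create a table with a given number of REAL columns and reports whether SQLite accepts it. The exact ceiling depends on the SQLITE_MAX_COLUMN value your SQLite library was compiled with, so treat 2000 as the usual default rather than a guarantee.

```python
import sqlite3


def can_create_table(n_cols: int) -> bool:
    """Return True if SQLite accepts a table with n_cols REAL columns."""
    con = sqlite3.connect(":memory:")
    # Hypothetical column names c0, c1, ... just to reach the desired width.
    cols = ", ".join(f"c{i} REAL" for i in range(n_cols))
    try:
        con.execute(f"CREATE TABLE wide ({cols})")
        return True
    except sqlite3.OperationalError as exc:
        # SQLite reports e.g. "too many columns on wide" when the limit is exceeded.
        print("rejected:", exc)
        return False
    finally:
        con.close()
```

On a default build, can_create_table(2000) should succeed and can_create_table(2001) should be rejected; anything above 32767 is rejected whatever the build, because SQLITE_MAX_COLUMN cannot be compiled higher than that.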
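If the data currently sits in the wide, one-column-per-ticker table from the question, a loop like the following could copy it into that one-row-per-date-per-code layout. This is a minimal sketch using Python's sqlite3 module; prices_wide and Date are assumed names for the existing table and its date column, wide_to_long is a hypothetical helper, and the target closing_price table matches the one defined below.

```python
import sqlite3


def wide_to_long(con, wide_table="prices_wide", date_col="Date"):
    """Copy a one-column-per-ticker table into closing_price, one row per date and ticker.

    wide_table and date_col are assumed names for the existing wide table.
    Returns the list of ticker columns that were copied.
    """
    con.execute(
        "CREATE TABLE IF NOT EXISTS closing_price ("
        "closingdate TEXT, stockcode TEXT, stockprice REAL, "
        "UNIQUE(closingdate, stockcode))"
    )
    # PRAGMA table_info yields (cid, name, type, notnull, dflt_value, pk) per column.
    tickers = [
        row[1]
        for row in con.execute(f'PRAGMA table_info("{wide_table}")')
        if row[1] != date_col
    ]
    for t in tickers:
        # Column names can't be bound as parameters, so they are quoted into the SQL;
        # the ticker code stored in the stockcode column is passed as a normal parameter.
        con.execute(
            "INSERT OR IGNORE INTO closing_price "
            f'SELECT "{date_col}", ?, "{t}" FROM "{wide_table}"',
            (t,),
        )
    con.commit()
    return tickers
```

Quoting the column names into the SQL text assumes the ticker names contain no double quotes.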
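Perhaps consider the following, which doesn't add any columns to the table but derives a ratio from the prices of two stock codes over a date range (not sure if this is exactly the calculation you want):-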
DROP TABLE IF EXISTS closing_price;
CREATE TABLE IF NOT EXISTS closing_price (closingdate TEXT, stockcode TEXT, stockprice REAL, UNIQUE(closingdate, stockcode));
INSERT INTO closing_price VALUES
('2001-01-03','AAOI',null),('2001-01-03','ABIL',null),('2001-01-03','ACIA',null),('2001-01-03','ACIW',8.94),('2001-01-03','ZG',null),('2001-01-03','ZIXI',37.19),
('2001-01-04','AAOI',null),('2001-01-04','ABIL',null),('2001-01-04','ACIA',null),('2001-01-04','ACIW',8.33),('2001-01-04','ZG',null),('2001-01-04','ZIXI',36.50),
('2001-01-05','AAOI',null),('2001-01-05','ABIL',null),('2001-01-05','ACIA',null),('2001-01-05','ACIW',8.06),('2001-01-05','ZG',null),('2001-01-05','ZIXI',37.28),
('2001-01-06','AAOI',null),('2001-01-06','ABIL',null),('2001-01-06','ACIA',null),('2001-01-06','ACIW',7.98),('2001-01-06','ZG',null),('2001-01-06','ZIXI',35.25),
('2001-01-07','AAOI',null),('2001-01-07','ABIL',null),('2001-01-07','ACIA',null),('2001-01-07','ACIW',7.81),('2001-01-07','ZG',null),('2001-01-07','ZIXI',38.00)
;
-- Ratio for a single day between ACIW and ZIXI
SELECT (
SELECT sum(stockprice) FROM closing_price WHERE stockcode = 'ACIW' AND closingdate BETWEEN '2001-01-03' AND '2001-01-03'
)
/ (
SELECT sum(stockprice) FROM closing_price WHERE stockcode = 'ZIXI' AND closingdate BETWEEN '2001-01-03' AND '2001-01-03'
)
AS ratio
;
-- Ratio for the 5 days between ACIW and ZIXI
SELECT (
SELECT sum(stockprice) FROM closing_price WHERE stockcode = 'ACIW' AND closingdate BETWEEN '2001-01-03' AND '2001-01-07'
)
/ (
SELECT sum(stockprice) FROM closing_price WHERE stockcode = 'ZIXI' AND closingdate BETWEEN '2001-01-03' AND '2001-01-07'
)
AS ratio
;
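The above uses a single table, but with one row per stock code/closing date combination and a UNIQUE index on the closing date/stock code combination.
The table looks like this:-
It then uses queries to calculate the ratio for a specific pair of stock codes over a given date range (the first query for a single day, the second for 5 days).
The results are:-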
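To avoid hard-coding each pair and date range, the same ratio-of-sums query can be run with the stock codes and dates bound as parameters. A minimal sketch with Python's sqlite3 module, assuming the closing_price table above; range_ratio and its parameter names are hypothetical. Against the sample data, the single-day ACIW/ZIXI ratio works out to 8.94 / 37.19 ≈ 0.2404 and the 5-day ratio to 41.12 / 184.22 ≈ 0.2232.

```python
import sqlite3

# Same ratio-of-sums query as above, with codes and dates as named parameters.
RANGE_RATIO_SQL = """
SELECT
  (SELECT sum(stockprice) FROM closing_price
    WHERE stockcode = :num AND closingdate BETWEEN :start_date AND :end_date)
  /
  (SELECT sum(stockprice) FROM closing_price
    WHERE stockcode = :den AND closingdate BETWEEN :start_date AND :end_date)
AS ratio
"""


def range_ratio(con, num, den, start_date, end_date):
    """Summed closing price of `num` divided by that of `den`, dates inclusive."""
    params = {"num": num, "den": den, "start_date": start_date, "end_date": end_date}
    # Returns None when either code has no non-NULL price in the range.
    return con.execute(RANGE_RATIO_SQL, params).fetchone()[0]
```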
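Is it possible to get the ratios with the method you outlined, without calling each one individually (there are far too many combinations here)?
Ignoring nulls (at least for brevity/usefulness), you could do something like the following (but beware of processing time), which may suit:-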
WITH
allstocks AS (SELECT DISTINCT stockcode FROM closing_price),
combined AS (
SELECT DISTINCT closing_price.closingdate, closing_price.stockcode AS sc1, allstocks.stockcode AS sc2
FROM closing_price JOIN allstocks ON closing_price.stockcode <> allstocks.stockcode
)
SELECT closingdate, sc1, sc2,
(SELECT stockprice FROM closing_price WHERE stockcode = sc1 AND closing_price.closingdate = combined.closingdate) /
(SELECT stockprice FROM closing_price WHERE stockcode = sc2 AND closing_price.closingdate = combined.closingdate) AS ratio
FROM combined WHERE ratio IS NOT NULL;
This results in:-
(out of 150 combinations, the rest being null)
You can add a date range to the CTE query above using:-
WITH
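With thousands of stock codes, two correlated subqueries per candidate pair can get slow. A different approach, not part of the original answer, is to join closing_price to itself on the date and drop NULL prices in the WHERE clause, so only pairs where both prices exist come back. A minimal sketch with Python's sqlite3 module; pair_ratios is a hypothetical helper name. On the sample data it should return the same 10 non-NULL ratios (ACIW/ZIXI and ZIXI/ACIW for each of the 5 dates).

```python
import sqlite3

# Self-join alternative: pair every stock with every other stock on the same date.
PAIR_RATIOS_SQL = """
SELECT a.closingdate,
       a.stockcode AS sc1,
       b.stockcode AS sc2,
       a.stockprice / b.stockprice AS ratio
FROM closing_price AS a
JOIN closing_price AS b
  ON b.closingdate = a.closingdate
 AND b.stockcode <> a.stockcode
WHERE a.stockprice IS NOT NULL
  AND b.stockprice IS NOT NULL
  AND b.stockprice <> 0  -- skip zero denominators (SQLite would return NULL anyway)
ORDER BY a.closingdate, sc1, sc2
"""


def pair_ratios(con):
    """Return (closingdate, sc1, sc2, ratio) for every pair of non-NULL prices on a date."""
    return con.execute(PAIR_RATIOS_SQL).fetchall()
```

The existing UNIQUE(closingdate, stockcode) index can serve the join on closingdate, so no extra index should be needed for this sketch.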
allstocks AS (SELECT DISTINCT stockcode FROM closing_price),
combined AS (
SELECT DISTINCT closing_price.closingdate, closing_price.stockcode AS sc1, allstocks.stockcode AS sc2
FROM closing_price JOIN allstocks ON closing_price.stockcode <> allstocks.stockcode
WHERE closingdate BETWEEN '2001-01-04' AND '2001-01-06' --<<<<<<<<<< ADDED
)
SELECT closingdate, sc1, sc2,
(SELECT stockprice FROM closing_price WHERE stockcode = sc1 AND closing_price.closingdate = combined.closingdate) /
(SELECT stockprice FROM closing_price WHERE stockcode = sc2 AND closing_price.closingdate = combined.closingdate) AS ratio
FROM combined
WHERE ratio IS NOT NULL
;
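This results in:-
The above first creates a CTE (Common Table Expression, essentially a temporary table) containing each UNIQUE stock code, named allstocks.
It then creates a second CTE, named combined, by joining the closing_price table to allstocks wherever the stock codes differ (restricted to the given date range in the second example).
Finally, for each row of combined, two correlated subqueries look up the prices of sc1 and sc2 on that date, and their ratio is returned, keeping only non-NULL ratios.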
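The size of combined is what drives the processing time: it holds one row per date per ordered pair of different codes, i.e. dates × n × (n - 1) rows for n codes. With the 6 sample codes and the 3-day range that is 3 × 6 × 5 = 90 rows, but 3,000 codes would give about 9 million rows per date. A small sketch (Python sqlite3; cte_sizes is a hypothetical helper) that counts the rows in both CTEs for a date range:

```python
import sqlite3

# The two CTEs from the answer, with the date range bound as parameters.
CTES = """
WITH
allstocks AS (SELECT DISTINCT stockcode FROM closing_price),
combined AS (
    SELECT DISTINCT closing_price.closingdate, closing_price.stockcode AS sc1, allstocks.stockcode AS sc2
    FROM closing_price JOIN allstocks ON closing_price.stockcode <> allstocks.stockcode
    WHERE closingdate BETWEEN :start_date AND :end_date
)
"""


def cte_sizes(con, start_date, end_date):
    """Return (rows in allstocks, rows in combined) for the given date range."""
    params = {"start_date": start_date, "end_date": end_date}
    n_codes = con.execute(CTES + "SELECT count(*) FROM allstocks", params).fetchone()[0]
    n_pairs = con.execute(CTES + "SELECT count(*) FROM combined", params).fetchone()[0]
    return n_codes, n_pairs
```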
If you want to store the ratios in a table, you could define one such as:-
CREATE TABLE IF NOT EXISTS ratio (closingdate TEXT, stockcode1 TEXT, stockcode2 TEXT, ratio, PRIMARY KEY(closingdate, stockcode1,stockcode2));
and populate it using:-
WITH
allstocks AS (SELECT DISTINCT stockcode FROM closing_price),
combined AS (
SELECT DISTINCT closing_price.closingdate, closing_price.stockcode AS sc1, allstocks.stockcode AS sc2
FROM closing_price JOIN allstocks ON closing_price.stockcode <> allstocks.stockcode
WHERE closingdate BETWEEN '2001-01-04' AND '2001-01-06'
)
INSERT OR IGNORE INTO ratio SELECT closingdate, sc1, sc2,
(SELECT stockprice FROM closing_price WHERE stockcode = sc1 AND closing_price.closingdate = combined.closingdate) /
(SELECT stockprice FROM closing_price WHERE stockcode = sc2 AND closing_price.closingdate = combined.closingdate) AS ratio
FROM combined
;
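Note that this INSERT does not filter out NULL ratios, so every pair in combined is stored, NULL or not. A sketch (Python sqlite3; store_ratios and its skip_nulls flag are hypothetical) that runs the same statement with the date range bound as parameters and optionally appends the WHERE ratio IS NOT NULL filter used in the earlier queries. On the sample data and the 2001-01-04 to 2001-01-06 range, it should store 90 rows without the filter and 6 with it.

```python
import sqlite3

CREATE_RATIO_SQL = """
CREATE TABLE IF NOT EXISTS ratio (closingdate TEXT, stockcode1 TEXT, stockcode2 TEXT, ratio,
    PRIMARY KEY(closingdate, stockcode1, stockcode2))
"""

# The answer's INSERT, with the date range bound as :start_date / :end_date.
FILL_RATIO_SQL = """
WITH
allstocks AS (SELECT DISTINCT stockcode FROM closing_price),
combined AS (
    SELECT DISTINCT closing_price.closingdate, closing_price.stockcode AS sc1, allstocks.stockcode AS sc2
    FROM closing_price JOIN allstocks ON closing_price.stockcode <> allstocks.stockcode
    WHERE closingdate BETWEEN :start_date AND :end_date
)
INSERT OR IGNORE INTO ratio SELECT closingdate, sc1, sc2,
    (SELECT stockprice FROM closing_price WHERE stockcode = sc1 AND closing_price.closingdate = combined.closingdate) /
    (SELECT stockprice FROM closing_price WHERE stockcode = sc2 AND closing_price.closingdate = combined.closingdate) AS ratio
FROM combined
"""


def store_ratios(con, start_date, end_date, skip_nulls=False):
    """Fill the ratio table for the date range; return the number of rows now in it."""
    sql = FILL_RATIO_SQL + ("WHERE ratio IS NOT NULL" if skip_nulls else "")
    con.execute(CREATE_RATIO_SQL)
    con.execute(sql, {"start_date": start_date, "end_date": end_date})
    con.commit()
    return con.execute("SELECT count(*) FROM ratio").fetchone()[0]
```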
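Answer 1 (score: 0)
Just my personal opinion, but I really think you are misusing SQLite here.
The main goal of a database is to provide structured storage with as little redundancy as possible. Redundancy is considered bad in a database because an improper update can leave inconsistent data. Denormalized columns (columns whose values can be computed from other columns) are indeed common, but they are normally used when the computation is too complex to fit in a query, and you then make it clear to users that they are computed values.
Here your ratios are easy to write in a SELECT statement, so IMHO storing them in the database is pointless: it just wastes space. It's easy to write: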
SELECT AAOI, ABIL, ACIA, AAOI/ABIL, ACIA/ABIL, ABIL/ACIA
FROM ...
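Rather than typing every pair by hand, the SELECT can be generated in a loop. A minimal sketch with Python's sqlite3 module; prices_wide and Date are assumed names for the wide table in the question, and ratio_select_sql is a hypothetical helper. Keep in mind that SQLite applies the same column limit (2000 by default) to the result set of a SELECT, and n tickers produce n × (n - 1) ratio columns, so a single query like this only fits about 45 tickers at a time on a default build.

```python
import sqlite3
from itertools import permutations


def ratio_select_sql(con, table="prices_wide", date_col="Date"):
    """Build 'SELECT "Date", "A" / "B" AS "A/B", ...' for every ordered pair of tickers.

    table and date_col are assumed names for the wide table in the question.
    """
    tickers = [
        row[1]
        for row in con.execute(f'PRAGMA table_info("{table}")')
        if row[1] != date_col
    ]
    # One quoted expression per ordered pair, aliased as "A/B".
    exprs = [f'"{a}" / "{b}" AS "{a}/{b}"' for a, b in permutations(tickers, 2)]
    return f'SELECT "{date_col}", ' + ", ".join(exprs) + f' FROM "{table}"'
```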
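It certainly makes sense to have these ratios in a pandas DataFrame rather than in the database, and they are easy to compute with pandas:
# df: the closing prices loaded into pandas, with Date as the first column
cols = list(df.columns[1:])  # every ticker column, skipping Date
for i, c in enumerate(cols[:-1]):
    for c2 in cols[i+1:]:
        df['{}/{}'.format(c, c2)] = df[c] / df[c2]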
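Adding thousands of columns one at a time can be slow in pandas (recent versions may emit a PerformanceWarning about a highly fragmented DataFrame). A sketch of an alternative, assuming the prices are in a DataFrame with a Date column: build all ratio Series in a dict and add them with a single pd.concat. Unlike the loop above, it computes both directions (A/B and B/A), as in the question; pairwise_ratios is a hypothetical name.

```python
from itertools import permutations

import pandas as pd


def pairwise_ratios(df: pd.DataFrame, date_col: str = "Date") -> pd.DataFrame:
    """Return the prices plus one 'A/B' column for every ordered pair of tickers.

    Assumes df has a date column (default 'Date') and one price column per ticker.
    """
    prices = df.set_index(date_col)
    # Build every ratio Series first, then add them in one concat
    # instead of inserting columns one by one.
    ratios = {
        f"{a}/{b}": prices[a] / prices[b]
        for a, b in permutations(prices.columns, 2)
    }
    return pd.concat([prices, pd.DataFrame(ratios, index=prices.index)], axis=1)
```

Missing prices (NaN) propagate to NaN ratios, and a zero denominator gives inf rather than an error.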
If you must store these ratios for performance reasons, try storing them outside the database (e.g., in a CSV file) or in a separate table.
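A sketch of the CSV option using only the standard library (csv, sqlite3, itertools): it streams rows from the wide table and writes one line per ordered pair of non-NULL prices, in a long (date, stock1, stock2, ratio) layout rather than tens of thousands of columns. prices_wide and Date are assumed names, and export_ratios_csv is a hypothetical helper.

```python
import csv
import sqlite3
from itertools import permutations


def export_ratios_csv(con, out, table="prices_wide", date_col="Date"):
    """Write (closingdate, stockcode1, stockcode2, ratio) rows to `out`; return the count.

    table/date_col are assumed names; `out` is any writable text file object.
    """
    cur = con.execute(f'SELECT * FROM "{table}" ORDER BY "{date_col}"')
    names = [d[0] for d in cur.description]
    date_idx = names.index(date_col)
    tickers = [i for i in range(len(names)) if i != date_idx]
    writer = csv.writer(out)
    writer.writerow(["closingdate", "stockcode1", "stockcode2", "ratio"])
    n = 0
    for row in cur:  # iterate the cursor so the whole table is not loaded at once
        for i, j in permutations(tickers, 2):
            a, b = row[i], row[j]
            if a is not None and b:  # skip NULL prices and zero denominators
                writer.writerow([row[date_idx], names[i], names[j], a / b])
                n += 1
    return n
```

For a real file, pass something like open("ratios.csv", "w", newline="") as out; the file name is only an example.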