SQLite: creating columns automatically

Time: 2019-06-12 02:11:10

Tags: python pandas sqlite

I have a SQL table containing closing prices for stock data, for example:

Date        AAOI  ABIL  ACIA  ACIW  ...  ZG   ZIXI 
2000-01-03   NaN   NaN   NaN  8.94  ... NaN  37.19
2000-01-04   NaN   NaN   NaN  8.33  ... NaN  36.50
2000-01-05   NaN   NaN   NaN  8.06  ... NaN  37.28
2000-01-06   NaN   NaN   NaN  7.98  ... NaN  35.25
2000-01-07   NaN   NaN   NaN  7.81  ... NaN  38.00

Is it possible to append columns to this table that hold the ratio of each pair of stocks? That is,

Date     AAOI  ABIL  ACIA  ACIW  ...  ZG   ZIXI  AAOI/ABIL  AAOI/ACIA  ... AAOI/ZIXI  ABIL/AAOI ... 

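Or would it be possible to create a new table instead?

I have thousands of columns of stock data, so the number of ratio columns would run into the tens of thousands or more.

Is there a way to create these columns automatically with some kind of loop? This is my first SQLite project and I'm not sure how to proceed.

Any further information or suggestions would be much appreciated. Thanks!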

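2 Answers:

Answer 0: (score: 1)

Do you really need thousands of ratios at once? Not that you could have that many columns anyway: the default limit is 2000 columns per table, and it can be raised to at most 32767 (see Limits In SQLite - Maximum Number Of Columns).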
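To put numbers on that, here is a rough back-of-the-envelope sketch. It assumes you want both A/B and B/A for every pair, plus the Date column and the original price columns in the same table; the function names are made up for illustration.

```python
# Default and upper bound of SQLITE_MAX_COLUMN, as quoted above.
DEFAULT_MAX_COLUMNS = 2000
HARD_MAX_COLUMNS = 32767


def ordered_pair_count(n_tickers):
    """Number of ratio columns when both A/B and B/A are wanted."""
    return n_tickers * (n_tickers - 1)


def max_tickers_that_fit(column_limit):
    """Largest n where 1 date column + n price columns + n*(n-1) ratio columns fit."""
    n = 0
    while 1 + (n + 1) + ordered_pair_count(n + 1) <= column_limit:
        n += 1
    return n


for n in (10, 100, 1000):
    print(n, "tickers ->", ordered_pair_count(n), "ratio columns")

print("default limit fits", max_tickers_that_fit(DEFAULT_MAX_COLUMNS), "tickers")
print("raised limit fits", max_tickers_that_fit(HARD_MAX_COLUMNS), "tickers")
```

Under those assumptions a wide table tops out at 44 tickers with the default limit and 181 with the limit raised as far as it goes, which is well short of thousands of tickers.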

A table can, however, hold many thousands of rows. So you may want to consider one row per stock code per date.
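If the data already lives in a wide table, something along these lines could reshape it into one row per date and stock code. This is only a sketch: `wide_prices` is a hypothetical stand-in for the table in the question (cut down to two tickers), and the target table uses the `closing_price` layout from this answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical wide table standing in for the one in the question.
conn.execute('CREATE TABLE wide_prices ("Date" TEXT, "ACIW" REAL, "ZIXI" REAL)')
conn.executemany(
    "INSERT INTO wide_prices VALUES (?, ?, ?)",
    [("2000-01-03", 8.94, 37.19), ("2000-01-04", 8.33, 36.50)],
)

# Long table: one row per (closing date, stock code).
conn.execute(
    "CREATE TABLE closing_price ("
    "closingdate TEXT, stockcode TEXT, stockprice REAL, "
    "UNIQUE(closingdate, stockcode))"
)

cur = conn.execute("SELECT * FROM wide_prices")
columns = [d[0] for d in cur.description]  # first column is assumed to be the date

# Unpivot: every price cell becomes its own (date, code, price) row.
long_rows = [
    (row[0], code, price)
    for row in cur
    for code, price in zip(columns[1:], row[1:])
]
conn.executemany("INSERT INTO closing_price VALUES (?, ?, ?)", long_rows)

for r in conn.execute("SELECT * FROM closing_price ORDER BY closingdate, stockcode"):
    print(r)
```

The ticker names are read from `cursor.description`, so the loop does not need to know the column list in advance.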

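Perhaps consider the following. It does not add a column to the table; instead it derives a ratio from the prices of two stock codes over a date range (I don't know whether this is the exact calculation you want):-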

DROP TABLE IF EXISTS closing_price; 
CREATE TABLE IF NOT EXISTS closing_price (closingdate TEXT, stockcode TEXT, stockprice REAL, UNIQUE(closingdate, stockcode));
INSERT INTO closing_price VALUES
    ('2001-01-03','AAOI',null),('2001-01-03','ABIL',null),('2001-01-03','ACIA',null),('2001-01-03','ACIW',8.94),('2001-01-03','ZG',null),('2001-01-03','ZIXI',37.19),
    ('2001-01-04','AAOI',null),('2001-01-04','ABIL',null),('2001-01-04','ACIA',null),('2001-01-04','ACIW',8.33),('2001-01-04','ZG',null),('2001-01-04','ZIXI',36.50),
    ('2001-01-05','AAOI',null),('2001-01-05','ABIL',null),('2001-01-05','ACIA',null),('2001-01-05','ACIW',8.06),('2001-01-05','ZG',null),('2001-01-05','ZIXI',37.28),
    ('2001-01-06','AAOI',null),('2001-01-06','ABIL',null),('2001-01-06','ACIA',null),('2001-01-06','ACIW',7.98),('2001-01-06','ZG',null),('2001-01-06','ZIXI',35.25),
    ('2001-01-07','AAOI',null),('2001-01-07','ABIL',null),('2001-01-07','ACIA',null),('2001-01-07','ACIW',7.81),('2001-01-07','ZG',null),('2001-01-07','ZIXI',38.00)
;

-- Ratio for a single day between ACIW and ZIXI
SELECT (
    SELECT sum(stockprice) FROM closing_price WHERE stockcode = 'ACIW' AND closingdate BETWEEN '2001-01-03' AND '2001-01-03'
    ) 
    / (
    SELECT sum(stockprice) FROM closing_price WHERE stockcode = 'ZIXI' AND closingdate BETWEEN '2001-01-03' AND '2001-01-03'
    ) 
    AS ratio
;
-- Ratio for the 5 days between ACIW and ZIXI
SELECT (
    SELECT sum(stockprice) FROM closing_price WHERE stockcode = 'ACIW' AND closingdate BETWEEN '2001-01-03' AND '2001-01-07'
    ) 
    / (
    SELECT sum(stockprice) FROM closing_price WHERE stockcode = 'ZIXI' AND closingdate BETWEEN '2001-01-03' AND '2001-01-07'
    ) 
    AS ratio
;

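The above uses a single table, but with one row per stock code/closing date combination, and with a UNIQUE index on the combination of closing date and stock code.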

The table looks like this:-

[image: contents of the closing_price table]

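It then uses queries to calculate the ratio for one specific pair of stock codes over a given date range (the first query covers a single day, the second covers 5 days).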
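Since the question is tagged python, the same query could be wrapped in a small helper so the stock codes and dates become parameters. A minimal sketch, assuming the `closing_price` table above; `price_ratio` is a made-up name and the sample rows are a subset of the data in this answer.

```python
import sqlite3


def price_ratio(conn, code1, code2, start_date, end_date):
    """Sum of code1 prices divided by sum of code2 prices over an inclusive date range."""
    sql = """
        SELECT
            (SELECT sum(stockprice) FROM closing_price
              WHERE stockcode = ? AND closingdate BETWEEN ? AND ?)
            /
            (SELECT sum(stockprice) FROM closing_price
              WHERE stockcode = ? AND closingdate BETWEEN ? AND ?)
    """
    params = (code1, start_date, end_date, code2, start_date, end_date)
    return conn.execute(sql, params).fetchone()[0]


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE closing_price (closingdate TEXT, stockcode TEXT, "
    "stockprice REAL, UNIQUE(closingdate, stockcode))"
)
# date -> (ACIW price, ZIXI price); AAOI has no price on any of these days.
sample = {
    "2001-01-03": (8.94, 37.19),
    "2001-01-04": (8.33, 36.50),
    "2001-01-05": (8.06, 37.28),
    "2001-01-06": (7.98, 35.25),
    "2001-01-07": (7.81, 38.00),
}
for day, (aciw, zixi) in sample.items():
    conn.executemany(
        "INSERT INTO closing_price VALUES (?, ?, ?)",
        [(day, "AAOI", None), (day, "ACIW", aciw), (day, "ZIXI", zixi)],
    )

one_day = price_ratio(conn, "ACIW", "ZIXI", "2001-01-03", "2001-01-03")
five_days = price_ratio(conn, "ACIW", "ZIXI", "2001-01-03", "2001-01-07")
no_data = price_ratio(conn, "AAOI", "ZIXI", "2001-01-03", "2001-01-07")
print(one_day, five_days, no_data)
```

When one side has no prices in the range, `sum()` yields NULL and the helper returns `None` rather than raising.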

The results are:-

  1. (between ACIW and ZIXI on 2001-01-03)

[image: result of the first query]

  2. (between ACIW and ZIXI for the 5 days from 2001-01-03 to 2001-01-07)

[image: result of the second query]

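Additional

  Is it possible to get the ratios using the method you've listed, rather than calling each one individually (there are too many combinations here)?

Ignoring nulls (at least for brevity/usefulness), you could do the following (but be mindful of processing time), which may suit:-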

WITH
allstocks AS (SELECT DISTINCT stockcode FROM closing_price),
combined AS (
SELECT DISTINCT closing_price.closingdate, closing_price.stockcode AS sc1, allstocks.stockcode AS sc2 
FROM closing_price JOIN allstocks ON closing_price.stockcode <> allstocks.stockcode
)
SELECT closingdate, sc1, sc2, 
    (SELECT stockprice FROM closing_price WHERE stockcode = sc1 AND closing_price.closingdate = combined.closingdate) /
    (SELECT stockprice FROM closing_price WHERE stockcode = sc2 AND closing_price.closingdate = combined.closingdate) AS ratio
FROM combined WHERE ratio IS NOT NULL;

This results in:-

[image: result of the query]

(out of 150 combinations; the rest are null)

You can modify the above to add a date range using:-

WITH
allstocks AS (SELECT DISTINCT stockcode FROM closing_price),
combined AS (
SELECT DISTINCT closing_price.closingdate, closing_price.stockcode AS sc1, allstocks.stockcode AS sc2 
FROM closing_price JOIN allstocks ON closing_price.stockcode <> allstocks.stockcode
WHERE closingdate BETWEEN '2001-01-04' AND '2001-01-06' --<<<<<<<<<< ADDED
)
SELECT closingdate, sc1, sc2, 
    (SELECT stockprice FROM closing_price WHERE stockcode = sc1 AND closing_price.closingdate = combined.closingdate) /
    (SELECT stockprice FROM closing_price WHERE stockcode = sc2 AND closing_price.closingdate = combined.closingdate) AS ratio
FROM combined 
WHERE ratio IS NOT NULL
;

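This results in:-

[image: result of the query]

  • The above

    1. Creates a CTE (Common Table Expression, something like a temporary table) holding each UNIQUE stock code. This CTE is named allstocks.

    2. Creates another CTE by joining the closing_price table to the allstocks CTE wherever the stock codes do not match (in the second example, restricted to the given date range). The resulting CTE is named combined.

    3. Then, for each row of the combined CTE, derives the ratio from the two stock codes in that row, as in the first example.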
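For comparison, the same pairing can be sketched as a plain self-join driven from Python. Note this is a different formulation from the CTE queries above, not a copy of them: it joins `closing_price` to itself on the date instead of using correlated subqueries. I have not measured whether it is faster. The sample data is cut down to three tickers.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE closing_price (closingdate TEXT, stockcode TEXT, "
    "stockprice REAL, UNIQUE(closingdate, stockcode))"
)

dates = ["2001-01-03", "2001-01-04", "2001-01-05", "2001-01-06", "2001-01-07"]
aciw = [8.94, 8.33, 8.06, 7.98, 7.81]
zixi = [37.19, 36.50, 37.28, 35.25, 38.00]
rows = []
for d, a, z in zip(dates, aciw, zixi):
    rows += [(d, "AAOI", None), (d, "ACIW", a), (d, "ZIXI", z)]
conn.executemany("INSERT INTO closing_price VALUES (?, ?, ?)", rows)

# Pair every stock with every other stock on the same date; drop NULL ratios.
pairs_sql = """
    SELECT a.closingdate, a.stockcode AS sc1, b.stockcode AS sc2,
           a.stockprice / b.stockprice AS ratio
    FROM closing_price AS a
    JOIN closing_price AS b
      ON a.closingdate = b.closingdate AND a.stockcode <> b.stockcode
    WHERE a.stockprice / b.stockprice IS NOT NULL
    ORDER BY a.closingdate, sc1, sc2
"""
results = conn.execute(pairs_sql).fetchall()
for r in results:
    print(r)
```

With this sample only ACIW and ZIXI have prices, so each date yields two rows (ACIW/ZIXI and ZIXI/ACIW).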

If you want to store the ratios in a table, you could define one such as:-

CREATE TABLE IF NOT EXISTS ratio (closingdate TEXT, stockcode1 TEXT, stockcode2 TEXT, ratio, PRIMARY KEY(closingdate, stockcode1,stockcode2));

and populate it with:-

WITH
allstocks AS (SELECT DISTINCT stockcode FROM closing_price),
combined AS (
SELECT DISTINCT closing_price.closingdate, closing_price.stockcode AS sc1, allstocks.stockcode AS sc2 
FROM closing_price JOIN allstocks ON closing_price.stockcode <> allstocks.stockcode
WHERE closingdate BETWEEN '2001-01-04' AND '2001-01-06'
)
INSERT OR IGNORE INTO ratio SELECT closingdate, sc1, sc2, 
    (SELECT stockprice FROM closing_price WHERE stockcode = sc1 AND closing_price.closingdate = combined.closingdate) /
    (SELECT stockprice FROM closing_price WHERE stockcode = sc2 AND closing_price.closingdate = combined.closingdate) AS ratio
FROM combined
;
  • Note that the PRIMARY KEY, combined with INSERT OR IGNORE, removes any chance of adding duplicates.
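A quick way to see that behaviour from Python, using the `ratio` table definition above. The helper name `insert_ratio` is made up for this sketch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE ratio (closingdate TEXT, stockcode1 TEXT, stockcode2 TEXT, ratio, "
    "PRIMARY KEY(closingdate, stockcode1, stockcode2))"
)
row = ("2001-01-04", "ACIW", "ZIXI", 8.33 / 36.50)


def insert_ratio(conn, row):
    """Insert one ratio row; a row with an existing key is silently skipped."""
    conn.execute("INSERT OR IGNORE INTO ratio VALUES (?, ?, ?, ?)", row)


insert_ratio(conn, row)
insert_ratio(conn, row)  # same key again: ignored, no error
count = conn.execute("SELECT count(*) FROM ratio").fetchone()[0]
print(count)

# A plain INSERT with the same key violates the PRIMARY KEY instead.
try:
    conn.execute("INSERT INTO ratio VALUES (?, ?, ?, ?)", row)
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True
print(duplicate_rejected)
```

This means the INSERT can be re-run over overlapping date ranges without piling up duplicate rows.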

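Answer 1: (score: 0)

Just my opinion, but I really think you are misusing SQLite here.

The main goal of a database is to provide structured storage with as little redundancy as possible. Redundancy is considered bad in a database because a careless update can leave the data inconsistent. Denormalized columns (meaning columns that can be computed from other columns) are indeed common, but they are generally used when the computation is too complex to fit in a query, and you make it very clear to users that they are computed values.

Here you only have ratios, which are easy to write in a select request, so IMHO storing them in the database is useless: it just wastes space. It is easy to write: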

SELECT AAOI,  ABIL,  ACIA, AAOI/ABIL, ACIA/ABIL, ABIL/ACIA
FROM ...
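If typing the ratios by hand is the obstacle, the SELECT text itself could be generated with a loop. A rough sketch, assuming a wide table hypothetically named `prices` with a `Date` column; `build_ratio_select` and `quote_identifier` are illustrative helpers, not part of any library.

```python
import sqlite3
from itertools import permutations


def quote_identifier(name):
    """Double-quote an SQL identifier, escaping embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def build_ratio_select(table, tickers):
    """Build a SELECT returning Date, each ticker, and every ordered ratio A/B."""
    cols = [quote_identifier("Date")] + [quote_identifier(t) for t in tickers]
    for a, b in permutations(tickers, 2):
        alias = quote_identifier("{}/{}".format(a, b))
        cols.append("{} / {} AS {}".format(quote_identifier(a), quote_identifier(b), alias))
    return "SELECT " + ", ".join(cols) + " FROM " + quote_identifier(table)


conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE prices ("Date" TEXT, "ACIW" REAL, "ZIXI" REAL)')
conn.execute("INSERT INTO prices VALUES ('2000-01-03', 8.94, 37.19)")

sql = build_ratio_select("prices", ["ACIW", "ZIXI"])
cur = conn.execute(sql)
names = [d[0] for d in cur.description]
row = cur.fetchone()
print(sql)
print(dict(zip(names, row)))
```

As far as I know the SQLite column limit also applies to the result set of a SELECT, so for thousands of tickers you would still need to request the ratios in batches.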

It certainly makes sense to have these ratios in a pandas DataFrame rather than in the database, and they are easy to compute with pandas:

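# Assumes df's first column is Date and every remaining column is a ticker.
cols = list(df.columns[1:])
# Visit each unordered pair once, so only c/c2 is added (not the inverse c2/c).
for i, c in enumerate(cols[:-1]):
    for c2 in cols[i+1:]:
        df['{}/{}'.format(c, c2)] = df[c]/df[c2]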

If you must store these ratios for performance reasons, try storing them outside the database (for example in a csv file) or in a separate table.