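I'm planning to design a database model for a business intelligence system that stores business figures for a set of locations and a set of years.
Some of these figures are to be calculated from other figures for the same year and the same location. In the text below I'll call the non-calculated figures "basic figures". To store the basic figures, a table design with these columns would make sense: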
| year | location_id | goods_costs | marketing_costs | warehouse_costs | administrative_costs |
With this table I could create a view that calculates all the other figures I need:
CREATE VIEW all_figures AS
SELECT *,
goods_costs + marketing_costs + warehouse_costs + administrative_costs
AS total_costs
FROM basic_figures
This would be fine if I didn't run into the following problems:
~~
Therefore I considered using this table design instead:
+---------+-------------+-------------+-------+
| year | location_id | figure_id | value |
+---------+-------------+-------------+-------+
| 2009 | 1 | goods_costs | 300 |
...
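This entity-attribute-value-like design could be a first solution to these three problems. However, it also has a new drawback: the calculations get messy. Really messy.
To build a view similar to the one above, I would have to use a query like this: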
(SELECT * FROM basic_figures_eav)
UNION ALL
(SELECT a.year_id, a.location_id, "total_costs", a.value + b.value + c.value + d.value
FROM basic_figures_eav a
INNER JOIN basic_figures_eav b ON a.year_id = b.year_id AND a.location_id = b.location_id AND b.figure_id = "marketing_costs"
INNER JOIN basic_figures_eav c ON a.year_id = c.year_id AND a.location_id = c.location_id AND c.figure_id = "warehouse_costs"
INNER JOIN basic_figures_eav d ON a.year_id = d.year_id AND a.location_id = d.location_id AND d.figure_id = "administrative_costs"
WHERE a.figure_id = "goods_costs");
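Isn't that a beauty? And note that this is the query for just ONE calculated figure. All the other calculated figures (of which, as I wrote above, there are many) would have to be UNIONed onto this query as well.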
~~
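After this long explanation of my problem, I now come to my actual questions:
By the way: I have already asked a similar question on the MySQL forums. However, since the answers were rather sparse and this is not just a MySQL issue after all, I rewrote my question completely and posted it here. (So this is not a cross-post.) Here is the link to that thread: http://forums.mysql.com/read.php?125,560752,560752#msg-560752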
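Answer 0: (score: 1)
The question is (at least to some extent) DBMS specific.
If you can consider other DBMSs, you might want to look at PostgreSQL and its hstore data type, which essentially stores a set of key/value pairs in a single column.
The downside is that you lose data type checking, because everything in the map is stored as a string.
The design you are aiming at is called "Entity Attribute Value". You may want to search for that term to find other alternatives as well.
Edit: here is an example of how this could be used: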
CREATE TABLE basic_figures
(
year_id integer,
location_id integer,
figures hstore
);
insert into basic_figures (year_id, location_id, figures)
values
(1, 1, hstore ('marketing_costs => 200, goods_costs => 100, warehouse_costs => 400')),
(1, 2, hstore ('marketing_costs => 50, goods_costs => 75, warehouse_costs => 250')),
(1, 3, hstore ('adminstrative_costs => 100'));
select year_id,
location_id,
to_number(figures -> 'marketing_costs', 'FM999999') as marketing_costs,
to_number(figures -> 'goods_costs', 'FM999999') as goods_costs,
to_number(figures -> 'warehouse_costs', 'FM999999') as warehouse_costs,
to_number(figures -> 'adminstrative_costs', 'FM999999') as adminstrative_costs
from basic_figures bf;
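It is probably easier to create a view that hides the conversion of the hstore values. The downside is that the view has to be recreated each time a new cost type is added.
To get the sum of all costs for each year_id/location_id you can use the following statement: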
SELECT year_id,
location_id,
sum(to_number(value, '99999')) as total
FROM (
SELECT year_id,
location_id,
(each(figures)).key,
(each(figures)).value
FROM basic_figures
) AS data
GROUP BY year_id, location_id;
 year_id | location_id | total
---------+-------------+-------
       1 |           3 |   100
       1 |           2 |   375
       1 |           1 |   700
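That could be joined to the query above, but it is probably faster and easier to use if you create a function that calculates the total of all keys in a single hstore column: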
create or replace function sum_hstore(figures hstore)
returns bigint
as
$body$
declare
result bigint;
figure_values text[];
begin
result := 0;
figure_values := avals(figures);
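-- array_length() returns NULL for an empty array, and a NULL loop bound raises an error
for i in 1..coalesce(array_length(figure_values, 1), 0) loop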
result := result + to_number(figure_values[i], '999999');
end loop;
return result;
end;
$body$
language plpgsql;
The function can then easily be used in the first select:
select bf.year_id,
bf.location_id,
to_number(bf.figures -> 'marketing_costs', '99999999') as marketing_costs,
to_number(bf.figures -> 'goods_costs', '99999999') as goods_costs,
to_number(bf.figures -> 'warehouse_costs', '99999999') as warehouse_costs,
to_number(bf.figures -> 'adminstrative_costs', '99999999') as adminstrative_costs,
sum_hstore(bf.figures) as total
from basic_figures bf;
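The following PL/pgSQL block can be used to (re-)create a view that contains one column for each key in the figures column, plus the total based on the sum_hstore function above: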
do
$body$
declare
create_sql text;
types record;
begin
create_sql := 'create or replace view extended_figures as select year_id, location_id ';
for types in SELECT distinct (each(figures)).key as type_name FROM basic_figures loop
create_sql := create_sql || ', to_number(figures -> '''||types.type_name||''', ''9999999'') as '||types.type_name;
end loop;
create_sql := create_sql ||', sum_hstore(figures) as total from basic_figures';
execute create_sql;
end;
$body$
language plpgsql;
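After running that block you can simply do:
select * from extended_figures;
and you will get as many columns as there are distinct cost types.
Note that there is no error checking at all on whether the values in the hstore are actually numbers. That could probably be done with a trigger.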
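As a rough illustration of the trigger idea mentioned above, here is a minimal sketch for PostgreSQL. The function name check_figures_numeric, the trigger name, and the regular expression (optionally signed decimal numbers only) are assumptions made for this example, not part of the original answer; adjust the pattern to whatever number format you actually allow.

```sql
-- Hypothetical trigger function: rejects rows whose hstore values are not plain numbers.
create or replace function check_figures_numeric()
  returns trigger
as
$body$
declare
  bad_key text;
begin
  -- each() expands the hstore into (key, value) rows;
  -- look for any key whose value is NULL or not an optionally signed decimal number
  select e.key
    into bad_key
  from each(new.figures) as e
  where e.value is null
     or e.value !~ '^-?[0-9]+(\.[0-9]+)?$'
  limit 1;

  if found then
    raise exception 'figure "%" does not hold a numeric value', bad_key;
  end if;

  return new;
end;
$body$
language plpgsql;

-- Run the check for every inserted or updated row of basic_figures.
create trigger basic_figures_check_numeric
  before insert or update on basic_figures
  for each row
  execute procedure check_figures_numeric();
```

With this in place, an insert such as hstore('goods_costs => abc') should be rejected with the exception above, while rows with a NULL figures column pass unchecked.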
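Answer 1: (score: 0)
Here is a way to "denormalize" (pivot) the EAV table without needing a pivot operator. Note the LEFT JOINs and the COALESCE, which cause non-existent rows to show up as "zero cost". Note: I had to change the quoting of the string literals to single quotes.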
CREATE TABLE basic_figures_eav
( year_id INTEGER
, location_id INTEGER
, figure_id varchar
, value INTEGER
);
INSERT INTO basic_figures_eav ( year_id , location_id , figure_id , value ) VALUES
(1,1,'goods_costs', 100)
, (1,1,'marketing_costs', 200)
, (1,1,'warehouse_costs', 400)
, (1,1,'administrative_costs', 800)
, (1,2,'goods_costs', 100)
, (1,2,'marketing_costs', 200)
, (1,2,'warehouse_costs', 400)
, (1,3,'administrative_costs', 800)
;
SELECT x.year_id, x.location_id
, COALESCE (a.value,0) AS goods_costs
, COALESCE (b.value,0) AS marketing_costs
, COALESCE (c.value,0) AS warehouse_costs
, COALESCE (d.value,0) AS administrative_costs
--
, COALESCE (a.value,0)
+ COALESCE (b.value,0)
+ COALESCE (c.value,0)
+ COALESCE (d.value,0)
AS total_costs
-- need this to get all the {year_id,location_id} combinations
-- that have at least one tuple in the EAV table
FROM (
SELECT DISTINCT year_id, location_id
FROM basic_figures_eav
-- WHERE <selection of wanted observations>
) AS x
LEFT JOIN basic_figures_eav a ON a.year_id = x.year_id AND a.location_id = x.location_id AND a.figure_id = 'goods_costs'
LEFT JOIN basic_figures_eav b ON b.year_id = x.year_id AND b.location_id = x.location_id AND b.figure_id = 'marketing_costs'
LEFT JOIN basic_figures_eav c ON c.year_id = x.year_id AND c.location_id = x.location_id AND c.figure_id = 'warehouse_costs'
LEFT JOIN basic_figures_eav d ON d.year_id = x.year_id AND d.location_id = x.location_id AND d.figure_id = 'administrative_costs'
;
Result:
CREATE TABLE
INSERT 0 8
year_id | location_id | goods_costs | marketing_costs | warehouse_costs | administrative_costs | total_costs
---------+-------------+-------------+-----------------+-----------------+----------------------+-------------
1 | 3 | 0 | 0 | 0 | 800 | 800
1 | 2 | 100 | 200 | 400 | 0 | 700
1 | 1 | 100 | 200 | 400 | 800 | 1500
(3 rows)
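For comparison, the same pivot can probably also be written with conditional aggregation instead of one LEFT JOIN per figure. This is a different technique from the one used in this answer, sketched here only as an alternative; it assumes the basic_figures_eav table and the four figure names from the example above.

```sql
-- Alternative sketch: pivot the EAV rows with SUM(CASE ...) in a single pass.
SELECT year_id, location_id
     , SUM(CASE WHEN figure_id = 'goods_costs'          THEN value ELSE 0 END) AS goods_costs
     , SUM(CASE WHEN figure_id = 'marketing_costs'      THEN value ELSE 0 END) AS marketing_costs
     , SUM(CASE WHEN figure_id = 'warehouse_costs'      THEN value ELSE 0 END) AS warehouse_costs
     , SUM(CASE WHEN figure_id = 'administrative_costs' THEN value ELSE 0 END) AS administrative_costs
     , SUM(value) AS total_costs
FROM basic_figures_eav
-- restrict to the figures that make up the total, so other figure_ids do not leak into it
WHERE figure_id IN ('goods_costs', 'marketing_costs', 'warehouse_costs', 'administrative_costs')
GROUP BY year_id, location_id
;
```

Each additional figure costs one more CASE expression rather than one more join, which may be easier to maintain when there are many figures.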
Answer 2: (score: 0)
I just want to point out that the second half of your query is unnecessarily complicated. You can do this instead:
(SELECT a.year_id, a.location_id, "total_costs",
sum(a.value)
FROM basic_figures_eav a
where a.figure_id in ('marketing_costs', 'warehouse_costs', 'administrative_costs',
'goods_costs')
group by a.year_id, a.location_id
)
Although this uses an aggregation, with a composite index on year_id, location_id, and figure_id the performance should be similar.
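The composite index mentioned here could be created roughly as follows. The index name is made up for this example, and depending on the DBMS and the column type of figure_id you may need to adapt the statement (for instance, a prefix length on long text columns in MySQL).

```sql
-- Composite index covering the grouping columns and the filter column.
-- The index name is a hypothetical example.
CREATE INDEX basic_figures_eav_year_loc_fig_idx
    ON basic_figures_eav (year_id, location_id, figure_id);
```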
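As for the rest of the question, there is the problem of databases limiting the number of columns. I would suggest that you put the base data in one table with an auto-incrementing primary key. Then create summary tables that are linked by the same primary key.
In many environments you can recreate the summary tables once a day or once a night. If you need real-time information, you can use stored procedures/triggers to keep the data up to date. That is, whenever data is updated or inserted, it is modified in the summary tables as well.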
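A minimal sketch of the periodic rebuild, written in PostgreSQL syntax. The table names base_figures and figures_summary, the column list, and the use of serial for the auto-incrementing key are all assumptions for illustration; other DBMSs spell the auto-increment differently.

```sql
-- Assumed base table with a surrogate, auto-incrementing primary key.
CREATE TABLE base_figures (
    id                   serial PRIMARY KEY,
    year_id              integer NOT NULL,
    location_id          integer NOT NULL,
    goods_costs          integer NOT NULL DEFAULT 0,
    marketing_costs      integer NOT NULL DEFAULT 0,
    warehouse_costs      integer NOT NULL DEFAULT 0,
    administrative_costs integer NOT NULL DEFAULT 0
);

-- Summary table that shares the base table's primary key.
CREATE TABLE figures_summary (
    id          integer PRIMARY KEY REFERENCES base_figures (id),
    total_costs integer NOT NULL
);

-- Periodic (for example nightly) rebuild: empty the summary and recompute it.
BEGIN;
DELETE FROM figures_summary;
INSERT INTO figures_summary (id, total_costs)
SELECT id,
       goods_costs + marketing_costs + warehouse_costs + administrative_costs
FROM base_figures;
COMMIT;
```

Running the rebuild inside one transaction means readers see either the old summary or the new one, never a half-filled table.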
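Also, I tried to find out whether calculated/computed columns in SQL Server count against the maximum number of columns in a table (1,024). I could not find anything definitive. This is easy to test, but I'm not near a database right now.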