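First of all, I apologize for asking the (n+1)st "query too complex" question.
Since I've been told that my database deserves normalisation, I have tried to do the same thing with a normalized setup. However, Access now complains that the query is too complex.
What I want to do: The starting point is a query that yields the fields Item ID, Difference in attribute 1, Group of attribute 1, Difference in attribute 2, Group of attribute 2, ... (roughly 10,000 rows; the query is just an equi-join of the two data sets to be compared). For each attribute, I want to plot a histogram showing the distribution of the differences for that attribute. Actually, I want to plot two histograms in one canvas, one of them restricted to Group = 1.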
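What I tried:
I normalized the data with a query (a chain of unions) that yields the columns Item ID, attribute, difference, group. This query yields roughly 100,000 rows.
I then build the histogram outline for one attribute and union it with a copy of itself that carries the additional condition group = 1. So I tried the following query: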
select bin, max(c), max(c2) from (
    -- collect the four corners of the first histogram
    select bin, cnt as c, 1 as c2, ord from (
        -- top right
        select
            cdbl(bin(difference, 0.1, -2, 2) + 0.05) AS bin,
            Count(bin) AS cnt,
            1 as ord
        FROM normalized_data
        WHERE difference Is Not Null
            and attribut='attibute_name'
            and difference between -2 and 2
        GROUP BY
            cdbl(bin(difference, 0.1, -2, 2) + 0.05)
        union all
        -- bottom right
        select
            cdbl(bin(difference, 0.1, -2, 2) + 0.05) AS bin,
            0 AS cnt,
            2 as ord
        FROM normalized_data
        WHERE difference Is Not Null
            and attribut='attibute_name'
            and difference between -2 and 2
        GROUP BY
            cdbl(bin(difference, 0.1, -2, 2) + 0.05)
        union all
        -- bottom left
        select
            cdbl(bin(difference, 0.1, -2, 2) - 0.05) AS bin,
            0 AS cnt,
            3 as ord
        FROM normalized_data
        WHERE difference Is Not Null
            and attribut='attibute_name'
            and difference between -2 and 2
        GROUP BY
            cdbl(bin(difference, 0.1, -2, 2) - 0.05)
        union all
        -- top left
        select
            cdbl(bin(difference, 0.1, -2, 2) - 0.05) AS bin,
            Count(bin) AS cnt,
            4 as ord
        FROM normalized_data
        WHERE difference Is Not Null
            and attribut='attibute_name'
            and difference between -2 and 2
        GROUP BY
            cdbl(bin(difference, 0.1, -2, 2) - 0.05)
        order by bin, ord asc
    )
    union all
    -- connect the corners of the other one
    select bin, 1 as c, cnt as c2, ord from (
        select
            cdbl(bin(difference, 0.1, -2, 2) + 0.05) AS bin,
            Count(bin) AS cnt,
            1 as ord
        FROM normalized_data
        WHERE difference Is Not Null
            and attribut='attibute_name'
            and difference between -2 and 2
            AND (GR=1)
        GROUP BY
            cdbl(bin(difference, 0.1, -2, 2) + 0.05)
        union all
        select
            cdbl(bin(difference, 0.1, -2, 2) + 0.05) AS bin,
            0 AS cnt,
            2 as ord
        FROM normalized_data
        WHERE difference Is Not Null
            and attribut='attibute_name'
            and difference between -2 and 2
            AND (GR=1)
        GROUP BY
            cdbl(bin(difference, 0.1, -2, 2) + 0.05)
        union all
        select
            cdbl(bin(difference, 0.1, -2, 2) - 0.05) AS bin,
            0 AS cnt,
            3 as ord
        FROM normalized_data
        WHERE difference Is Not Null
            and attribut='attibute_name'
            and difference between -2 and 2
            AND (GR=1)
        GROUP BY
            cdbl(bin(difference, 0.1, -2, 2) - 0.05)
        union all
        select
            cdbl(bin(difference, 0.1, -2, 2) - 0.05) AS bin,
            Count(bin) AS cnt,
            4 as ord
        FROM normalized_data
        WHERE difference Is Not Null
            and attribut='attibute_name'
            and difference between -2 and 2
            AND (GR=1)
        GROUP BY
            cdbl(bin(difference, 0.1, -2, 2) - 0.05)
        order by bin, ord asc
    )
)
group by bin, ord
order by bin, ord asc
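The two innermost unions can be computed, and so can the two mid-level queries. However, when I try to compute the outermost union, Access complains about the complexity of the query (this did not happen with the non-normalized query).
Question: Is there any chance of getting this to work?
Remark: The intermediate step was introduced to simplify automatic code generation. Removing it does not solve the problem.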
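To make the intent of the query concrete, here is a minimal Python sketch (outside Access) of the outline it is meant to produce for one histogram. It assumes that bin(difference, 0.1, -2, 2) returns the center of the 0.1-wide bin containing the value; bin_center is a hypothetical stand-in for that function, and the sample values are made up.

```python
import math
from collections import Counter


def bin_center(value, width=0.1, lo=-2.0, hi=2.0):
    """Hypothetical stand-in for the Access bin() function (assumed to return the bin center)."""
    if value is None or not (lo <= value <= hi):
        return None  # mirrors "Is Not Null" and "between -2 and 2"
    n_bins = round((hi - lo) / width)
    index = min(int(math.floor((value - lo) / width + 1e-9)), n_bins - 1)
    return round(lo + (index + 0.5) * width, 10)


def histogram_outline(differences, width=0.1):
    """Return (x, count, ord) rows: the four corners of every bar, sorted by x, then ord."""
    centers = (bin_center(d, width) for d in differences)
    counts = Counter(c for c in centers if c is not None)
    half = width / 2
    rows = []
    for center, cnt in counts.items():
        rows += [
            (round(center + half, 10), cnt, 1),  # top right
            (round(center + half, 10), 0, 2),    # bottom right
            (round(center - half, 10), 0, 3),    # bottom left
            (round(center - half, 10), cnt, 4),  # top left
        ]
    return sorted(rows, key=lambda r: (r[0], r[2]))


# Made-up sample values; None and 5.0 are filtered out like in the WHERE clause.
sample = [0.12, 0.13, -0.31, None, 5.0]
outline = histogram_outline(sample)
```

The second histogram would come from calling histogram_outline on only the rows with group = 1; the outermost query then just puts both outlines side by side per (bin, ord).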
Edit: What's in the tables: As requested, I'm adding some information about what is stored in the tables and queries. I have two tables
create table A (
    id integer not null primary key,
    [attribute 1] integer,
    [attribute 2] integer,
    ...
)
create table B (
    id integer not null primary key,
    [attribute 1] integer,
    [attribute 2] integer,
    ...
)
and a query differences that compares them:
select
    A.id,
    A.[attribute 1] - B.[attribute 1] as [delta 1],
    ...
from A
inner join B on A.id = B.id
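I know this non-normalized data model is bad design, but I am not the one responsible for designing the model. That is why a query normalized_data has been set up, which unpivots the data from differences: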
select
    id,
    'attribute 1' as attribute,
    [delta 1] as difference
from differences
union all
select
    id,
    'attribute 2' as attribute,
    [delta 2] as difference
from differences
union all
...
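For readers who prefer to see the unpivot step outside SQL, here is a minimal Python sketch of what normalized_data does. The function name unpivot, the labels mapping, and the sample rows are illustrative assumptions, not part of my database.

```python
def unpivot(rows, labels, id_field="id"):
    """Turn wide rows (id, delta 1, delta 2, ...) into long (id, attribute, difference) tuples.

    labels maps each wide column name to the attribute label it should get,
    mirroring one "select ... union all" branch per column.
    """
    long_rows = []
    for column, label in labels.items():   # one branch of the union per column
        for row in rows:
            long_rows.append((row[id_field], label, row[column]))
    return long_rows


# Made-up sample data standing in for the differences query.
wide = [
    {"id": 1, "delta 1": 0.12, "delta 2": -0.31},
    {"id": 2, "delta 1": None, "delta 2": 0.05},
]
labels = {"delta 1": "attribute 1", "delta 2": "attribute 2"}
long_rows = unpivot(wide, labels)
```

Each wide row contributes one long row per attribute, which is why the normalized query has roughly ten times as many rows as the original one.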
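Note that the code of the histogram query above does not change much whether it uses the non-normalized data from differences or the unpivoted data from normalized_data as input.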