给出4个表,每个表包含项并代表一组,如何获取绘制Venn图所需的每个隔间中的项数,如下所示。计算应在MySQL服务器中进行,以避免将项目传输到应用程序服务器。
示例表格:
s1: s2: s3: s4:
+------+ +------+ +------+ +------+
| item | | item | | item | | item |
+------+ +------+ +------+ +------+
| a | | a | | a | | a |
+------+ +------+ +------+ +------+
| b | | b | | b | | c |
+------+ +------+ +------+ +------+
| c | | c | | d | | d |
+------+ +------+ +------+ +------+
| d | | e | | e | | e |
+------+ +------+ +------+ +------+
| ... | | ... | | ... | | ... |
现在,我想我会计算一些设定功率。 I
对应于s1
,II
至s2
,III
至s3
和IV
至{{1}的一些示例}:
如果我将s4
重新解释为集合,我会写:
sx
-中间的白色25 |s1 ∩ s2 ∩ s3 ∩ s4|
-相对于中心下方的白色15 |(s1 ∩ s2 ∩ s4) \ s3|
-底部的白色5 |(s1 ∩ s4) \ (s2 ∪ s3)|
-蓝色地面上的深蓝色60 如何在MySQL服务器上有效地计算这些能力? MySQL是否提供辅助计算的功能?
天真的方法将对1运行查询。
|s1 \ (s2 ∪ s3 ∪ s4)|
和另一个查询2。
SELECT count(*) FROM(
SELECT item FROM s1
INTERSECT
SELECT item FROM s2
INTERSECT
SELECT item FROM s3
INTERSECT
SELECT item FROM s4);
依此类推,导致15个查询。
答案 0 :(得分:1)
尝试这样的事情:
with universe as (
select * from s1
union
select * from s2
union
select * from s3
union
select * from s4
),
regions as (
select
case when s1.item is null then '0' else '1' end
||
case when s2.item is null then '0' else '1' end
||
case when s3.item is null then '0' else '1' end
||
case when s4.item is null then '0' else '1' end as Region
from universe u
left join s1 on u.item = s1.item
left join s2 on u.item = s2.item
left join s3 on u.item = s3.item
left join s4 on u.item = s4.item
)
select Region, count(*) from regions group by Region
免责声明:我仅在SQLite中对此进行了测试。您可能需要SET sql_mode='PIPES_AS_CONCAT'
才能使ANSI字符串连接在MySQL中工作,或者改用concat
函数。 WITH
语法仅从MySQL 8.0版本开始受支持,但是您可以改用临时表或嵌套查询。
如果集合很大,则可能需要在查询之前为item
列建立索引,以防SQL优化器无法自行解决。
答案 1 :(得分:0)
以下过程:
INTERSECT
或EXCEPT
。但是您可以效仿它们。通过从原始数据/原始集中删除重复项,可以进一步简化仿真。我目前想出的是https://gist.github.com/Rillke/c2da0921f8f2a047615f41fab8781c11
答案 2 :(得分:0)
这个问题有点复杂,所以答案是。让我解释一下韩国电信的答案
with universe as (
select * from s1
union
select * from s2
union
select * from s3
union
select * from s4
),
regions as (
select
case when s1.item is null then '0' else '1' end
||
case when s2.item is null then '0' else '1' end
||
case when s3.item is null then '0' else '1' end
||
case when s4.item is null then '0' else '1' end as Region
from universe u
left join s1 on u.item = s1.item
left join s2 on u.item = s2.item
left join s3 on u.item = s3.item
left join s4 on u.item = s4.item
)
select Region, count(*) from regions group by Region
universe
导致所有表的UNION(消除重复),类似
+------+
| item |
+------+
| a |
+------+
| b |
+------+
| c |
+------+
| d |
+------+
| e |
+------+
| ... |
+------+
然后,将s1,s2,s3和s4连接起来
+------+---------+---------+---------+---------+
| item | s1.item | s2.item | s3.item | s4.item |
+------+---------+---------+---------+---------+
| a | a | a | a | a |
+------+---------+---------+---------+---------+
| b | b | b | b | NULL |
+------+---------+---------+---------+---------+
| c | c | c | NULL | c |
+------+---------+---------+---------+---------+
| d | d | NULL | d | d |
+------+---------+---------+---------+---------+
| e | NULL | e | e | e |
+------+---------+---------+---------+---------+
| ... | ... | ... | ... | ... |
+------+---------+---------+---------+---------+
并转换为称为Region
的二进制字符串(0:如果单元格为NULL; 1:否则),其中第一个数字对应于s1,第二个数字对应于s2,依此类推
+------+--------+
| item | Region |
+------+--------+
| a | 1111 |
+------+--------+
| b | 1110 |
+------+--------+
| c | 1101 |
+------+--------+
| d | 1011 |
+------+--------+
| e | 0111 |
+------+--------+
| ... | ... |
+------+--------+
最后按地区进行汇总和分组
+--------+-------+
| Region | count |
+--------+-------+
| 1111 | 1 |
+--------+-------+
| 1110 | 1 |
+--------+-------+
| 1101 | 1 |
+--------+-------+
| 1011 | 1 |
+--------+-------+
| 0111 | 1 |
+--------+-------+
| ... | |
+--------+-------+
请注意,其中包含0个设置元素的区域不会显示在结果中,并且0000
永远不会(=项目不属于任何集合s1,s2,s3,s4的一部分),因此有15个区域。 / p>