Boolean operations on MySQL results

Time: 2013-05-13 14:24:21

Tags: mysql sql merge intersect booleanquery

I have 3 MySQL tables:

[block_value]

  • id_block_value
  • file_id

[metadata]

  • id_metadata
  • metadata_name

[metadata_value]

  • meta_id
  • blockvalue_id

In these tables there are metadata_name = value pairs, and lists of pairs are grouped into blocks (id_block_value).

(A) If I want height = 1080:

SELECT DISTINCT file_id 
FROM metadata_value MV 
     INNER JOIN metadata M ON MV.meta_id = M.id_metadata 
     INNER JOIN block_value BV ON MV.blockvalue_id = BV.id_block_value 
WHERE (metadata_name = "height" and value = "1080");

+---------+
| file_id |
+---------+
|      21 | 
|      22 |
(...)
|    6962 |
(...)
|    8146 | 
|    8147 | 
+---------+
794 rows in set (0.06 sec)

(B) If I want file extension = mpeg:

SELECT DISTINCT file_id 
FROM metadata_value MV 
     INNER JOIN metadata M ON MV.meta_id = M.id_metadata 
     INNER JOIN block_value BV ON MV.blockvalue_id = BV.id_block_value 
WHERE (metadata_name = "file extension" and value = "mpeg");

+---------+
| file_id |
+---------+
|    6889 | 
|    6898 | 
|    6962 | 
+---------+
3 rows in set (0.06 sec)

But if I want:

  • A and B
  • A or B
  • A and not B

then I do not know what is best.

For A or B, I tried A union B, which seems to do the trick.

SELECT DISTINCT file_id 
FROM metadata_value MV 
     INNER JOIN metadata M ON MV.meta_id = M.id_metadata 
     INNER JOIN block_value BV ON MV.blockvalue_id = BV.id_block_value 
WHERE (metadata_name = "height" and value = "1080")
UNION
SELECT DISTINCT file_id 
FROM metadata_value MV 
     INNER JOIN metadata M ON MV.meta_id = M.id_metadata 
     INNER JOIN block_value BV ON MV.blockvalue_id = BV.id_block_value 
WHERE (metadata_name = "file extension" and value = "mpeg");
+---------+
| file_id |
+---------+
|      21 | 
|      22 | 
|      34 |
(...)
|    6889 | 
|    6898 | 
+---------+
796 rows in set (0.13 sec)

For A and B, since there is no INTERSECT in MySQL, I tried A and file_id in(B), but look at the performance (> 4 min)...

SELECT DISTINCT file_id 
FROM metadata_value MV 
     INNER JOIN metadata M ON MV.meta_id = M.id_metadata 
     INNER JOIN block_value BV ON MV.blockvalue_id = BV.id_block_value 
WHERE (metadata_name = "height" and value = "1080")
and file_id in(
SELECT DISTINCT file_id 
FROM metadata_value MV 
     INNER JOIN metadata M ON MV.meta_id = M.id_metadata 
     INNER JOIN block_value BV ON MV.blockvalue_id = BV.id_block_value 
WHERE (metadata_name = "file extension" and value = "mpeg"));

+---------+
| file_id |
+---------+
|    6962 | 
+---------+
1 row in set (4 min 36.22 sec)

I also tried B and file_id in(A), which is much better, but I will never know in advance which one to put first.

SELECT DISTINCT file_id 
FROM metadata_value MV 
     INNER JOIN metadata M ON MV.meta_id = M.id_metadata 
     INNER JOIN block_value BV ON MV.blockvalue_id = BV.id_block_value 
WHERE (metadata_name = "file extension" and value = "mpeg")
and file_id in(
SELECT DISTINCT file_id 
FROM metadata_value MV 
     INNER JOIN metadata M ON MV.meta_id = M.id_metadata 
     INNER JOIN block_value BV ON MV.blockvalue_id = BV.id_block_value 
WHERE (metadata_name = "height" and value = "1080"));

+---------+
| file_id |
+---------+
|    6962 | 
+---------+
1 row in set (0.75 sec)

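So... what should I do now? Is there a better way to do boolean operations? Any tips? Am I missing something?

EDIT: what the data looks like:

This database contains one row in the FILE table for each audio/video file inserted:

  • 10, /path/to/file.ts
  • 11, /path/to/file2.mpeg

The METADATA table has one row for each potential piece of information:

  • 301, height
  • 302, file extension

Then, a row in the BLOCK table defines a container:

  • 101, Video
  • 102, Audio
  • 104, General

A file can contain several metadata blocks; the BLOCK_VALUE table holds the BLOCK instances:

  • 402, 101, 10 // Video 1
  • 403, 101, 10 // Video 2
  • 404, 101, 10 // Video 3
  • 405, 102, 10 // Audio
  • 406, 104, 10 // General

In this example, file 10 has 5 blocks: 3 Video (101) + 1 Audio (102) + 1 General (104).

Values are stored in METADATA_VALUE:

  • 302, 406, "ts" // file extension, General
  • 301, 402, "1080" // height, Video 1
  • 301, 403, "720" // height, Video 2
  • 301, 404, "352" // height, Video 3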
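
To make the layout concrete, here is a minimal sketch that loads the sample rows above into an in-memory SQLite database from Python. SQLite stands in for MySQL only so that the snippet is self-contained. The column names id_file, path, id_block, block_name and block_id are assumptions, since the post does not give them.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE file           (id_file INTEGER PRIMARY KEY, path TEXT);
CREATE TABLE metadata       (id_metadata INTEGER PRIMARY KEY, metadata_name TEXT);
CREATE TABLE block          (id_block INTEGER PRIMARY KEY, block_name TEXT);
CREATE TABLE block_value    (id_block_value INTEGER PRIMARY KEY,
                             block_id INTEGER, file_id INTEGER);
CREATE TABLE metadata_value (meta_id INTEGER, blockvalue_id INTEGER, value TEXT);

INSERT INTO file VALUES (10, '/path/to/file.ts'), (11, '/path/to/file2.mpeg');
INSERT INTO metadata VALUES (301, 'height'), (302, 'file extension');
INSERT INTO block VALUES (101, 'Video'), (102, 'Audio'), (104, 'General');
INSERT INTO block_value VALUES
    (402, 101, 10), (403, 101, 10), (404, 101, 10), (405, 102, 10), (406, 104, 10);
INSERT INTO metadata_value VALUES
    (302, 406, 'ts'), (301, 402, '1080'), (301, 403, '720'), (301, 404, '352');
""")

# List every (name, value) pair attached to file 10, block by block.
pairs = conn.execute("""
    SELECT BV.id_block_value, B.block_name, M.metadata_name, MV.value
    FROM metadata_value MV
         INNER JOIN metadata M     ON MV.meta_id = M.id_metadata
         INNER JOIN block_value BV ON MV.blockvalue_id = BV.id_block_value
         INNER JOIN block B        ON BV.block_id = B.id_block
    WHERE BV.file_id = 10
    ORDER BY BV.id_block_value
""").fetchall()

for row in pairs:
    print(row)
```

Each condition such as height = 1080 therefore matches a metadata_value row in one block, while the file extension sits in a different block of the same file; file_id is the only thing that links them.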

3 answers:

Answer 0: (score: 1)

For "OR", why not try it without the UNION... or am I missing something?

SELECT DISTINCT file_id 
FROM metadata_value MV 
     INNER JOIN metadata M ON MV.meta_id = M.id_metadata 
     INNER JOIN block_value BV ON MV.blockvalue_id = BV.id_block_value 
WHERE (metadata_name = "height" and value = "1080") 
OR (metadata_name = "file extension" and value = "mpeg")

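For "AND", the two conditions live on different metadata_value rows, so join the metadata tables a second time through another block_value alias on the same file_id; this ensures you only get file_ids that satisfy both conditions...

SELECT DISTINCT BV.file_id 
FROM metadata_value MV 
     INNER JOIN metadata M ON MV.meta_id = M.id_metadata 
     AND (M.metadata_name = "height" and MV.value = "1080")
     INNER JOIN block_value BV ON MV.blockvalue_id = BV.id_block_value 
     INNER JOIN block_value BV2 ON BV2.file_id = BV.file_id
     INNER JOIN metadata_value MV2 ON MV2.blockvalue_id = BV2.id_block_value
     INNER JOIN metadata M2 ON MV2.meta_id = M2.id_metadata
     AND (M2.metadata_name = "file extension" and MV2.value = "mpeg")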

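For "A and not B", use a left join instead of an inner join for the "B" condition, and add a WHERE clause stating that you do not want the "B" results. The "B" side is a derived table keyed on file_id, so the test applies to the whole file rather than to a single metadata_value row.

SELECT DISTINCT BV.file_id 
FROM metadata_value MV 
     INNER JOIN metadata M ON MV.meta_id = M.id_metadata 
     AND (M.metadata_name = "height" and MV.value = "1080") 
     INNER JOIN block_value BV ON MV.blockvalue_id = BV.id_block_value 
     LEFT JOIN (
          SELECT BV2.file_id
          FROM metadata_value MV2
               INNER JOIN metadata M2 ON MV2.meta_id = M2.id_metadata
               INNER JOIN block_value BV2 ON MV2.blockvalue_id = BV2.id_block_value
          WHERE M2.metadata_name = "file extension" and MV2.value = "mpeg"
     ) B ON B.file_id = BV.file_id
WHERE B.file_id is NULL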
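
The "A and not B" case can also be phrased with NOT EXISTS, correlated on file_id. The following is only a sketch on a toy dataset, run through Python's sqlite3 so it is self-contained; files 11, 12 and 13 and their block ids are hypothetical rows invented for the illustration, and plan and timing on the real MySQL tables may differ.

```python
import sqlite3

SETUP = """
CREATE TABLE metadata       (id_metadata INTEGER PRIMARY KEY, metadata_name TEXT);
CREATE TABLE block_value    (id_block_value INTEGER PRIMARY KEY, file_id INTEGER);
CREATE TABLE metadata_value (meta_id INTEGER, blockvalue_id INTEGER, value TEXT);
INSERT INTO metadata VALUES (301, 'height'), (302, 'file extension');
INSERT INTO block_value VALUES
    (402, 10), (406, 10), (502, 11), (506, 11),
    (602, 12), (603, 12), (606, 12), (702, 13), (706, 13);
INSERT INTO metadata_value VALUES
    -- file 10: A only
    (301, 402, '1080'), (302, 406, 'ts'),
    -- file 11: A and B
    (301, 502, '1080'), (302, 506, 'mpeg'),
    -- file 12: A only, but with two 1080 blocks
    (301, 602, '1080'), (301, 603, '1080'), (302, 606, 'ts'),
    -- file 13: B only
    (301, 702, '720'), (302, 706, 'mpeg');
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SETUP)

# Files with height = 1080 for which no block of the same file says mpeg.
A_NOT_B = """
SELECT DISTINCT BV.file_id
FROM metadata_value MV
     INNER JOIN metadata M     ON MV.meta_id = M.id_metadata
     INNER JOIN block_value BV ON MV.blockvalue_id = BV.id_block_value
WHERE M.metadata_name = 'height' AND MV.value = '1080'
  AND NOT EXISTS (
        SELECT 1
        FROM metadata_value MV2
             INNER JOIN metadata M2     ON MV2.meta_id = M2.id_metadata
             INNER JOIN block_value BV2 ON MV2.blockvalue_id = BV2.id_block_value
        WHERE BV2.file_id = BV.file_id
          AND M2.metadata_name = 'file extension' AND MV2.value = 'mpeg')
ORDER BY BV.file_id
"""

a_not_b = [row[0] for row in conn.execute(A_NOT_B)]
print(a_not_b)  # [10, 12]
```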

Answer 1: (score: 1)

OR version: (shamelessly copied and pasted from ChrisCamp's answer)

 SELECT distinct file_id 
   FROM metadata_value MV 
      INNER JOIN metadata M ON MV.meta_id = M.id_metadata 
      INNER JOIN block_value BV ON MV.blockvalue_id = BV.id_block_value 
WHERE (metadata_name = "height" and value = "1080") 
   OR (metadata_name = "file extension" and value = "mpeg") 

AND version:

SELECT file_id 
  FROM metadata_value MV 
   INNER JOIN metadata M ON MV.meta_id = M.id_metadata 
   INNER JOIN block_value BV ON MV.blockvalue_id = BV.id_block_value 
   WHERE (metadata_name = "height" and value = "1080") 
      OR (metadata_name = "file extension" and value = "mpeg") 
  group by file_id having count(1)>1

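2 short notes about the AND version:

This is really a way of defining the intersection in terms of the previous ORing..

When ANDing you have 3 possibilities:

  • Neither of the requested conditions is satisfied (the file will not appear in the ORing)
  • Only one of them is satisfied (it will appear once in the ORing)
  • Both are satisfied (it will appear twice in the ORing, if distinct is not specified).

So I just removed the distinct clause, added a group by, and selected the records that appear twice.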
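
The counting argument above equates "appears twice" with "both conditions hold", which is only true if each condition matches at most one row per file. A small sketch on toy data (Python's sqlite3, hypothetical files 10 to 13) shows the difference between counting rows and counting distinct metadata names; the second variant assumes the two conditions use different metadata names.

```python
import sqlite3

SETUP = """
CREATE TABLE metadata       (id_metadata INTEGER PRIMARY KEY, metadata_name TEXT);
CREATE TABLE block_value    (id_block_value INTEGER PRIMARY KEY, file_id INTEGER);
CREATE TABLE metadata_value (meta_id INTEGER, blockvalue_id INTEGER, value TEXT);
INSERT INTO metadata VALUES (301, 'height'), (302, 'file extension');
INSERT INTO block_value VALUES
    (402, 10), (406, 10), (502, 11), (506, 11),
    (602, 12), (603, 12), (606, 12), (702, 13), (706, 13);
INSERT INTO metadata_value VALUES
    -- file 10: A only
    (301, 402, '1080'), (302, 406, 'ts'),
    -- file 11: A and B
    (301, 502, '1080'), (302, 506, 'mpeg'),
    -- file 12: A only, but with two 1080 blocks
    (301, 602, '1080'), (301, 603, '1080'), (302, 606, 'ts'),
    -- file 13: B only
    (301, 702, '720'), (302, 706, 'mpeg');
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SETUP)

# Shared FROM / WHERE / GROUP BY: every row matching A or B, grouped per file.
MATCHES = """
FROM metadata_value MV
     INNER JOIN metadata M     ON MV.meta_id = M.id_metadata
     INNER JOIN block_value BV ON MV.blockvalue_id = BV.id_block_value
WHERE (M.metadata_name = 'height' AND MV.value = '1080')
   OR (M.metadata_name = 'file extension' AND MV.value = 'mpeg')
GROUP BY BV.file_id
"""

# Counting matching rows: a file with two 1080 blocks also reaches 2.
by_rows = [r[0] for r in conn.execute(
    "SELECT BV.file_id " + MATCHES + " HAVING COUNT(1) > 1 ORDER BY BV.file_id")]

# Counting distinct metadata names: 2 means one match for each condition.
by_names = [r[0] for r in conn.execute(
    "SELECT BV.file_id " + MATCHES
    + " HAVING COUNT(DISTINCT M.metadata_name) = 2 ORDER BY BV.file_id")]

print(by_rows)   # [11, 12]
print(by_names)  # [11]
```

File 12 satisfies only the height condition, yet it passes the row-count test because two of its blocks are 1080.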

Or go on with the exists clause :)
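
The answer does not spell out the EXISTS form, so the following is only a guess at what it could look like: a correlated subquery on file_id, sketched on toy data with Python's sqlite3 (files 10 to 13 are hypothetical). Whether it beats the IN version on the real MySQL tables is not something this sketch can show.

```python
import sqlite3

SETUP = """
CREATE TABLE metadata       (id_metadata INTEGER PRIMARY KEY, metadata_name TEXT);
CREATE TABLE block_value    (id_block_value INTEGER PRIMARY KEY, file_id INTEGER);
CREATE TABLE metadata_value (meta_id INTEGER, blockvalue_id INTEGER, value TEXT);
INSERT INTO metadata VALUES (301, 'height'), (302, 'file extension');
INSERT INTO block_value VALUES
    (402, 10), (406, 10), (502, 11), (506, 11),
    (602, 12), (603, 12), (606, 12), (702, 13), (706, 13);
INSERT INTO metadata_value VALUES
    -- file 10: A only
    (301, 402, '1080'), (302, 406, 'ts'),
    -- file 11: A and B
    (301, 502, '1080'), (302, 506, 'mpeg'),
    -- file 12: A only, but with two 1080 blocks
    (301, 602, '1080'), (301, 603, '1080'), (302, 606, 'ts'),
    -- file 13: B only
    (301, 702, '720'), (302, 706, 'mpeg');
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SETUP)

# Files with height = 1080 that also have some block saying mpeg.
A_AND_B = """
SELECT DISTINCT BV.file_id
FROM metadata_value MV
     INNER JOIN metadata M     ON MV.meta_id = M.id_metadata
     INNER JOIN block_value BV ON MV.blockvalue_id = BV.id_block_value
WHERE M.metadata_name = 'height' AND MV.value = '1080'
  AND EXISTS (
        SELECT 1
        FROM metadata_value MV2
             INNER JOIN metadata M2     ON MV2.meta_id = M2.id_metadata
             INNER JOIN block_value BV2 ON MV2.blockvalue_id = BV2.id_block_value
        WHERE BV2.file_id = BV.file_id
          AND M2.metadata_name = 'file extension' AND MV2.value = 'mpeg')
ORDER BY BV.file_id
"""

a_and_b = [row[0] for row in conn.execute(A_AND_B)]
print(a_and_b)  # [11]
```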


Edit following the comments:

OK, trying to keep it simple... the id_block_values that satisfy either of the two conditions:

SELECT BLOCKVALUE_ID
   FROM METADATA_VALUE MV
     INNER JOIN 
        METADATA M
     ON MV.META_ID=M.ID_METADATA
  WHERE (METADATA_NAME='height' AND VALUE='1080')
     OR (METADATA_NAME='file extension' AND VALUE='mpeg')

If there are more than 2 records here, it means there is a problem (duplicated metadata).

Now the ANDing:

SELECT FILE_ID
  FROM BLOCK_VALUE BV
    INNER JOIN   
      (   SELECT BLOCKVALUE_ID
            FROM METADATA_VALUE MV
            INNER JOIN 
                 METADATA M
              ON MV.META_ID=M.ID_METADATA
           WHERE (METADATA_NAME='height' AND VALUE='1080')
              OR (METADATA_NAME='file extension' AND VALUE='mpeg')
      ) X
  ON BV.ID_BLOCK_VALUE=X.BLOCKVALUE_ID
 GROUP BY FILE_ID HAVING COUNT(1)>1

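Still, I cannot understand why the previous query does not work.. I fear that if you also remove the DISTINCT clause from the OR query, you will see some records more than twice, which makes no sense. By the way, could you please tell me what the primary keys of these tables are?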

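Answer 2: (score: 1)

I am opening a new post, just to keep the "right" solution tidy..

OK, sorry, it seems I was making wrong assumptions. I never thought two blocks would have exactly the same definition.

So, since I am a copycat and I like deriving the AND from the OR solution (:P), I came up with these two solutions..

ORing: I prefer Chris's solution...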

SELECT DISTINCT file_id 
  FROM metadata_value MV 
    INNER JOIN metadata M ON MV.meta_id = M.id_metadata 
    INNER JOIN block_value BV ON MV.blockvalue_id = BV.id_block_value 
   WHERE (metadata_name = "height" and value = "1080") 
      OR (metadata_name = "file extension" and value = "mpeg")

ANDing: I would build on your ORing version (the UNION version):

  SELECT FILE_ID FROM (
     SELECT DISTINCT 1, file_id 
             FROM metadata_value MV 
       INNER JOIN metadata M ON MV.meta_id = M.id_metadata 
       INNER JOIN block_value BV ON MV.blockvalue_id = BV.id_block_value 
              WHERE (metadata_name = "height" and value = "1080")
     UNION ALL
     SELECT DISTINCT 2, file_id 
             FROM metadata_value MV 
       INNER JOIN metadata M ON MV.meta_id = M.id_metadata 
       INNER JOIN block_value BV ON MV.blockvalue_id = BV.id_block_value 
              WHERE (metadata_name = "file extension" and value = "mpeg")
   ) IHATEAND
   GROUP BY FILE_ID
   HAVING COUNT(1)>1

which gives:

+---------+
| FILE_ID |
+---------+
|    6962 |
+---------+
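1 row in set (0.24 sec)

It should be a bit faster than the ORing, judging by the performance figures you pasted and mine (my machine is 3 times slower, time to upgrade -.-), and in any case much faster than the previous queries ;)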

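Anyway, how does the ANDing work? Simply put, it runs the two queries separately, labels each record with the branch it comes from, and then counts how many branches each file id appears in; since every branch is DISTINCT, a count above 1 means the file satisfies both conditions.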
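
To see the mechanism at work, the sketch below prints the labelled intermediate rows before grouping them. It runs on toy data through Python's sqlite3; files 10 to 13 are hypothetical, and the alias names branch and labelled are introduced only for this illustration.

```python
import sqlite3

SETUP = """
CREATE TABLE metadata       (id_metadata INTEGER PRIMARY KEY, metadata_name TEXT);
CREATE TABLE block_value    (id_block_value INTEGER PRIMARY KEY, file_id INTEGER);
CREATE TABLE metadata_value (meta_id INTEGER, blockvalue_id INTEGER, value TEXT);
INSERT INTO metadata VALUES (301, 'height'), (302, 'file extension');
INSERT INTO block_value VALUES
    (402, 10), (406, 10), (502, 11), (506, 11),
    (602, 12), (603, 12), (606, 12), (702, 13), (706, 13);
INSERT INTO metadata_value VALUES
    -- file 10: A only
    (301, 402, '1080'), (302, 406, 'ts'),
    -- file 11: A and B
    (301, 502, '1080'), (302, 506, 'mpeg'),
    -- file 12: A only, but with two 1080 blocks
    (301, 602, '1080'), (301, 603, '1080'), (302, 606, 'ts'),
    -- file 13: B only
    (301, 702, '720'), (302, 706, 'mpeg');
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SETUP)

# Branch 1 = condition A, branch 2 = condition B; DISTINCT keeps at most
# one row per file in each branch, even for file 12 with two 1080 blocks.
LABELLED = """
    SELECT DISTINCT 1 AS branch, BV.file_id AS file_id
    FROM metadata_value MV
         INNER JOIN metadata M     ON MV.meta_id = M.id_metadata
         INNER JOIN block_value BV ON MV.blockvalue_id = BV.id_block_value
    WHERE M.metadata_name = 'height' AND MV.value = '1080'
    UNION ALL
    SELECT DISTINCT 2, BV.file_id
    FROM metadata_value MV
         INNER JOIN metadata M     ON MV.meta_id = M.id_metadata
         INNER JOIN block_value BV ON MV.blockvalue_id = BV.id_block_value
    WHERE M.metadata_name = 'file extension' AND MV.value = 'mpeg'
"""

labelled_rows = conn.execute(LABELLED + " ORDER BY 1, 2").fetchall()
print(labelled_rows)  # [(1, 10), (1, 11), (1, 12), (2, 11), (2, 13)]

# A file present in both branches contributes two rows to its group.
both = [r[0] for r in conn.execute(
    "SELECT file_id FROM (" + LABELLED + ") AS labelled "
    "GROUP BY file_id HAVING COUNT(1) > 1")]
print(both)  # [11]
```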

UPDATE: another way, without "naming" the branches:

SELECT FILE_ID FROM (
    SELECT file_id 
        FROM metadata_value MV 
        INNER JOIN metadata M ON MV.meta_id = M.id_metadata 
        INNER JOIN block_value BV ON MV.blockvalue_id = BV.id_block_value 
            WHERE (metadata_name = "height" and value = "1080")
    GROUP BY FILE_ID
    UNION ALL
    SELECT file_id 
        FROM metadata_value MV 
        INNER JOIN metadata M ON MV.meta_id = M.id_metadata 
        INNER JOIN block_value BV ON MV.blockvalue_id = BV.id_block_value 
    WHERE (metadata_name = "file extension" and value = "mpeg")
    GROUP BY FILE_ID
    ) IHATEAND
GROUP BY FILE_ID
HAVING COUNT(1)>1

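The result is the same here (and so is the performance). I am relying on the fact that UNION removes duplicate rows while UNION ALL does not... which is perfect, because I do not want them removed (and UNION ALL is usually faster than UNION as well :)), so I can forget about the naming.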
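
The difference between UNION and UNION ALL that this relies on can be checked on a toy dataset. This is a sketch using Python's sqlite3 with hypothetical files 10 to 13, not a statement about MySQL timings.

```python
import sqlite3

SETUP = """
CREATE TABLE metadata       (id_metadata INTEGER PRIMARY KEY, metadata_name TEXT);
CREATE TABLE block_value    (id_block_value INTEGER PRIMARY KEY, file_id INTEGER);
CREATE TABLE metadata_value (meta_id INTEGER, blockvalue_id INTEGER, value TEXT);
INSERT INTO metadata VALUES (301, 'height'), (302, 'file extension');
INSERT INTO block_value VALUES
    (402, 10), (406, 10), (502, 11), (506, 11),
    (602, 12), (603, 12), (606, 12), (702, 13), (706, 13);
INSERT INTO metadata_value VALUES
    -- file 10: A only
    (301, 402, '1080'), (302, 406, 'ts'),
    -- file 11: A and B
    (301, 502, '1080'), (302, 506, 'mpeg'),
    -- file 12: A only, but with two 1080 blocks
    (301, 602, '1080'), (301, 603, '1080'), (302, 606, 'ts'),
    -- file 13: B only
    (301, 702, '720'), (302, 706, 'mpeg');
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SETUP)


def branch(name, value):
    """One branch: the file ids matching name = value, one row per file."""
    return f"""
    SELECT BV.file_id AS file_id
    FROM metadata_value MV
         INNER JOIN metadata M     ON MV.meta_id = M.id_metadata
         INNER JOIN block_value BV ON MV.blockvalue_id = BV.id_block_value
    WHERE M.metadata_name = '{name}' AND MV.value = '{value}'
    GROUP BY BV.file_id
    """


A = branch("height", "1080")
B = branch("file extension", "mpeg")

# UNION drops the second copy of file 11; UNION ALL keeps it.
union_rows = [r[0] for r in conn.execute(A + " UNION " + B + " ORDER BY 1")]
union_all_rows = [r[0] for r in conn.execute(A + " UNION ALL " + B + " ORDER BY 1")]
print(union_rows)      # [10, 11, 12, 13]
print(union_all_rows)  # [10, 11, 11, 12, 13]

# The surviving duplicate is what HAVING COUNT(1) > 1 detects.
both = [r[0] for r in conn.execute(
    "SELECT file_id FROM (" + A + " UNION ALL " + B + ") AS u "
    "GROUP BY file_id HAVING COUNT(1) > 1")]
print(both)  # [11]
```

With plain UNION the outer GROUP BY would never see a count above 1, so the intersection would always come back empty.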