I have a table with 97,972,561 rows (records) and 8 columns (attributes). Its format is as follows:
+--------------+------+-------------+------------+--------------+-----------+-----------+-------------+
| PREDICATE_ID | PMID | SENTENCE_ID | SUBJECT_ID | SUBJECT_NAME | PREDICATE | OBJECT_ID | OBJECT_NAME |
+--------------+------+-------------+------------+--------------+-----------+-----------+-------------+
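I want to filter out the records whose combination of subject, predicate, and object appears only once. For example, suppose the table holds the four records below. Since (Bob, is_a, Person) appears only once, the last record should be excluded from the result.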
+--------------+------+-------------+------------+--------------+-----------+-----------+-------------+
| PREDICATE_ID | PMID | SENTENCE_ID | SUBJECT_ID | SUBJECT_NAME | PREDICATE | OBJECT_ID | OBJECT_NAME |
+--------------+------+-------------+------------+--------------+-----------+-----------+-------------+
| 1 | 100 | 1 | 2 | Bob | is_born_in| 3 | 1994 |
+--------------+------+-------------+------------+--------------+-----------+-----------+-------------+
| 1 | 103 | 3 | 2 | Bob | is_born_in| 3 | 1994 |
+--------------+------+-------------+------------+--------------+-----------+-----------+-------------+
| 1 | 102 | 5 | 2 | Bob | is_born_in| 3 | 1994 |
+--------------+------+-------------+------------+--------------+-----------+-----------+-------------+
| 2 | 104 | 2 | 2 | Bob | is_a | 4 | Person |
+--------------+------+-------------+------------+--------------+-----------+-----------+-------------+
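Any help would be greatly appreciated!
Answer 0 (score: 1)
Using aggregation, we can try the following. The subquery finds every (SUBJECT_ID, PREDICATE_ID, OBJECT_ID) combination that occurs more than once, and the join keeps only the rows that belong to one of those combinations: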
SELECT t1.*
FROM yourTable t1
INNER JOIN
(
    SELECT SUBJECT_ID, PREDICATE_ID, OBJECT_ID
    FROM yourTable
    GROUP BY SUBJECT_ID, PREDICATE_ID, OBJECT_ID
    HAVING COUNT(*) > 1
) t2
    ON t1.SUBJECT_ID = t2.SUBJECT_ID AND
       t1.PREDICATE_ID = t2.PREDICATE_ID AND
       t1.OBJECT_ID = t2.OBJECT_ID;
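To check the join query against the four sample rows from the question, here is a minimal sketch that runs it in an in-memory SQLite database through Python's standard `sqlite3` module. SQLite only stands in for MySQL here, the column types are guesses based on the sample values, and the `ORDER BY` clause and the `repeated_triples` helper are added for illustration so the output order is predictable.

```python
import sqlite3

# The four sample rows from the question, in column order:
# PREDICATE_ID, PMID, SENTENCE_ID, SUBJECT_ID, SUBJECT_NAME, PREDICATE, OBJECT_ID, OBJECT_NAME
ROWS = [
    (1, 100, 1, 2, "Bob", "is_born_in", 3, "1994"),
    (1, 103, 3, 2, "Bob", "is_born_in", 3, "1994"),
    (1, 102, 5, 2, "Bob", "is_born_in", 3, "1994"),
    (2, 104, 2, 2, "Bob", "is_a", 4, "Person"),
]

# Same query as in the answer; ORDER BY is an addition for deterministic output.
JOIN_QUERY = """
SELECT t1.*
FROM yourTable t1
INNER JOIN
(
    SELECT SUBJECT_ID, PREDICATE_ID, OBJECT_ID
    FROM yourTable
    GROUP BY SUBJECT_ID, PREDICATE_ID, OBJECT_ID
    HAVING COUNT(*) > 1
) t2
    ON t1.SUBJECT_ID = t2.SUBJECT_ID AND
       t1.PREDICATE_ID = t2.PREDICATE_ID AND
       t1.OBJECT_ID = t2.OBJECT_ID
ORDER BY t1.PMID
"""


def repeated_triples(rows):
    """Hypothetical helper: load rows into SQLite and run the join query."""
    conn = sqlite3.connect(":memory:")
    # Column types are assumptions inferred from the sample data.
    conn.execute(
        """CREATE TABLE yourTable (
               PREDICATE_ID INTEGER, PMID INTEGER, SENTENCE_ID INTEGER,
               SUBJECT_ID INTEGER, SUBJECT_NAME TEXT, PREDICATE TEXT,
               OBJECT_ID INTEGER, OBJECT_NAME TEXT)"""
    )
    conn.executemany("INSERT INTO yourTable VALUES (?, ?, ?, ?, ?, ?, ?, ?)", rows)
    result = conn.execute(JOIN_QUERY).fetchall()
    conn.close()
    return result


kept = repeated_triples(ROWS)
for row in kept:
    print(row)
```

On the sample data, the three is_born_in rows are returned and the single is_a row is dropped, which matches what the question asks for.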
If you are using MySQL 8+, we can take advantage of analytic (window) functions to write a more concise query:
WITH cte AS (
    SELECT *, COUNT(*) OVER (PARTITION BY SUBJECT_ID, PREDICATE_ID, OBJECT_ID) cnt
    FROM yourTable
)
SELECT *
FROM cte
WHERE cnt > 1;
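The window-function version can be tried out the same way. The sketch below assumes a Python build whose bundled SQLite is version 3.25 or later (the first release with window functions) and again uses SQLite only as a stand-in for MySQL 8+. It uses a single `PARTITION BY` keyword, selects just three columns to keep the output short, and adds an `ORDER BY`; the `counted_triples` helper and the column types are illustrative assumptions.

```python
import sqlite3

# The four sample rows from the question, in column order:
# PREDICATE_ID, PMID, SENTENCE_ID, SUBJECT_ID, SUBJECT_NAME, PREDICATE, OBJECT_ID, OBJECT_NAME
SAMPLE_ROWS = [
    (1, 100, 1, 2, "Bob", "is_born_in", 3, "1994"),
    (1, 103, 3, 2, "Bob", "is_born_in", 3, "1994"),
    (1, 102, 5, 2, "Bob", "is_born_in", 3, "1994"),
    (2, 104, 2, 2, "Bob", "is_a", 4, "Person"),
]

# cnt holds, for every row, how many rows share its
# (SUBJECT_ID, PREDICATE_ID, OBJECT_ID) combination.
WINDOW_QUERY = """
WITH cte AS (
    SELECT *, COUNT(*) OVER (PARTITION BY SUBJECT_ID, PREDICATE_ID, OBJECT_ID) cnt
    FROM yourTable
)
SELECT PMID, PREDICATE, cnt
FROM cte
WHERE cnt > 1
ORDER BY PMID
"""


def counted_triples(rows):
    """Hypothetical helper: load rows into SQLite and run the window query."""
    conn = sqlite3.connect(":memory:")
    # Column types are assumptions inferred from the sample data.
    conn.execute(
        """CREATE TABLE yourTable (
               PREDICATE_ID INTEGER, PMID INTEGER, SENTENCE_ID INTEGER,
               SUBJECT_ID INTEGER, SUBJECT_NAME TEXT, PREDICATE TEXT,
               OBJECT_ID INTEGER, OBJECT_NAME TEXT)"""
    )
    conn.executemany("INSERT INTO yourTable VALUES (?, ?, ?, ?, ?, ?, ?, ?)", rows)
    result = conn.execute(WINDOW_QUERY).fetchall()
    conn.close()
    return result


window_rows = counted_triples(SAMPLE_ROWS)
for row in window_rows:
    print(row)
```

Each surviving row carries its group size in `cnt`, so changing the threshold (for example `cnt > 2`) only requires editing the final `WHERE` clause.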