I have a database with the following tables:
books (primary key: bookID)
characterNames (foreign key: books.bookID)
locations (foreign key: books.bookID)
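The text positions of the character names and locations are stored in the respective tables. I'm writing a Python script using psycopg2 that finds all occurrences of a given character name and location in the books. I only want occurrences from books that contain both the character name and the location. Here I already found a solution for searching one location and one character: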
WITH b AS (
   SELECT bookid
   FROM   characternames
   WHERE  name = 'XXX'
   GROUP  BY 1
   INTERSECT
   SELECT bookid
   FROM   locations
   WHERE  locname = 'YYY'
   GROUP  BY 1
   )
SELECT bookid, position, 'char' AS what
FROM   b
JOIN   characternames USING (bookid)
WHERE  name = 'XXX'
UNION ALL
SELECT bookid, position, 'loc' AS what
FROM   b
JOIN   locations USING (bookid)
WHERE  locname = 'YYY'
ORDER  BY bookid, position;
The CTE b contains all bookids in which both the character name 'XXX' and the location 'YYY' occur.
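To try the INTERSECT version from Python without a PostgreSQL server, here is a minimal sketch that uses the standard-library sqlite3 module as a stand-in. The toy schema and data are assumptions (the question only names the tables), and occurrences is a hypothetical helper. With psycopg2 the SQL would stay the same, except that the placeholders are %s instead of ? and the values are passed via cursor.execute(query, params).

```python
import sqlite3

# Toy schema and data (assumed for illustration; only the table names come from the question).
SCHEMA = """
CREATE TABLE books (bookid INTEGER PRIMARY KEY);
CREATE TABLE characternames (bookid INTEGER REFERENCES books, name TEXT, position INTEGER);
CREATE TABLE locations (bookid INTEGER REFERENCES books, locname TEXT, position INTEGER);
INSERT INTO books VALUES (1), (2), (3), (4);
INSERT INTO characternames VALUES (1, 'Tim', 10), (1, 'Al', 20), (1, 'Tim', 30),
                                  (2, 'Tim', 5), (3, 'Al', 7), (4, 'Tim', 1);
INSERT INTO locations VALUES (1, 'Toolshop', 15), (2, 'Toolshop', 8),
                             (3, 'Garage', 2), (4, 'Garage', 3);
"""

# Same shape as the query above; INTERSECT already returns distinct bookids.
QUERY = """
WITH b AS (
    SELECT bookid FROM characternames WHERE name = ?
    INTERSECT
    SELECT bookid FROM locations WHERE locname = ?
)
SELECT bookid, position, 'char' AS what
FROM b JOIN characternames USING (bookid)
WHERE name = ?
UNION ALL
SELECT bookid, position, 'loc' AS what
FROM b JOIN locations USING (bookid)
WHERE locname = ?
ORDER BY bookid, position
"""


def occurrences(conn, name, locname):
    """Positions of `name` and `locname` in books that contain both (hypothetical helper)."""
    return conn.execute(QUERY, (name, locname, name, locname)).fetchall()


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
rows = occurrences(conn, "Tim", "Toolshop")
print(rows)
```

Book 4 mentions Tim but not Toolshop, so none of its rows appear in the result.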
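Now I'd also like to search for 2 places and a name (or 2 names and a place, respectively). That's easy if all searched entities must occur in a book, but what about this: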
Searching for: Tim, Al, Toolshop
Result: books that contain
(Tim, Al, Toolshop) or
(Tim, Al) or
(Tim, Toolshop) or
(Al, Toolshop)
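The required semantics, "at least two of the N search terms", can be stated in plain Python before worrying about SQL. This sketch assumes the per-term matches are already available as sets of bookids (hypothetical data mirroring the example above); it counts each book at most once per term and keeps those with two or more hits, which is what any SQL solution has to reproduce.

```python
from collections import Counter


def books_matching_at_least(hits_per_term, min_hits=2):
    """Return sorted bookids that contain at least `min_hits` distinct search terms.

    hits_per_term maps each search term to the bookids containing it
    (a hypothetical in-memory stand-in for one subquery per term).
    """
    counts = Counter()
    for bookids in hits_per_term.values():
        counts.update(set(bookids))  # a book counts at most once per term
    return sorted(b for b, n in counts.items() if n >= min_hits)


hits = {
    "Tim": {1, 2, 4},    # character name
    "Al": {1, 3},        # character name
    "Toolshop": {1, 2},  # location
}
print(books_matching_at_least(hits))  # [1, 2]
```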
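The problem could extend to 4, 5, 6 ... conditions. I thought about INTERSECTing more subqueries, but I don't think that works. Instead, I would UNION the found bookIDs, GROUP them and select the bookids that occur more than once: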
WITH b AS (
SELECT bookid, count(bookid) AS occurrences
FROM
  (SELECT DISTINCT bookid
   FROM characterNames
   WHERE name='XXX'
   UNION
   SELECT DISTINCT bookid
   FROM characterNames
   WHERE name='YYY'
   UNION
   SELECT DISTINCT bookid
   FROM locations
   WHERE locname='ZZZ'
   GROUP BY bookid)
WHERE occurrences>1)
I think this works (I can't test it at the moment), but is this the best way to do it?
Answer (score: 4)
The idea of using a count for the generalized case is sound. A few syntax adjustments, though:
WITH b AS (
   SELECT bookid
   FROM  (
      SELECT DISTINCT bookid
      FROM   characterNames
      WHERE  name = 'XXX'
      UNION ALL
      SELECT DISTINCT bookid
      FROM   characterNames
      WHERE  name = 'YYY'
      UNION ALL
      SELECT DISTINCT bookid
      FROM   locations
      WHERE  locname = 'ZZZ'
      ) x
   GROUP  BY bookid
   HAVING count(*) > 1
   )
SELECT bookid, position, 'char' AS what
FROM   b
JOIN   characternames USING (bookid)
WHERE  name IN ('XXX', 'YYY')   -- all searched names
UNION ALL
SELECT bookid, position, 'loc' AS what
FROM   b
JOIN   locations USING (bookid)
WHERE  locname = 'ZZZ'          -- the searched location
ORDER  BY bookid, position;
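Since the question mentions 4, 5, 6 ... conditions, the UNION ALL branches and IN lists can be generated instead of written by hand. The following is a sketch under assumptions: build_query is a hypothetical helper, it expects at least one name and one location, and it is exercised against an in-memory sqlite3 database with assumed toy data. For psycopg2, keep the default ph="%s" and call cursor.execute(sql, params); the search terms travel as parameters, and only the number of placeholders is interpolated into the SQL text.

```python
import sqlite3


def build_query(names, locnames, min_hits=2, ph="%s"):
    """Return (sql, params) for the counting query with any number of search terms.

    Hypothetical helper. `ph` is the placeholder style: "%s" for psycopg2, "?" for sqlite3.
    """
    if not names or not locnames:
        raise ValueError("this sketch expects at least one name and one location")
    # One DISTINCT branch per search term; UNION ALL keeps duplicates so they can be counted.
    branches = (
        [f"SELECT DISTINCT bookid FROM characternames WHERE name = {ph}"] * len(names)
        + [f"SELECT DISTINCT bookid FROM locations WHERE locname = {ph}"] * len(locnames)
    )
    union = "\n        UNION ALL\n        ".join(branches)
    name_list = ", ".join([ph] * len(names))
    loc_list = ", ".join([ph] * len(locnames))
    sql = f"""
WITH b AS (
    SELECT bookid
    FROM ({union}) x
    GROUP BY bookid
    HAVING count(*) >= {ph}
)
SELECT bookid, position, 'char' AS what
FROM b JOIN characternames USING (bookid)
WHERE name IN ({name_list})
UNION ALL
SELECT bookid, position, 'loc' AS what
FROM b JOIN locations USING (bookid)
WHERE locname IN ({loc_list})
ORDER BY bookid, position"""
    # Parameter order must match placeholder order: branches, threshold, IN lists.
    params = [*names, *locnames, min_hits, *names, *locnames]
    return sql, params


# Demo against an in-memory SQLite database with assumed toy data.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE characternames (bookid INTEGER, name TEXT, position INTEGER);
CREATE TABLE locations (bookid INTEGER, locname TEXT, position INTEGER);
INSERT INTO characternames VALUES (1, 'Tim', 10), (1, 'Al', 20), (1, 'Tim', 30),
                                  (2, 'Tim', 5), (3, 'Al', 7), (4, 'Tim', 1);
INSERT INTO locations VALUES (1, 'Toolshop', 15), (2, 'Toolshop', 8),
                             (3, 'Garage', 2), (4, 'Garage', 3);
""")
sql, params = build_query(["Tim", "Al"], ["Toolshop"], min_hits=2, ph="?")
rows = conn.execute(sql, params).fetchall()
print(rows)
```

Making min_hits a parameter also covers the strict "all terms must occur" case: set it to the total number of search terms.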
Use UNION ALL (not UNION) to preserve duplicates across the subqueries. In this case you want them, so that you can count them.
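A small sqlite3 experiment (toy data, assumed) shows why UNION ALL matters here: with plain UNION, the duplicate bookids coming from different branches collapse, so every book ends up with a count of 1.

```python
import sqlite3

# Assumed toy data: Tim is in books 1 and 2, Al and Toolshop only in book 1.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE characternames (bookid INTEGER, name TEXT, position INTEGER);
CREATE TABLE locations (bookid INTEGER, locname TEXT, position INTEGER);
INSERT INTO characternames VALUES (1, 'Tim', 10), (1, 'Al', 20), (2, 'Tim', 5);
INSERT INTO locations VALUES (1, 'Toolshop', 15), (2, 'Garage', 8);
""")


def hit_counts(set_op):
    """Per-book count of matched search terms, with branches combined by `set_op`."""
    # set_op is interpolated into the SQL; only pass the trusted constants below.
    sql = f"""
        SELECT bookid, count(*) AS hits
        FROM (SELECT DISTINCT bookid FROM characternames WHERE name = 'Tim'
              {set_op}
              SELECT DISTINCT bookid FROM characternames WHERE name = 'Al'
              {set_op}
              SELECT DISTINCT bookid FROM locations WHERE locname = 'Toolshop') x
        GROUP BY bookid
        ORDER BY bookid"""
    return conn.execute(sql).fetchall()


print(hit_counts("UNION ALL"))  # [(1, 3), (2, 1)]: duplicates kept, book 1 matches 3 terms
print(hit_counts("UNION"))      # [(1, 1), (2, 1)]: duplicates collapsed, every count is 1
```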
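The subqueries should produce distinct values. That works the way you have it, with DISTINCT. You may want to try GROUP BY 1 instead and see whether it performs better (I don't expect it to).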
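The GROUP BY has to go outside the subquery. Inside, it would only apply to the last subquery, and it makes no sense there since you already have DISTINCT bookid.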
The check whether a book has more than one hit has to go into a HAVING clause:
HAVING count(*) > 1
You cannot use aggregated values in the WHERE clause.
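The difference between WHERE and HAVING can be checked the same way. In this sketch, a hypothetical table hits stands in for the output of the UNION ALL subquery (one row per matched term); SQLite, like PostgreSQL, rejects the aggregate in WHERE.

```python
import sqlite3

# Hypothetical table standing in for the UNION ALL subquery: one row per matched term.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE hits (bookid INTEGER);
INSERT INTO hits VALUES (1), (1), (1), (2);
""")

# HAVING is evaluated after GROUP BY, so it can filter on the aggregate.
ok = conn.execute(
    "SELECT bookid FROM hits GROUP BY bookid HAVING count(*) > 1"
).fetchall()
print(ok)  # [(1,)]

# WHERE is evaluated per row before grouping, so the aggregate is rejected there.
try:
    conn.execute("SELECT bookid FROM hits WHERE count(*) > 1 GROUP BY bookid")
    where_failed = False
except sqlite3.Error as exc:
    print("rejected:", exc)
    where_failed = True
```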
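You cannot simply merge the conditions on one table into a single branch (for example WHERE name IN ('XXX', 'YYY') with DISTINCT bookid): how would you then count how many of the names were found? There is a more sophisticated way, though. It may or may not improve performance; you'll have to test it (with EXPLAIN ANALYZE). Both queries need at least two index scans on table characterNames. At least it shortens the syntax.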
Note how I count the hits for characterNames and how I changed the outer SELECT to sum(hits):
WITH b AS (
   SELECT bookid
   FROM  (
      SELECT bookid
           , max((name = 'XXX')::int)
           + max((name = 'YYY')::int) AS hits
      FROM   characterNames
      WHERE  (name = 'XXX' OR name = 'YYY')
      GROUP  BY bookid
      UNION ALL
      SELECT DISTINCT bookid, 1 AS hits
      FROM   locations
      WHERE  locname = 'ZZZ'
      ) x
   GROUP  BY bookid
   HAVING sum(hits) > 1
   )
...
Casting boolean to integer yields 0 for FALSE and 1 for TRUE. That helps.
This kept nagging at the back of my mind while I rode my bike to work. I have reason to believe this query might be faster. Please give it a try:
WITH b AS (
   SELECT *
   FROM  (
      SELECT bookid
           , (EXISTS (
                SELECT *
                FROM   characterNames c
                WHERE  c.bookid = b.bookid
                AND    c.name = 'XXX'))::int
           + (EXISTS (
                SELECT *
                FROM   characterNames c
                WHERE  c.bookid = b.bookid
                AND    c.name = 'YYY'))::int AS c_hits
           , (EXISTS (
                SELECT *
                FROM   locations l
                WHERE  l.bookid = b.bookid
                AND    l.locname = 'ZZZ'))::int AS l_hits
      FROM   books b
      ) sub
   -- Output column aliases cannot be referenced in WHERE on the same query level,
   -- hence the subquery.
   WHERE  (c_hits + l_hits) > 1
   )
SELECT c.bookid, c.position, 'char' AS what
FROM   b
JOIN   characternames c USING (bookid)
WHERE  b.c_hits > 0
AND    c.name IN ('XXX', 'YYY')
UNION ALL
SELECT l.bookid, l.position, 'loc' AS what
FROM   b
JOIN   locations l USING (bookid)
WHERE  b.l_hits > 0
AND    l.locname = 'ZZZ'
ORDER  BY 1, 2, 3;
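EXISTS semi-joins can stop execution at the first match. Since we are only interested in an all-or-nothing answer in the CTE, this can be done faster.
This way we also don't need to aggregate (no GROUP BY needed).
I also remember whether any characters or locations were found, and only revisit the tables with actual matches.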
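The EXISTS variant can be exercised the same way. This sketch assumes sqlite3 and the same toy data, uses CAST(EXISTS (...) AS INTEGER) in place of ::int, and computes the per-book flags in a subquery so that c_hits and l_hits can be filtered by name. Whether it is actually faster than the counting query on PostgreSQL is something only EXPLAIN ANALYZE on real data can tell.

```python
import sqlite3

# Assumed toy data, including the books table that the EXISTS variant scans.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE books (bookid INTEGER PRIMARY KEY);
CREATE TABLE characternames (bookid INTEGER, name TEXT, position INTEGER);
CREATE TABLE locations (bookid INTEGER, locname TEXT, position INTEGER);
INSERT INTO books VALUES (1), (2), (3), (4);
INSERT INTO characternames VALUES (1, 'Tim', 10), (1, 'Al', 20), (1, 'Tim', 30),
                                  (2, 'Tim', 5), (3, 'Al', 7), (4, 'Tim', 1);
INSERT INTO locations VALUES (1, 'Toolshop', 15), (2, 'Toolshop', 8),
                             (3, 'Garage', 2), (4, 'Garage', 3);
""")

QUERY = """
WITH b AS (
    SELECT *
    FROM (
        SELECT bk.bookid AS bookid,
               CAST(EXISTS (SELECT 1 FROM characternames c
                            WHERE c.bookid = bk.bookid AND c.name = ?) AS INTEGER)
             + CAST(EXISTS (SELECT 1 FROM characternames c
                            WHERE c.bookid = bk.bookid AND c.name = ?) AS INTEGER) AS c_hits,
               CAST(EXISTS (SELECT 1 FROM locations l
                            WHERE l.bookid = bk.bookid AND l.locname = ?) AS INTEGER) AS l_hits
        FROM books bk
    ) sub
    WHERE c_hits + l_hits > 1  -- aliases are visible here thanks to the subquery
)
SELECT bookid, position, 'char' AS what
FROM b JOIN characternames USING (bookid)
WHERE b.c_hits > 0 AND name IN (?, ?)
UNION ALL
SELECT bookid, position, 'loc' AS what
FROM b JOIN locations USING (bookid)
WHERE b.l_hits > 0 AND locname = ?
ORDER BY bookid, position, what
"""

name1, name2, loc = "Tim", "Al", "Toolshop"
rows = conn.execute(QUERY, (name1, name2, loc, name1, name2, loc)).fetchall()
print(rows)
```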