I have the following database table:
paperid | authorid | name
---------+----------+---------------
1889374 | 897449 | D. N. Page
1889374 | 1795881 | C. N. Pope
1889374 | 1952069 | S. W. Hawking
I want to create a table with the columns paperid, author, and coauthors.
The result should look like this:
paperid | author | coauthors
---------+---------------+---------------------------
1889374 | D. N. Page | C. N. Pope S. W. Hawking
1889374 | C. N. Pope | D. N. Page S. W. Hawking
1889374 | S. W. Hawking | D. N. Page C. N. Pope
This is achieved with the following query:
SELECT foo.paperid, npa.name as author, foo.coauthors
INTO npatest
FROM newpaperauthor npa
CROSS JOIN (
SELECT paperid, string_agg(name, ' ') as coauthors
FROM newpaperauthor
GROUP BY paperid
ORDER BY paperid) foo;
UPDATE npatest SET coauthors = regexp_replace(coauthors, author, '');
SELECT * FROM npatest;
The problem appears when the table contains more than one paperid:
paperid | authorid | name | affiliation
---------+----------+------------------+------------------------
1889373 | 122817 | Kazuhiro Hongo |
1889373 | 1091191 | Hiroshi NAKAGAWA |
1889373 | 1874415 | Hiroshi Nakagawa | University of Oklahoma
1889373 | 2149773 | Han Soo Chang |
1889374 | 897449 | D. N. Page |
1889374 | 1795881 | C. N. Pope |
1889374 | 1952069 | S. W. Hawking |
Then I get their Cartesian product, like this:
paperid | author | coauthors
---------+------------------+----------------------------------------------------------------
1889373 | Kazuhiro Hongo | Hiroshi NAKAGAWA Hiroshi Nakagawa Han Soo Chang
1889374 | Kazuhiro Hongo | D. N. Page C. N. Pope S. W. Hawking
1889373 | Hiroshi NAKAGAWA | Kazuhiro Hongo Hiroshi Nakagawa Han Soo Chang
1889374 | Hiroshi NAKAGAWA | D. N. Page C. N. Pope S. W. Hawking
1889373 | Hiroshi Nakagawa | Kazuhiro Hongo Hiroshi NAKAGAWA Han Soo Chang
1889374 | Hiroshi Nakagawa | D. N. Page C. N. Pope S. W. Hawking
1889373 | Han Soo Chang | Kazuhiro Hongo Hiroshi NAKAGAWA Hiroshi Nakagawa
1889374 | Han Soo Chang | D. N. Page C. N. Pope S. W. Hawking
1889373 | D. N. Page | Kazuhiro Hongo Hiroshi NAKAGAWA Hiroshi Nakagawa Han Soo Chang
1889374 | D. N. Page | C. N. Pope S. W. Hawking
1889373 | C. N. Pope | Kazuhiro Hongo Hiroshi NAKAGAWA Hiroshi Nakagawa Han Soo Chang
1889374 | C. N. Pope | D. N. Page S. W. Hawking
1889373 | S. W. Hawking | Kazuhiro Hongo Hiroshi NAKAGAWA Hiroshi Nakagawa Han Soo Chang
1889374 | S. W. Hawking | D. N. Page C. N. Pope
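How can I get rid of the Cartesian product?
Answer 0 (score: 3)
Here is one way to solve this:
Generate the list of all coauthors per paper as a subquery. Generate the list of all authors. Then join the two on paperid and use string manipulation to get what you want.
The authors are easy: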
select paperid, npa.name as author
from newpaperauthor npa;
The coauthors are easy too:
select paperid, string_agg(npa.name, ' ') as coauthors
from newpaperauthor npa
group by paperid;
Combining them requires some string replacement:
select a.paperid, a.author,
-- remove the author's own name, collapse the leftover double space, trim the ends
trim(replace(replace(coauthors, author, ''), '  ', ' ')) as coauthors
from (select paperid, npa.name as author
from newpaperauthor npa
) a join
(select paperid, string_agg(npa.name, ' ') as coauthors
from newpaperauthor npa
group by paperid
) ca
on a.paperid = ca.paperid;
Answer 1 (score: 2)
This can be very simple with array_agg() as a window aggregate function, combined with array_remove() (introduced in Postgres 9.3):
CREATE TABLE npatest AS
SELECT paperid, name AS author
, array_to_string(array_remove(array_agg(name) OVER (PARTITION BY paperid), name), ', ') AS coauthors
FROM newpaperauthor n;
If author names are not unique, there are complications. Then again, if author names are not unique, the whole operation is flawed.
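To make the complication concrete, here is a minimal sketch. It assumes a hypothetical temp table npa_dup_demo with made-up rows in which two distinct authors (different authorid) share one name. It compares the name-based removal used above with an exclusion keyed on authorid through a correlated subquery, which is one possible workaround, not part of the original answer.

```sql
-- Hypothetical sample data: authorid 10 and 11 are different people with the same name.
CREATE TEMP TABLE npa_dup_demo (paperid int, authorid int, name text);
INSERT INTO npa_dup_demo VALUES
  (1, 10, 'J. Smith'),
  (1, 11, 'J. Smith'),
  (1, 12, 'A. Jones');

SELECT n.authorid
     , n.name AS author
       -- name-based removal: drops every 'J. Smith', including the namesake
     , array_to_string(array_remove(array_agg(n.name) OVER (PARTITION BY n.paperid), n.name), ', ') AS by_name
       -- id-based exclusion: only the author's own row is left out
     , (SELECT string_agg(c.name, ', ' ORDER BY c.authorid)
        FROM   npa_dup_demo c
        WHERE  c.paperid  = n.paperid
        AND    c.authorid <> n.authorid) AS by_authorid
FROM   npa_dup_demo n
ORDER  BY n.authorid;
```

For authorid 10, by_name lists only 'A. Jones', while by_authorid still includes the other 'J. Smith'.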
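Use array_agg() and array_remove() instead of string_agg() and regexp_replace(), because the latter would easily fail for similar names like "Jon Fox" and "Jon Foxy", and it also leaves the delimiters in a mess.
array_to_string() converts the array to a string. I used ', ' as the delimiter, which seems more sensible to me than a plain space.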
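The use of SELECT INTO is discouraged. Use the superior CREATE TABLE AS instead. Per documentation:
CREATE TABLE AS is the recommended syntax, since this form of SELECT INTO is not available in ECPG or PL/pgSQL, because they interpret the INTO clause differently. Furthermore, CREATE TABLE AS offers a superset of the functionality provided by SELECT INTO.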
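Answer 2 (score: 0)
@GordonLinoff's query can be simplified by leaving the primary author out of the aggregate, using a self-join that excludes the author's own row: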
SELECT DISTINCT
p0.paperid , p0.authorid , p0.name as name1
, string_agg(p1.name, ', ' ) AS others
FROM papers p0
JOIN papers p1 ON p1.paperid = p0.paperid AND p1.authorid <> p0.authorid
GROUP BY p0.paperid, p0.authorid, p0.name
ORDER BY p0.paperid, p0.authorid
;
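One thing to watch with the inner self-join: a paper with a single author has no matching p1 row, so it disappears from the result. The sketch below is a variant, not the answerer's query. It assumes a temp table named papers with the same columns as in the question, plus one hypothetical single-author row, and uses LEFT JOIN so that such authors are kept with a NULL coauthor list. DISTINCT is dropped because GROUP BY already yields one row per author.

```sql
-- Sample rows from the question, plus one hypothetical single-author paper.
CREATE TEMP TABLE papers (paperid int, authorid int, name text);
INSERT INTO papers VALUES
  (1889374,  897449, 'D. N. Page'),
  (1889374, 1795881, 'C. N. Pope'),
  (1889374, 1952069, 'S. W. Hawking'),
  (1, 1, 'Solo Author');  -- hypothetical row, not in the original data

SELECT p0.paperid, p0.authorid, p0.name AS name1
       -- ORDER BY inside the aggregate makes the coauthor order deterministic
     , string_agg(p1.name, ', ' ORDER BY p1.authorid) AS others
FROM   papers p0
LEFT   JOIN papers p1 ON p1.paperid = p0.paperid AND p1.authorid <> p0.authorid
GROUP  BY p0.paperid, p0.authorid, p0.name
ORDER  BY p0.paperid, p0.authorid;
```

Because the join compares authorid rather than name, this form is also unaffected by authors who share a name.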