I have a table like the following:
paper_id  author_id  author_name        author_affiliation
1         521630     Ayman Kaheel       Cairo Microsoft Innovation Lab
1         972575     Mahmoud Refaat     Cairo Microsoft Innovation Lab
3         1528710    Ahmed Abdul-hamid  Harvard
I have found that many (author_id, author_name, author_affiliation) combinations occur more than once. For example:
author_id  author_name      author_affiliation  count
1          Masuo Fukui      <NA>                4
4          Yasusada Yamada  <NA>                8
I got this with the following query:
statement<-"select author_id,author_name,author_affiliation,count(*)
from paper_author
GROUP BY author_id,author_name,author_affiliation
HAVING (COUNT(*)>1)"
Now I want to know how many distinct author_ids are in that result. I tried this:
statement<-"select distinct author_id
from paper_author
where author_id in (
select author_id,author_name,author_affiliation,count(*)
from paper_author
GROUP BY author_id,author_name,author_affiliation
HAVING (COUNT(*)>1)
)"
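This does not give me the result I want.
Also, how can I get the number of paper IDs covered by the result above?
Thanks.
Answer 0: (score: 1)
I would do it like this, I think: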
statement<-"select distinct author_id
from paper_author
where author_id in (
select author_id
from paper_author
GROUP BY author_id,author_name,author_affiliation
HAVING (COUNT(*)>1)
)"
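The difference from the query in the question is that the IN subquery returns only author_id; an IN comparison against a single column needs a one-column subquery. A minimal sketch of this, assuming an in-memory SQLite database and Python's sqlite3 module (the thread does not say which database is used); the rows, build_demo_db, and the other function names are hypothetical and only serve the illustration:

```python
import sqlite3


def build_demo_db():
    """Create an in-memory paper_author table filled with hypothetical rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE paper_author ("
        "paper_id INTEGER, author_id INTEGER, "
        "author_name TEXT, author_affiliation TEXT)"
    )
    conn.executemany(
        "INSERT INTO paper_author VALUES (?, ?, ?, ?)",
        [
            (1, 1, "Masuo Fukui", None),
            (2, 1, "Masuo Fukui", None),
            (3, 4, "Yasusada Yamada", None),
            (4, 4, "Yasusada Yamada", None),
            (5, 521630, "Ayman Kaheel", "Cairo Microsoft Innovation Lab"),
        ],
    )
    return conn


# The query from this answer: the subquery returns one column only.
DUPLICATED_IDS_SQL = """
SELECT DISTINCT author_id
FROM paper_author
WHERE author_id IN (
    SELECT author_id
    FROM paper_author
    GROUP BY author_id, author_name, author_affiliation
    HAVING COUNT(*) > 1
)
"""

# The query from the question: the subquery returns four columns.
ORIGINAL_SQL = """
SELECT DISTINCT author_id
FROM paper_author
WHERE author_id IN (
    SELECT author_id, author_name, author_affiliation, COUNT(*)
    FROM paper_author
    GROUP BY author_id, author_name, author_affiliation
    HAVING COUNT(*) > 1
)
"""


def duplicated_author_ids(conn):
    """Return the sorted author_ids that appear in a repeated combination."""
    return sorted(row[0] for row in conn.execute(DUPLICATED_IDS_SQL))


def count_duplicated_author_ids(conn):
    """Wrap the same query in COUNT(*) to get just the number of ids."""
    sql = "SELECT COUNT(*) FROM (" + DUPLICATED_IDS_SQL + ") AS dup"
    return conn.execute(sql).fetchone()[0]


def original_query_is_rejected(conn):
    """Report whether SQLite refuses the four-column IN subquery."""
    try:
        conn.execute(ORIGINAL_SQL).fetchall()
    except sqlite3.Error:
        return True
    return False


if __name__ == "__main__":
    demo = build_demo_db()
    print(duplicated_author_ids(demo))
    print(count_duplicated_author_ids(demo))
    print(original_query_is_rejected(demo))
```

Wrapping the query in an outer SELECT COUNT(*) gives the number of author_ids directly, which is what the question asks for. Rows with a missing affiliation still end up in the same group, because GROUP BY puts NULL values together.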
Answer 1: (score: 0)
If you just want to know how many authors have more than one paper, use this query:
SELECT COUNT(*)
FROM (SELECT author_id, author_affiliation, COUNT(*)
FROM paper_author
GROUP BY author_id, author_affiliation
HAVING COUNT(*) > 1);
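This assumes that author_id is a unique identifier for author_name. If the id instead identifies an author_name, author_affiliation combination (i.e. authors who write papers for different institutions have multiple ids, one per affiliation), then you can also strike author_affiliation from the subquery.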
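To see when the choice of grouping columns matters, here is a small sketch, assuming SQLite through Python's sqlite3 module and made-up rows; the helper count_groups and the author and institute names are hypothetical. In this data one author_id spans two affiliations, which is the case where the two groupings disagree:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE paper_author ("
    "paper_id INTEGER, author_id INTEGER, "
    "author_name TEXT, author_affiliation TEXT)"
)
conn.executemany(
    "INSERT INTO paper_author VALUES (?, ?, ?, ?)",
    [
        (1, 10, "Author A", "Institute X"),  # author 10: two papers, one affiliation
        (2, 10, "Author A", "Institute X"),
        (3, 20, "Author B", "Institute X"),  # author 20: two papers, two affiliations
        (4, 20, "Author B", "Institute Y"),
        (5, 30, "Author C", "Institute Z"),  # author 30: a single paper
    ],
)


def count_groups(connection, group_cols):
    """Count the groups with more than one row for the given GROUP BY columns.

    group_cols is a fixed string written in this script, never user input.
    The derived table gets an alias because some databases require one.
    """
    sql = (
        "SELECT COUNT(*) FROM ("
        "SELECT author_id FROM paper_author "
        f"GROUP BY {group_cols} HAVING COUNT(*) > 1"
        ") AS t"
    )
    return connection.execute(sql).fetchone()[0]


by_id_and_affiliation = count_groups(conn, "author_id, author_affiliation")
by_id_only = count_groups(conn, "author_id")

print(by_id_and_affiliation)  # 1: only author 10 repeats within one affiliation
print(by_id_only)             # 2: authors 10 and 20 each have two papers
```

If every author_id maps to exactly one affiliation, both groupings return the same number, which is why the affiliation column can then be dropped.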
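Answer 2: (score: 0)
Here is your query, slightly rewritten. You don't need the IN clause; you can select directly from the result set of the subquery.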
select distinct author_id
from
(
select author_id
from paper_author
group by author_id,author_name,author_affiliation
having count(*) > 1
);
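The question also asks for the number of paper IDs, which the queries above do not cover. One possible reading is "how many distinct papers involve at least one of these authors". Under that assumption, the derived table can be joined back to paper_author, as in the sketch below. It assumes SQLite through Python's sqlite3 module; the rows, the alias dup, and the variable names are hypothetical:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE paper_author ("
    "paper_id INTEGER, author_id INTEGER, "
    "author_name TEXT, author_affiliation TEXT)"
)
conn.executemany(
    "INSERT INTO paper_author VALUES (?, ?, ?, ?)",
    [
        (1, 1, "Masuo Fukui", None),      # paper 1 has two authors
        (1, 4, "Yasusada Yamada", None),
        (2, 1, "Masuo Fukui", None),
        (3, 4, "Yasusada Yamada", None),  # paper 3 has two authors
        (3, 521630, "Ayman Kaheel", "Cairo Microsoft Innovation Lab"),
    ],
)

# Derived table from this answer: ids whose combination occurs more than once.
DUPLICATES = """
    SELECT DISTINCT author_id
    FROM (
        SELECT author_id
        FROM paper_author
        GROUP BY author_id, author_name, author_affiliation
        HAVING COUNT(*) > 1
    ) AS grouped
"""

# Number of distinct author_ids in that result.
author_count = conn.execute(
    "SELECT COUNT(*) FROM (" + DUPLICATES + ") AS dup"
).fetchone()[0]

# Number of distinct papers that involve at least one of those authors.
paper_count = conn.execute(
    "SELECT COUNT(DISTINCT pa.paper_id) "
    "FROM paper_author AS pa "
    "JOIN (" + DUPLICATES + ") AS dup ON pa.author_id = dup.author_id"
).fetchone()[0]

print(author_count)  # 2: author_ids 1 and 4
print(paper_count)   # 3: papers 1, 2 and 3
```

COUNT(DISTINCT pa.paper_id) keeps a paper from being counted twice when two of the selected authors share it, as with paper 1 here.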