I have two datasets, paper_data and paper_author.
paper_author:
paper_id author_id
1 521630
1 972575
1 1528710
2 521630
2 1682088
3 1682088
paper_data:
paper_id paper_year
1 2009
2 2007
3 1963
4 2005
5 1997
I want to find the distinct years in which each author wrote papers, for example:
author_id paper_id paper_year distinct_paper_year_count
521630 1,2 2009,2007 2
972575 1 2009 1
1528710 1 2009 1
1682088 2,3 2007,1963 2
So the final result I want is:
author_id distinct_paper_year_count
521630 2
972575 1
1528710 1
1682088 2
So far I am able to get:
author_id paper_year
521630 2009
972575 2009
.....
by running a simple query:
statement<-"select paper_author.author_id,paper_data.paper_year
from paper_author,paper_data
where paper_author.paper_id=paper_data.paper_id"
But after that I got stuck. How can I do this?
Thanks
Answer 0 (score: 1)
This should do it:
select paper_author.author_id,
count(distinct paper_data.paper_year) as distinct_paper_year_count
from paper_author
join paper_data on paper_author.paper_id = paper_data.paper_id
group by paper_author.author_id
Note that I replaced the outdated implicit join in the where clause with an explicit JOIN condition, which is preferred over implicit joins.
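For anyone who wants to check this against the sample data from the question, here is a minimal sketch using Python's built-in sqlite3 module. The in-memory database and the Python variable names are my own assumptions and not part of the asker's setup (the question appears to run its SQL from R). The `order by` is added only to make the output order predictable.

```python
import sqlite3

# In-memory SQLite database loaded with the sample rows from the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table paper_author (paper_id integer, author_id integer);
    create table paper_data (paper_id integer, paper_year integer);
    insert into paper_author values
        (1, 521630), (1, 972575), (1, 1528710),
        (2, 521630), (2, 1682088), (3, 1682088);
    insert into paper_data values
        (1, 2009), (2, 2007), (3, 1963), (4, 2005), (5, 1997);
""")

# Same query as above, plus an "order by" for a stable row order.
query = """
    select paper_author.author_id,
           count(distinct paper_data.paper_year) as distinct_paper_year_count
    from paper_author
    join paper_data on paper_author.paper_id = paper_data.paper_id
    group by paper_author.author_id
    order by paper_author.author_id
"""
rows = conn.execute(query).fetchall()
for author_id, year_count in rows:
    print(author_id, year_count)
```

On the sample data this gives one row per author, matching the desired result in the question.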
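Answer 1 (score: 0)
Assuming paper_author is your left table, you need to join it to paper_data to get the expected result. (The query below uses a plain inner join; a left join would additionally keep authors whose papers are missing from paper_data, with a count of 0.) You should also use a 'select' query with the 'count' function and the 'distinct' keyword to get the distinct count of paper_year. Finally, use a 'group by' clause to group the results by author_id from paper_author.
So here is the query: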
select pa.author_id,
       count(distinct pd.paper_year) as distinct_paper_year_count
from dbo.paper_author as pa
join dbo.paper_data as pd
  on pa.paper_id = pd.paper_id
group by pa.author_id
You can check the following sqlfiddle link to verify the result: http://sqlfiddle.com/#!3/e5d6e/1
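To see when the choice between a plain join and a left join matters here, the following is a rough sketch using Python's built-in sqlite3 module. The row `(9, 777)` is hypothetical (an author whose paper has no entry in paper_data) and is not in the question's data. The `dbo.` schema prefix is dropped because SQLite has no such schema, and the `order by` is only there for a stable output order.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table paper_author (paper_id integer, author_id integer);
    create table paper_data (paper_id integer, paper_year integer);
    -- (9, 777) is a hypothetical row: paper 9 is absent from paper_data.
    insert into paper_author values (1, 521630), (2, 521630), (9, 777);
    insert into paper_data values (1, 2009), (2, 2007);
""")

# The same query shape as above, with only the join type swapped in.
template = """
    select pa.author_id, count(distinct pd.paper_year)
    from paper_author as pa
    {join_type} paper_data as pd on pa.paper_id = pd.paper_id
    group by pa.author_id
    order by pa.author_id
"""
inner_rows = conn.execute(template.format(join_type="join")).fetchall()
left_rows = conn.execute(template.format(join_type="left join")).fetchall()

print(inner_rows)  # author 777 is dropped
print(left_rows)   # author 777 is kept with a count of 0
```

On the question's own data every paper_id in paper_author also exists in paper_data, so both join types return the same rows there.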