Using dplyr to group by two variables and find the top three

Time: 2019-03-26 15:42:28

Tags: r dplyr

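I thought I had this figured out, but I keep going in circles, so I'm looking for some pointers. My understanding of top_n is that it essentially combines arrange and slice, so if I'm looking for the top n items in a data frame, it will return those items. But I'm not getting the results I expect.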
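As a way to check that understanding, here is a minimal sketch comparing top_n with an explicit arrange plus slice. The `toy` data frame and its `id` and `score` columns are hypothetical and are not part of my dataset. As far as I can tell, top_n keeps every row tied at the cutoff and does not sort its output, so it is not strictly the same as arrange followed by slice.

```r
library(dplyr)

# Hypothetical toy data (not from the publication dataset)
toy <- tibble(
  id    = c("a", "b", "c", "d", "e"),
  score = c(5L, 9L, 9L, 9L, 1L)
)

# top_n() filters on rank: rows tied at the cutoff are all kept,
# and the rows stay in their original order
via_top_n <- toy %>% top_n(2, score)

# arrange() + slice() sorts first, then keeps exactly the first two rows
via_slice <- toy %>% arrange(desc(score)) %>% slice(1:2)

nrow(via_top_n)  # three rows, because of the three-way tie on score
nrow(via_slice)  # two rows
```

So with heavily tied counts, top_n(3, ...) can return more than three rows per group.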

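The data I'm working with is publication information. The full dataset consists of one row per author, each corresponding to a journal article title (pubtitle), the journal (journal_title), and that journal's publisher (publisher).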

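I made a subset of the full data frame that counts 1) the number of times each journal publisher appears (so publisher is counted under pub_publisher_count), and 2) the number of articles that appear under a particular journal title (so pubtitle is counted under pub_title_count). Here is a sample of that new data frame: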

structure(list(publisher = c("Journal of Engineering Education", 
"Sage Publications, Inc.", "The Johns Hopkins University Press", 
"University of Ottawa", "American Society of Engineering Education", 
"Sage Publications, Inc.", "American Society of Engineering Education", 
"Sage Publications, Inc.", "Frontiers Research Foundation", "Public Library of Science", 
"Routledge", "American Society of Engineering Education", "American Society of Engineering Education", 
"Wiley-Blackwell Publishing, Inc.", "American Association for Agricultural Education", 
"American Psychological Association", "John Wiley & Sons Inc.", 
"Institute of Electrical and Electronics Engineers", "Springer New York LLC", 
"Sage Publications, Inc.", "Oxford University Press", "Sage Publications, Inc.", 
"American Society for Cell Biology", "Frontiers Research Foundation", 
"Routledge", "American Psychological Association", "Sage Publications, Inc.", 
"Routledge", "Elsevier Inc.", "Psychology Press"), pubtitle = c("Journal of Counseling and Development", 
"Psychology of Women Quarterly", "Journal of College Student Development", 
"University of Ottawa Journal of Medicine", "ASEE Annual Conference and Exposition, Conference Proceedings", 
"Personality and Social Psychology Bulletin", "ASEE Annual Conference and Exposition, Conference Proceedings", 
"Journal of Career Development", "Frontiers in Psychology", "PLoS ONE", 
"Journal of Educational Research", "ASEE Annual Conference and Exposition, Conference Proceedings", 
"ASEE Annual Conference and Exposition, Conference Proceedings", 
"Centaurus", "Journal of Agricultural Education", "Cultural diversity & ethnic minority psychology", 
"Science Education", "2007 37th Annual Frontiers In Education Conference - Global Engineering: Knowledge Without Borders, Opportunities Without Passports", 
"Research in Higher Education", "Personality and Social Psychology Bulletin", 
"Quarterly Journal of Economics", "Science Technology and Human Values", 
"Cell Biology Education", "Frontiers in Psychology", "International Journal of Science Education", 
"Journal of Applied Psychology", "Gender & Society", "NASPA Journal About Women in Higher Education", 
"Journal of vocational behavior", "Applied Developmental Science"
), pub_publisher_count = c(3L, 77L, 4L, 1L, 35L, 77L, 35L, 77L, 
20L, 13L, 51L, 35L, 35L, 27L, 4L, 25L, 15L, 12L, 76L, 77L, 13L, 
77L, 6L, 20L, 51L, 25L, 77L, 51L, 64L, 3L), pub_title_count = c(1L, 
13L, 2L, 1L, 32L, 5L, 32L, 4L, 14L, 10L, 1L, 32L, 32L, 1L, 4L, 
2L, 7L, 2L, 13L, 5L, 1L, 1L, 1L, 14L, 4L, 1L, 3L, 6L, 10L, 1L
)), row.names = c(NA, -30L), class = c("tbl_df", "tbl", "data.frame"
), .Names = c("publisher", "pubtitle", "pub_publisher_count", 
"pub_title_count"))

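I'm looking for the top three journal titles for each publisher. So the rows should be grouped by publisher name, and within each publisher the top three titles should be grouped as well. In other words, I'd see something like: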

publisher        pubtitle
Sage             sage_title1
Sage             sage_title2
Sage             sage_title3
Springer         springer_title1
Springer         springer_title2
Springer         springer_title3
Publisher3       publisher3_title1
Publisher3       publisher3_title2
Publisher4       publisher4_title1
Publisher4       publisher4_title2
Publisher4       publisher4_title3

I thought I could get there with the code below, but I seem to be stuck:

pub_ownership %>% 
    mutate(rn = row_number()) %>% 
    group_by(publisher) %>% 
    top_n(3, pub_title_count) %>% View()

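It does return a subset, but the result isn't correctly prioritized by journal title count (pub_title_count), and it doesn't group the journal titles together (so I get a lot of duplicate information).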
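One direction that might address both symptoms, sketched under assumptions: if the duplicates come from the data having one row per author, collapsing to one row per publisher/title pair before ranking should remove them, and sorting within each publisher should give the ordering. The small `pub_ownership` built here is a hypothetical stand-in with made-up titles, not my real data, and this is a sketch rather than a confirmed fix.

```r
library(dplyr)

# Hypothetical stand-in for pub_ownership: one row per author, so titles repeat
titles <- c(rep("sage_title1", 4), rep("sage_title2", 3),
            rep("sage_title3", 2), "sage_title4",
            rep("springer_title1", 2), "springer_title2")

pub_ownership <- tibble(
  publisher = c(rep("Sage", 10), rep("Springer", 3)),
  pubtitle  = titles
) %>%
  add_count(publisher, name = "pub_publisher_count") %>%
  add_count(publisher, pubtitle, name = "pub_title_count")

top_titles <- pub_ownership %>%
  # Collapse the per-author rows to one row per publisher/title pair
  distinct(publisher, pubtitle, pub_title_count) %>%
  group_by(publisher) %>%
  # Sort titles within each publisher, most frequent first
  arrange(desc(pub_title_count), .by_group = TRUE) %>%
  # Keep at most three rows per publisher, even when counts are tied
  filter(row_number() <= 3) %>%
  ungroup()

top_titles
```

Note that adding a row-number column first, as in my attempt, makes every row unique, so distinct() would only help if it is applied to the publisher, title, and count columns alone. If tied titles should all be kept, top_n(3, pub_title_count) could replace the filter step.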

Thanks for looking!

Edit: here is another random sample of the data, with more duplicated information, which may be closer to how the actual dataset behaves.

structure(list(publisher = c("American Society of Engineering Education", 
"Elsevier Inc.", "Routledge", "Sage Publications, Inc.", "Elsevier Inc.", 
"Springer New York LLC", "Wiley-Blackwell Publishing, Inc.", 
"Sage Publications, Inc.", "American Society of Engineering Education", 
"M D P I AG", "American Psychological Association", "Journal of Engineering Education", 
"Elsevier Inc.", "Springer New York LLC", "Sage Publications, Inc.", 
"Sage Publications, Inc.", "Springer New York LLC", "Springer New York LLC", 
"Taylor & Francis Inc.", "American Psychological Association", 
"Public Library of Science", "John Wiley & Sons, Inc.", "Springer New York LLC", 
"Springer New York LLC", "Springer New York LLC", "Public Library of Science", 
"Elsevier Inc.", "Scientia Socialis", "Public Library of Science", 
"Addleton Academic Publishers", "American Psychological Association", 
"Elsevier Inc.", "American Society for Horticultural Science", 
"Routledge", "Springer New York LLC", "A I P Publishing LLC", 
"Elsevier Inc.", "Palgrave Macmillan Ltd.", "Institute for S T E M Education and Research", 
"Public Library of Science", "Routledge", "Routledge", "Learning Disabilities Worldwide", 
"EBSCO Publishing", "M D P I AG", "Sage Publications, Inc.", 
"Springer New York LLC", "Elsevier Inc.", "American Psychological Association", 
"Oxford University Press"), pubtitle = c("ASEE Annual Conference and Exposition, Conference Proceedings", 
"Research Policy", "Interdisciplinary Science Reviews", "Journal of Career Development", 
"Developmental Review", "Research in Higher Education", "Genome biology", 
"Review of Research in Education", "ASEE Annual Conference and Exposition, Conference Proceedings", 
"Social Sciences", "Journal of educational psychology", "Journal of Engineering Education", 
"Journal of vocational behavior", "Sex Roles", "Psychology of Women Quarterly", 
"Gifted Child Quarterly", "Social Psychology of Education", "Sex Roles", 
"Economic Geography", "Journal of Counseling Psychology", "Plos Biology", 
"Women in Higher Education", "Sex Roles", "Social Psychology of Education", 
"Research in Higher Education", "PLoS ONE", "Social Science Journal", 
"Journal of Baltic Science Education", "PLoS ONE", "Journal of Research in Gender Studies", 
"Journal of educational psychology", "Role of Gender in Educational Contexts and Outcomes", 
"Horttechnology", "International Journal of Science Education", 
"Research in Higher Education", "AIP Conference Proceedings", 
"Journal of vocational behavior", "Latino Studies", "Journal of STEM Education : Innovations and Research", 
"PLoS ONE", "Computer Science Education", "Chinese Sociological Review", 
"Insights on Learning Disabilities", "Teachers College Record", 
"Social Sciences", "Journal of Career Development", "Research in Higher Education", 
"Social science research", "Journal of Counseling Psychology", 
"Bioscience"), pub_publisher_count = c(35L, 64L, 51L, 77L, 64L, 
76L, 27L, 77L, 35L, 17L, 25L, 3L, 64L, 76L, 77L, 77L, 76L, 76L, 
4L, 25L, 13L, 7L, 76L, 76L, 76L, 13L, 64L, 1L, 13L, 2L, 25L, 
64L, 1L, 51L, 76L, 1L, 64L, 2L, 3L, 13L, 51L, 51L, 1L, 3L, 17L, 
77L, 76L, 64L, 25L, 13L), pub_title_count = c(32L, 2L, 1L, 4L, 
1L, 13L, 1L, 1L, 32L, 16L, 5L, 2L, 10L, 21L, 13L, 1L, 5L, 21L, 
1L, 3L, 1L, 3L, 21L, 5L, 13L, 10L, 2L, 1L, 10L, 2L, 5L, 1L, 1L, 
4L, 13L, 1L, 10L, 1L, 3L, 10L, 2L, 1L, 1L, 3L, 16L, 4L, 13L, 
4L, 3L, 2L)), row.names = c(NA, -50L), class = c("tbl_df", "tbl", 
"data.frame"), .Names = c("publisher", "pubtitle", "pub_publisher_count", 
"pub_title_count"))
