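I have a table made up of publication_id and author names.
I want to find all the co-authors of each author, i.e. everyone they have worked with.
I was able to get all the publications for each author: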
pubsperauthor <- sample_pubs_small %>%
  group_by(cname) %>%
  summarise(pubs = toString(sort(unique(publication_id))))
Now I want to get the names of all the co-authors on those pubs. Any suggestions?
Here is the code for the data:
> dput(pubsperauthor)
structure(list(cname = c("AMEY S BAILEY", "JACK SMITH", "JACK A SMITH",
"JACK B SMITH", "JAMES ANDERSON", "JAMES M ANDERSON", "RONALD VALE",
"RONALD A VALE", "RONALD DAVID VALE"), pubs = c("6", "6", "4",
"5", "1, 2", "4, 5, 6", "3", "0", "1, 2")), class = c("tbl_df",
"tbl", "data.frame"), row.names = c(NA, -9L), .Names = c("cname",
"pubs"))
> dput(sample_pubs_small)
structure(list(publication_id = c(0L, 1L, 1L, 2L, 2L, 3L, 4L,
4L, 5L, 5L, 6L, 6L, 6L), cname = c("RONALD A VALE", "JAMES ANDERSON",
"RONALD DAVID VALE", "JAMES ANDERSON", "RONALD DAVID VALE",
"RONALD VALE", "JAMES M ANDERSON", "JACK A SMITH", "JAMES M ANDERSON",
"JACK B SMITH", "JAMES M ANDERSON", "AMEY S BAILEY", "JACK SMITH"
)), row.names = c(NA, -13L), class = c("tbl_df", "tbl", "data.frame"
), .Names = c("publication_id", "cname"))
Edit
Here is the sample output:
1 AMEY S BAILEY JACK SMITH, JAMES M ANDERSON
2 JACK SMITH AMEY S BAILEY, JAMES M ANDERSON
3 JACK A SMITH JAMES M ANDERSON
4 JACK B SMITH JAMES M ANDERSON
5 JAMES ANDERSON RONALD DAVID VALE
6 JAMES M ANDERSON AMEY S BAILEY, JACK SMITH, JACK A SMITH, JACK B SMITH
7 RONALD DAVID VALE JAMES ANDERSON
8 RONALD A VALE
9 RONALD VALE
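Answer 0 (score: 3)
Here is one way to get the list of co-authors for each author. Note that it drops authors who have no co-authors, so depending on the final data structure you need, you may want to do another join against the full list of authors.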
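library(dplyr)
# Self-join on publication_id to pair every author with everyone on the same publication
coauthor <- sample_pubs_small %>%
  left_join(sample_pubs_small, by = "publication_id") %>%
  # Drop the rows where an author is paired with themselves
  subset(cname.x != cname.y) %>%
  group_by(cname.x) %>%
  summarise(Coauthors = toString(sort(unique(cname.y))))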
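The extra join mentioned above could look something like the following. This is only a sketch under a few assumptions: the names `all_authors` and `coauthor_full` are made up for illustration, the sample data is rebuilt as a plain data frame so the block runs on its own, and an empty string is used to mark "no co-authors" (you may prefer to keep `NA`).

```r
library(dplyr)

# Sample data from the question, rebuilt as a plain data frame
sample_pubs_small <- data.frame(
  publication_id = c(0L, 1L, 1L, 2L, 2L, 3L, 4L, 4L, 5L, 5L, 6L, 6L, 6L),
  cname = c("RONALD A VALE", "JAMES ANDERSON", "RONALD DAVID VALE",
            "JAMES ANDERSON", "RONALD DAVID VALE", "RONALD VALE",
            "JAMES M ANDERSON", "JACK A SMITH", "JAMES M ANDERSON",
            "JACK B SMITH", "JAMES M ANDERSON", "AMEY S BAILEY", "JACK SMITH"),
  stringsAsFactors = FALSE
)

# Self-join as in the answer; newer dplyr versions may warn that the
# join is many-to-many, which is expected here
coauthor <- sample_pubs_small %>%
  left_join(sample_pubs_small, by = "publication_id") %>%
  filter(cname.x != cname.y) %>%
  group_by(cname.x) %>%
  summarise(Coauthors = toString(sort(unique(cname.y))))

# Hypothetical helper table: one row per author, including solo authors
all_authors <- distinct(sample_pubs_small, cname)

# Join the full author list back on, so solo authors reappear
coauthor_full <- all_authors %>%
  left_join(coauthor, by = c("cname" = "cname.x")) %>%
  mutate(Coauthors = coalesce(Coauthors, "")) %>%  # NA -> "" (assumed convention)
  arrange(cname)

print(coauthor_full)
```

With the sample data this should give one row for each of the nine authors, with RONALD VALE and RONALD A VALE having an empty Coauthors value.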
Answer 1 (score: 1)
Here is how to keep the authors who have no co-authors, using dplyr.
library(dplyr)
sample_pubs_small %>%
  left_join(sample_pubs_small, by = "publication_id") %>%
  mutate(cname.y = ifelse(cname.x == cname.y, NA, cname.y)) %>%
  group_by(cname.x) %>%
  summarise(coauthors = toString(sort(unique(cname.y))))
cname.x coauthors
<chr> <chr>
1 AMEY S BAILEY JACK SMITH, JAMES M ANDERSON
2 JACK SMITH AMEY S BAILEY, JAMES M ANDERSON
3 JACK A SMITH JAMES M ANDERSON
4 JACK B SMITH JAMES M ANDERSON
5 JAMES ANDERSON RONALD DAVID VALE
6 JAMES M ANDERSON AMEY S BAILEY, JACK SMITH, JACK A SMITH, JACK B SMITH
7 RONALD VALE
8 RONALD A VALE
9 RONALD DAVID VALE JAMES ANDERSON
Answer 2 (score: 1)
Ian Wesley has already given the answer, but I'll add something that may be useful.
You can use aggregate to get pubsperauthor:
pubsperauthor <- aggregate(publication_id ~ cname, sample_pubs_small, c)
You can do the same thing to get authorsperpub, which gives all the authors of each publication (co-authors, in a sense):
authorsperpub <- aggregate(cname ~ publication_id, sample_pubs_small, c)
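To go from a per-publication grouping to the co-author list the question asks for, something along these lines might work in base R. It is a rough sketch, not part of the answer above: it uses split() rather than aggregate() so that the grouping is a plain named list, and the names `authors_by_pub`, `authors` and `coauthors` are made up for illustration.

```r
# Sample data from the question, rebuilt as a plain data frame
sample_pubs_small <- data.frame(
  publication_id = c(0L, 1L, 1L, 2L, 2L, 3L, 4L, 4L, 5L, 5L, 6L, 6L, 6L),
  cname = c("RONALD A VALE", "JAMES ANDERSON", "RONALD DAVID VALE",
            "JAMES ANDERSON", "RONALD DAVID VALE", "RONALD VALE",
            "JAMES M ANDERSON", "JACK A SMITH", "JAMES M ANDERSON",
            "JACK B SMITH", "JAMES M ANDERSON", "AMEY S BAILEY", "JACK SMITH"),
  stringsAsFactors = FALSE
)

# Named list: one character vector of authors per publication_id
authors_by_pub <- split(sample_pubs_small$cname,
                        sample_pubs_small$publication_id)

authors <- sort(unique(sample_pubs_small$cname))

# For each author, pool everyone on their publications and drop the author
coauthors <- sapply(authors, function(a) {
  on_pub <- vapply(authors_by_pub, function(x) a %in% x, logical(1))
  others <- setdiff(unlist(authors_by_pub[on_pub], use.names = FALSE), a)
  toString(sort(others))
})

print(coauthors)
```

The result is a character vector named by author, where authors without co-authors get an empty string.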