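I have read co-authorship data, among other information, from a csv file into R. The author column of the file contains co-author information like this:
Miyazaki T., Akisawa A., Saha B.B., El-Sharkawy I.I., Chakraborty A.
Saha B.B., Chakraborty A., Koyama S., Aristov Y.I.
Ali S.M., Chakraborty A.
...
I want to convert this information into an edge list with one row per pair of co-authors on the same paper (for example, Miyazaki T. and Akisawa A. from the first paper).
Basically, the network is an undirected graph. Any help or starter code would be greatly appreciated. Also, is there a way to keep the number (frequency) of collaborations? For example, Saha published with Chakraborty twice in the sample above.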
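The answer below starts from dat, a character matrix with one paper per row and NA padding on the right. A minimal sketch of building that matrix from the comma-separated author column, assuming authors are always separated by ", " and that the column is already in a character vector (the name authors is hypothetical, as is any way of extracting it such as df$Authors):

```r
# Hypothetical input: the author column as read from the csv, one string per paper
authors <- c("Miyazaki T., Akisawa A., Saha B.B., El-Sharkawy I.I., Chakraborty A.",
             "Saha B.B., Chakraborty A., Koyama S., Aristov Y.I.",
             "Ali S.M., Chakraborty A.")

# Split each string on the ", " separator between authors (assumed format)
author.list <- strsplit(authors, ", ", fixed = TRUE)

# Pad every paper with NA up to the largest author count, then put one paper per row
max.n <- max(lengths(author.list))
dat <- t(sapply(author.list, function(a) c(a, rep(NA, max.n - length(a)))))
```

If the separator spacing in the real file is inconsistent, splitting on "," and wrapping the result in trimws() would be a more forgiving variant.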
Answer 0 (score: 0)
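Given that your input data (dat in my example) pads papers that have fewer than the maximum number of authors with missing values (NA), you can use the following R code: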
# data
dat <- rbind(c("Miyazaki T.", "Akisawa A.", "Saha B.B.", "El-Sharkawy I.I.", "Chakraborty A."),
             c("Saha B.B.", "Chakraborty A.", "Koyama S.", "Aristov Y.I.", NA),
             c("Ali S.M.", "Chakraborty A.", NA, NA, NA))
# loop through all rows of dat (all papers, I presume)
transformed.dat <- lapply(1:nrow(dat), function(row.num) {
  row.el <- dat[row.num, ] # the row element that will be used in this loop
  # number of authors per paper (assumes the NA padding sits at the end of the row)
  n.authors <- length(row.el[!is.na(row.el)])
  # a single-author paper has no co-author pairs and combn(1, 2) would fail; rbindlist skips the NULL
  if (n.authors < 2) return(NULL)
  # creates a matrix with all possible combinations (play around with n.authors, to see what it does)
  pairings <- combn(n.authors, 2)
  # for each pair (one column of pairings), look up the two author names;
  # apply stacks the results into a 2 x k matrix
  res <- apply(pairings, 2, function(vec) {
    return(t(row.el[vec]))
  })
  # create a data.frame with names aut1 and aut2
  res <- data.frame(aut1 = res[1, ],
                    aut2 = res[2, ])
  return(res)
})
# use data.table's rbindlist to bind the list of combinations together
final.dat <- data.table::rbindlist(transformed.dat)
final.dat
# aut1 aut2
# 1: Miyazaki T. Akisawa A.
# 2: Miyazaki T. Saha B.B.
# 3: Miyazaki T. El-Sharkawy I.I.
# 4: Miyazaki T. Chakraborty A.
# 5: Akisawa A. Saha B.B.
# 6: Akisawa A. El-Sharkawy I.I.
# 7: Akisawa A. Chakraborty A.
# 8: Saha B.B. El-Sharkawy I.I.
# 9: Saha B.B. Chakraborty A.
# 10: El-Sharkawy I.I. Chakraborty A.
# 11: Saha B.B. Chakraborty A.
# 12: Saha B.B. Koyama S.
# 13: Saha B.B. Aristov Y.I.
# 14: Chakraborty A. Koyama S.
# 15: Chakraborty A. Aristov Y.I.
# 16: Koyama S. Aristov Y.I.
# 17: Ali S.M. Chakraborty A.
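In the output above the pair Saha B.B. / Chakraborty A. appears twice (rows 9 and 11), once for each paper they share, so the collaboration frequency asked about in the question can be recovered by counting duplicate pairs. A minimal base-R sketch, assuming the papers are available as a list of author vectors (papers is hypothetical and rebuilt here so the example stands alone, without data.table):

```r
# Hypothetical input: one character vector of authors per paper
papers <- list(
  c("Miyazaki T.", "Akisawa A.", "Saha B.B.", "El-Sharkawy I.I.", "Chakraborty A."),
  c("Saha B.B.", "Chakraborty A.", "Koyama S.", "Aristov Y.I."),
  c("Ali S.M.", "Chakraborty A.")
)

# Build the edge list; sort each pair so A-B and B-A become the same undirected edge
edges <- do.call(rbind, lapply(papers, function(a) {
  if (length(a) < 2) return(NULL)   # single-author papers contribute no edges
  p <- apply(combn(a, 2), 2, sort)  # 2 x k matrix, each column sorted alphabetically
  data.frame(aut1 = p[1, ], aut2 = p[2, ], stringsAsFactors = FALSE)
}))

# Collapse duplicate pairs into one row whose weight is the number of joint papers
edges$weight <- 1
weighted <- aggregate(weight ~ aut1 + aut2, data = edges, FUN = sum)
weighted[weighted$weight > 1, ]
```

Sorting each pair matters for an undirected graph: without it, a paper that lists Chakraborty before Saha would produce a separate Chakraborty/Saha row that is not counted together with Saha/Chakraborty. If you stay with data.table as in the answer, once each pair is in a consistent order, final.dat[, .N, by = .(aut1, aut2)] gives the same counts.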
Does this answer your question?
The key is the combn function, which creates all possible combinations.
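To see what combn does on its own, here is a small sketch using three of the example authors (the vector is only illustrative). Passing a character vector instead of a count returns the names directly, one pair per column, and the number of pairs per paper follows choose(n, 2):

```r
# Three authors taken two at a time: each column of the result is one pair
pairs <- combn(c("Saha B.B.", "Chakraborty A.", "Koyama S."), 2)
print(pairs)

# A paper with n authors yields choose(n, 2) pairs; for the three example
# papers (5, 4 and 2 authors) that is 10 + 6 + 1 = 17, the row count of final.dat
n.edges <- sum(choose(c(5, 4, 2), 2))
print(n.edges)
```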