Creating an edge list from co-author data

Time: 2015-11-05 09:12:03

Tags: python r graph igraph sna

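I have read co-author data, along with other information, into R from a csv file. The author column of the file contains co-author information like this:

Miyazaki T., Akisawa A., Saha B.B., El-Sharkawy I.I., Chakraborty A.
Saha B.B., Chakraborty A., Koyama S., Aristov Y.I.
Ali S.M., Chakraborty A.
...

I would like to convert this information into an edge list, with one pair of co-authors per row.

Basically, the network is an undirected graph. Any help or starter code would be appreciated. Also, is there a way to keep the number (frequency) of collaborations? For example, Saha published with Chakraborty twice in the sample above.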

1 answer:

Answer 0 (score: 0)

Given that your input data (dat in my example) is padded with NA wherever a paper has fewer authors than the maximum, you can use the following R code:

# data 
dat <- rbind(c("Miyazaki T.", "Akisawa A.", "Saha B.B.", "El-Sharkawy I.I.", "Chakraborty A."),
             c("Saha B.B.", "Chakraborty A.", "Koyama S.", "Aristov Y.I.", NA),
             c("Ali S.M.", "Chakraborty A.", NA, NA, NA))

# loop through all rows of dat (one row per paper)
transformed.dat <- lapply(seq_len(nrow(dat)), function(row.num) {

  row.el <- dat[row.num, ] # the row element that will be used in this loop

  # number of authors per paper (assumes the NA padding comes after the names)
  n.authors <- sum(!is.na(row.el))

  # a single-author paper has no co-author pairs (combn would stop with "n < m")
  if (n.authors < 2) return(NULL)

  # creates a matrix with all possible combinations (play around with n.authors, to see what it does)
  pairings <- combn(n.authors, 2)

  # loop through all pairs; each call returns the two author names of one pair,
  # and apply collects them as the columns of a 2-row matrix
  res <- apply(pairings, 2, function(vec) {
    return(row.el[vec])
  })

  # create a data.frame with names aut1 and aut2
  res <- data.frame(aut1 = res[1, ],
                    aut2 = res[2, ],
                    stringsAsFactors = FALSE)

  return(res)
})

# use data.table's rbindlist to bind the list of combinations together
final.dat <- data.table::rbindlist(transformed.dat)

final.dat
#         aut1             aut2
# 1:      Miyazaki T.       Akisawa A.
# 2:      Miyazaki T.        Saha B.B.
# 3:      Miyazaki T. El-Sharkawy I.I.
# 4:      Miyazaki T.   Chakraborty A.
# 5:       Akisawa A.        Saha B.B.
# 6:       Akisawa A. El-Sharkawy I.I.
# 7:       Akisawa A.   Chakraborty A.
# 8:        Saha B.B. El-Sharkawy I.I.
# 9:        Saha B.B.   Chakraborty A.
# 10: El-Sharkawy I.I.   Chakraborty A.
# 11:        Saha B.B.   Chakraborty A.
# 12:        Saha B.B.        Koyama S.
# 13:        Saha B.B.     Aristov Y.I.
# 14:   Chakraborty A.        Koyama S.
# 15:   Chakraborty A.     Aristov Y.I.
# 16:        Koyama S.     Aristov Y.I.
# 17:         Ali S.M.   Chakraborty A.
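The code above starts from an NA-padded matrix. If the csv column instead holds one comma-separated string per paper, as shown in the question, the same combn idea can be applied to the strings directly. This is a minimal sketch, assuming ", " is the only separator and that no author name itself contains a comma; authors.col is a hypothetical stand-in for that column.

```r
# Hypothetical stand-in for the csv author column: one string per paper
authors.col <- c(
  "Miyazaki T., Akisawa A., Saha B.B., El-Sharkawy I.I., Chakraborty A.",
  "Saha B.B., Chakraborty A., Koyama S., Aristov Y.I.",
  "Ali S.M., Chakraborty A."
)

# Split each string into a character vector of author names
# (assumes ", " separates authors and never appears inside a name)
author.lists <- strsplit(authors.col, ", ", fixed = TRUE)

# For each paper, build all unordered author pairs with combn;
# papers with fewer than two authors contribute no edges
pair.list <- lapply(author.lists, function(a) {
  a <- trimws(a)  # strip any stray whitespace around names
  if (length(a) < 2) return(NULL)
  p <- combn(a, 2)  # 2 x choose(n, 2) matrix, one pair per column
  data.frame(aut1 = p[1, ], aut2 = p[2, ], stringsAsFactors = FALSE)
})

# Drop the NULL entries and stack the per-paper edge lists
edges <- do.call(rbind, Filter(Negate(is.null), pair.list))
rownames(edges) <- NULL
edges
```

Because strsplit already returns vectors of different lengths, no NA padding or author counting is needed here.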

Does this answer your question? The key is combn, the function that generates all possible combinations.
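For the second part of the question (keeping the number of collaborations), one option is to count how often each pair occurs in the edge list. This is a sketch, not part of the original answer: it uses a hypothetical four-row sample copied from the output above, and it assumes that sorting the two names within each row is an acceptable way to treat "A, B" and "B, A" as the same undirected edge.

```r
# Hypothetical sample: four rows copied from the edge list printed above
edges <- data.frame(
  aut1 = c("Saha B.B.", "Saha B.B.", "Chakraborty A.", "Ali S.M."),
  aut2 = c("Chakraborty A.", "Chakraborty A.", "Koyama S.", "Chakraborty A."),
  stringsAsFactors = FALSE
)

# The graph is undirected, so store every pair in a canonical (sorted) order;
# otherwise "A, B" and "B, A" would be counted as different edges
edges <- data.frame(
  aut1 = pmin(edges$aut1, edges$aut2),
  aut2 = pmax(edges$aut1, edges$aut2),
  stringsAsFactors = FALSE
)

# Each row stands for one co-authored paper, so summing a column of ones
# per pair gives the number of collaborations (the edge weight)
edges$weight <- 1
weighted <- aggregate(weight ~ aut1 + aut2, data = edges, FUN = sum)
weighted
```

Since igraph is among the question's tags: a data frame like weighted can be passed to igraph's graph_from_data_frame(weighted, directed = FALSE), which uses the first two columns as edge endpoints and keeps weight as an edge attribute.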