
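Time: 2016-03-08 19:05:37

Tags: r

I want to see whether I can visualise who publishes with whom in peer-reviewed journals on a given subject. To do this, I entered the keyword "Barrett" into PubMed and downloaded a large file, which gave me two columns, Title and Author: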

Auth <- structure(list(Title = structure(c(1L, 4L, 3L, 2L, 5L), .Label = c("A case of Barrett's adenocarcinoma with marked endoscopic morphological changes in Barrett's esophagus over a long follow-up period of 15\xe4\xf3\x8ayears.", 
"APE1-mediated DNA damage repair provides survival advantage for esophageal adenocarcinoma cells in response to acidic bile salts.", 
"Healthcare Cost of Over-Diagnosis of Low-Grade Dysplasia in Barrett's Esophagus.", 
"Radiofrequency ablation coupled with Roux-en-Y gastric bypass: a treatment option for morbidly obese patients with Barrett's esophagus.", 
"Risk factors for Barrett's esophagus."), class = "factor"), 
    Author = structure(c(3L, 5L, 4L, 2L, 1L), .Label = c("Arora Z, Garber A, Thota PN.", 
    "Hong J, Chen Z, Peng D, Zaika A, Revetta F, Washington MK, Belkhiri A, El-Rifai W.", 
    "Iwaya Y, Yamazaki T, Watanabe T, Seki A, Ochi Y, Hara E, Arakura N, Tanaka E, Hasebe O.", 
    "Lash RH, Deas TM Jr, Wians FH Jr.", "Parikh K, Khaitan L."
    ), class = "factor")), .Names = c("Title", "Author"), row.names = c(NA, 
5L), class = "data.frame")



1. Extract all the names into a long list from the Author column
2. Then create colnames from the Author list
3. Then create rownames from the Author list
4. Then somehow iterate through Auth[2] and count the name co-occurrence
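
The four steps above can be sketched in base R as follows. This is only a minimal illustration on made-up author strings: `authors_raw`, `split_authors`, `all_names` and `co_mat` are hypothetical names, and the toy data is written without trailing periods. It fills a square author-by-author matrix directly rather than going through a data frame.

```r
# Toy stand-in for the Author column (hypothetical data, not the PubMed file)
authors_raw <- c("Iwaya Y, Yamazaki T, Watanabe T",
                 "Parikh K, Khaitan L",
                 "Iwaya Y, Parikh K, Khaitan L")

# Step 1: split each entry on commas and trim the surrounding spaces
split_authors <- lapply(strsplit(authors_raw, ",", fixed = TRUE),
                        function(x) unique(trimws(x)))
all_names <- sort(unique(unlist(split_authors)))

# Steps 2 and 3: square matrix with the authors as row and column names
co_mat <- matrix(0L, nrow = length(all_names), ncol = length(all_names),
                 dimnames = list(all_names, all_names))

# Step 4: for every paper, add 1 for each unordered pair of co-authors
for (paper in split_authors) {
  if (length(paper) < 2) next          # a single author has no co-authors
  pairs <- combn(paper, 2)             # one column per pair
  for (k in seq_len(ncol(pairs))) {
    a <- pairs[1, k]
    b <- pairs[2, k]
    co_mat[a, b] <- co_mat[a, b] + 1L
    co_mat[b, a] <- co_mat[b, a] + 1L  # keep the matrix symmetric
  }
}

co_mat
```

Each off-diagonal cell then holds the number of papers the two authors share; the diagonal stays at zero because an author is never paired with themselves.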


AuthSplit <- strsplit(as.character(Auth$Author), ",", fixed = TRUE)


 Error in data.frame(c("Iwaya Y", " Yamazaki T", " Watanabe T", " Seki A",  : 
  arguments imply differing number of rows: 9, 2, 3, 8, 20, 5, 1, 11, 4, 23, 6, 15, 16, 7, 12, 10, 14, 21, 13, 18, 19, 17, 22
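
The message indicates that `data.frame()` was handed vectors of different lengths, presumably the list returned by `strsplit()`, where each paper contributes a different number of authors. A minimal sketch of that situation, with one possible way around it (a "long" table with one row per paper and author), is shown below. The toy strings and the names `auth_split`, `err_msg` and `auth_long` are assumptions made for illustration only.

```r
# Toy stand-in for the Author column (hypothetical data)
authors_raw <- c("Iwaya Y, Yamazaki T, Watanabe T.", "Parikh K, Khaitan L.")

auth_split <- strsplit(authors_raw, ",", fixed = TRUE)
lengths(auth_split)   # the papers have different numbers of authors

# data.frame() needs columns of equal length, so a ragged list fails;
# the error is caught here so the script keeps running
err_msg <- tryCatch(data.frame(auth_split),
                    error = function(e) conditionMessage(e))

# Long format instead: repeat each paper index once per author
auth_long <- data.frame(
  paper  = rep(seq_along(auth_split), lengths(auth_split)),
  author = trimws(unlist(auth_split)),
  stringsAsFactors = FALSE
)
auth_long
```

A long table like this sidesteps the equal-length requirement because every row describes exactly one paper-author combination.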


1 answer:

Answer 0 (score: 2)



library(igraph)  # provides graph_from_data_frame() and E()

# add papers with authors from previous papers
Auth <- rbind(Auth,
              data.frame(Title = c("Paper A", "Paper B"),
                         Author = c("Iwaya Y, Parikh K, Lash RH", "Wians FH Jr., Lash RH")))

# create list of individual authors for each paper
pub_auths <- strsplit(as.character(Auth$Author), split = ",", fixed = TRUE)
pub_auths <- lapply(pub_auths, trimws)

# for each paper, form a data frame of unique author pairs
# (keeping Var1 < Var2 drops self-pairs and mirrored duplicates)
auth_pairs <- lapply(pub_auths, function(x) {
  z <- expand.grid(x, x, stringsAsFactors = FALSE)
  z[z$Var1 < z$Var2, ]
})

# combine the list of per-paper data frames into one data frame
auth_pairs <- do.call(rbind, auth_pairs)

# count papers for each author pair
auth_count <- aggregate(paste(Var1, Var2) ~ Var1 + Var2, data = auth_pairs, length)
colnames(auth_count) <- c("Author1", "Author2", "Paper_count")

# create graph from author pairs
g <- graph_from_data_frame(auth_count, directed = FALSE)

# plot graph, labelling each edge with the number of shared papers
plot(g, edge.label = E(g)$Paper_count, edge.label.cex = 1.4, vertex.label.cex = 1.4)
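
One detail in the sample data may be worth handling before counting: the last author of each entry carries a trailing period (for example "Khaitan L."), so the same person listed mid-entry on another paper ("Khaitan L") would show up as a separate vertex. The sketch below strips that period with a hypothetical helper, `normalize_author()`, and then counts pairs on toy data without needing igraph. It is an assumption about how the names could be cleaned, not part of the answer's code.

```r
# Hypothetical helper: trim spaces and drop one trailing period
normalize_author <- function(x) sub("\\.$", "", trimws(x))

# Toy data in which "Khaitan L" appears both last and first in an entry
authors_raw <- c("Parikh K, Khaitan L.", "Khaitan L, Parikh K, Lash RH.")

pub_auths <- lapply(strsplit(authors_raw, ",", fixed = TRUE), normalize_author)

# Same pairing idea as the answer: keep each unordered pair once per paper
auth_pairs <- do.call(rbind, lapply(pub_auths, function(x) {
  z <- expand.grid(Var1 = x, Var2 = x, stringsAsFactors = FALSE)
  z[z$Var1 < z$Var2, ]
}))

# Count papers per pair by summing a column of ones
auth_count <- aggregate(n ~ Var1 + Var2,
                        data = transform(auth_pairs, n = 1L), FUN = sum)
auth_count
```

With the period removed, the pair "Khaitan L" and "Parikh K" is counted across both toy papers instead of being split between two spellings. Suffixes such as "Jr." lose their period too, which is harmless as long as the same cleaning is applied to every entry.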

