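From Web of Science, I downloaded the citations of 500 articles as a text file. Only the author column (AU) was read into R. This variable contains Author1 to AuthorN, separated by semicolons:
Anselin, L; Fujita, M; Thisse, JF
I want to extract Author1, Author2, Author3 ... AuthorN into separate columns. My file has up to 10 authors per article; in this example, the maximum is 7 authors: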
#Sample of Data
data <- c("Anselin, L; Varga, A; Acs, Z",
"Acs, ZJ; Anselin, L; Varga, A",
"Anselin, L",
"Fujita, M; Thisse, JF",
"Turner, RK; van den Bergh, JCJM; Soderqvist, T; Barendregt, A; van der Straaten, J; Maltby, E; van Ierland, EC",
"Talen, E; Anselin, L",
"Irwin, EG; Bockstael, NE",
"Leggett, CG; Bockstael, NE",
"Guimaraes, P; Figueiredo, O; Woodward, D",
"Halpern, Benjamin S.; McLeod, Karen L.; Rosenberg, Andrew A.; Crowder, Larry B.")
I have tried several approaches:
#Method3 - Read table: error, rows do not have the same number of elements
Meth3 <- read.table(textConnection(data), sep=";", stringsAsFactors=FALSE)
#Method2 - Separate into different columns: repeats the names
Meth2 <- do.call(rbind,
strsplit(gsub(";",
"\\1NONSENSESPLIT\\2NONSENSESPLIT\\3", data),
"NONSENSESPLIT"))
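Method2 repeats names because do.call(rbind, ...) recycles shorter vectors up to the length of the longest one. A minimal base-R sketch that avoids this, assuming the records are split on ";" followed by optional whitespace: pad every split vector with NA to a common length before binding. The names au, authors, padded and author_df are illustrative, not part of the question.

```r
# Small subset of the AU field from the sample above
au <- c("Anselin, L; Varga, A; Acs, Z",
        "Anselin, L",
        "Fujita, M; Thisse, JF")

# Split each record on ";" plus any whitespace that follows it
authors <- strsplit(au, ";\\s*")

# Common width: the longest author list in the data
max_authors <- max(lengths(authors))

# Pad each vector with NA so rbind() has nothing to recycle
padded <- lapply(authors, function(a) { length(a) <- max_authors; a })

author_mat <- do.call(rbind, padded)
colnames(author_mat) <- paste0("Author", seq_len(max_authors))
author_df <- as.data.frame(author_mat, stringsAsFactors = FALSE)
author_df
```

Fixing max_authors at 10 instead gives a constant Author1..Author10 layout, but length<- would then silently truncate any record with more than 10 authors.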
#Method5 - Split row entries, make an identifier and recombine them later: struggling to recombine
Meth5 <- strsplit(data, ";")
i <- 0
id <- unlist( sapply( Meth5, function(r) rep(i<<-i+1, length(r) ) ) )
x <- unlist(Meth5, recursive = FALSE )
x <- list(do.call(rbind,
strsplit(gsub(";",
"\\1NONSENSESPLIT\\2NONSENSESPLIT\\3", x),
"NONSENSESPLIT")))
require(data.table)
data.table( ID=id, do.call(rbind,x))
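The split / identify / recombine idea of Method5 can work without the intermediate gsub() step. A sketch in base R, assuming the same ";" separator: build a long table holding a record ID and each author's position within that record, then use the (ID, position) pairs as matrix indices to fill a wide matrix. The names parts, long and wide are hypothetical.

```r
au <- c("Anselin, L; Varga, A; Acs, Z",
        "Anselin, L",
        "Fujita, M; Thisse, JF")

parts <- strsplit(au, ";\\s*")

# Long format: one row per (record, author position)
long <- data.frame(
  ID     = rep(seq_along(parts), lengths(parts)),  # which record
  pos    = sequence(lengths(parts)),               # position inside that record
  author = unlist(parts),
  stringsAsFactors = FALSE
)

# Recombine: start from an all-NA matrix and fill it by (ID, pos)
n_col <- max(long$pos)
wide <- matrix(NA_character_, nrow = length(parts), ncol = n_col,
               dimnames = list(NULL, paste0("Author", seq_len(n_col))))
wide[cbind(long$ID, long$pos)] <- long$author
wide
```

Indexing a matrix with a two-column matrix assigns one cell per row of long, so no recycling can occur and missing positions simply stay NA.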
#Method6 - Identifies the first author (of the first record only):
Meth6 <- gsub("[^a-zA-Z0-9 ]","",strsplit(data,"\\; ")[[1]][[1]])
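Method6 only looks at the first record ([[1]][[1]]), and its gsub() call also removes the comma between surname and initials. A vectorised sketch that returns the first author of every record unchanged, assuming ";" is the only separator:

```r
au <- c("Anselin, L; Varga, A; Acs, Z",
        "Fujita, M; Thisse, JF",
        "Anselin, L")

# Drop everything from the first ";" onward; what remains is Author1.
# Records with a single author contain no ";" and are returned as-is.
first_author <- sub(";.*$", "", au)
first_author
```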
Any suggestions on how to organize and identify Author1 ... AuthorN are welcome.
Answer (score: 4)
read.csv supports this:
read.csv(text=data,header=FALSE,sep=";")
    V1                    V2                   V3                    V4                 V5                   V6         V7
 1  Anselin, L            Varga, A             Acs, Z
 2  Acs, ZJ               Anselin, L           Varga, A
 3  Anselin, L
 4  Fujita, M             Thisse, JF
 5  Turner, RK            van den Bergh, JCJM  Soderqvist, T         Barendregt, A      van der Straaten, J  Maltby, E  van Ierland, EC
 6  Talen, E              Anselin, L
 7  Irwin, EG             Bockstael, NE
 8  Leggett, CG           Bockstael, NE
 9  Guimaraes, P          Figueiredo, O        Woodward, D
10  Halpern, Benjamin S.  McLeod, Karen L.     Rosenberg, Andrew A.  Crowder, Larry B.
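read.csv() works here because, unlike read.table(), it defaults to fill = TRUE, so short rows are padded with blank fields. Two caveats may matter for the full 500-record file. The sketch below assumes, as stated in the question, that no record has more than 10 authors. First, the number of columns is determined from the first five lines of input, so a longer author list further down can be misread unless col.names is supplied. Second, each field after a ";" keeps its leading space unless strip.white = TRUE. Setting na.strings = "" additionally turns the empty cells into NA.

```r
# The longest record (3 authors) is deliberately placed after line 5
au <- c("Anselin, L",
        "Fujita, M; Thisse, JF",
        "Talen, E; Anselin, L",
        "Irwin, EG; Bockstael, NE",
        "Leggett, CG; Bockstael, NE",
        "Guimaraes, P; Figueiredo, O; Woodward, D")

max_authors <- 10  # assumption taken from the question: at most 10 authors

authors <- read.csv(text = au, header = FALSE, sep = ";",
                    col.names = paste0("Author", seq_len(max_authors)),
                    strip.white = TRUE,   # drop the space after each ";"
                    na.strings = "",      # empty cells become NA
                    stringsAsFactors = FALSE)
authors[, 1:3]
```

Columns that are empty in every row (here Author4 to Author10) are read as all-NA logical columns; they can be dropped or converted afterwards if a purely character layout is needed.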