I have a string of the following form in R:
example <- c("namei1 namej1, surname1, name2, surnamei2 surnamej2, name3, surname3")
I would like to get two columns:
namei1 namej1 | surname1
name2 | surnamei2 surnamej2
name3 | surname3
I tried splitting the string:
example <- c("namei1 namej1, surname1, name2, surnamei2 surnamej2, name3, surname3")
pattern <- "\\,+[[:space:]]"
library(stringr)  # str_split() comes from the stringr package
str_split(example, pattern)
However, I am stuck at this point: the split gives me the individual pieces, but not the two columns.
Answer 0 (score: 5)
read.csv(text = gsub("([^,]+,[^,]+),", "\\1\n", example),
header = FALSE, stringsAsFactors = FALSE)
# V1 V2
# 1 namei1 namej1 surname1
# 2 name2 surnamei2 surnamej2
# 3 name3 surname3
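To see why this works, it may help to look at the intermediate string that gsub produces. The sketch below uses the same pattern as the answer; the variable names step and res are only illustrative, and strip.white = TRUE is an addition not present in the answer, used here on the assumption that the leading spaces left after each newline are unwanted.

```r
example <- "namei1 namej1, surname1, name2, surnamei2 surnamej2, name3, surname3"

# The pattern matches "field,field," and keeps "field,field" (group 1),
# so every second comma is turned into a newline.
step <- gsub("([^,]+,[^,]+),", "\\1\n", example)
cat(step, "\n")

# Each line now holds one "name, surname" pair; read.csv splits it on the comma.
# strip.white = TRUE (an assumption, not in the answer) trims surrounding spaces.
res <- read.csv(text = step, header = FALSE,
                stringsAsFactors = FALSE, strip.white = TRUE)
print(res)
```

Without strip.white = TRUE the values keep the space that followed each comma in the original string.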
Answer 1 (score: 4)
We can split the string at a comma (,) followed by zero or more spaces (\\s*), then split the resulting vector (v1) into a list of vectors based on the 'name' string, rbind the list elements, and convert the result to a data.frame.
v1 <- strsplit(example, ",\\s*")[[1]]
setNames(do.call(rbind.data.frame, split(v1, cumsum(grepl('\\bname',
v1)))), paste0("V", 1:2))
# V1 V2
#1 namei1 namej1 surname1
#2 name2 surnamei2 surnamej2
#3 name3 surname3
Another option is to read the pieces with scan and convert them to a two-column matrix.
as.data.frame( matrix(trimws(scan(text = example, sep=",",
what = "", quiet = TRUE)), byrow = TRUE, ncol = 2))
# V1 V2
#1 namei1 namej1 surname1
#2 name2 surnamei2 surnamej2
#3 name3 surname3
Yet another option is read.csv: we use gsub to replace each comma that is followed by a space and the 'name' string with \n and 'name', and then read the result with read.csv, which splits each line on the delimiter (,).
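The code for the gsub and read.csv option did not survive in this answer, so the following is only a sketch of what that description could look like, not the answerer's original code. It assumes that every first-column value literally starts with 'name', which holds for the example string but probably not for real data; the names txt and res and the strip.white = TRUE argument are illustrative choices.

```r
example <- "namei1 namej1, surname1, name2, surnamei2 surnamej2, name3, surname3"

# Replace a comma plus whitespace with a newline, but only where the next
# field starts with the literal text "name" (", surname..." is left alone).
txt <- gsub(",\\s+(name)", "\n\\1", example)

# Every line is now "name, surname"; read.csv splits on the remaining commas.
res <- read.csv(text = txt, header = FALSE,
                stringsAsFactors = FALSE, strip.white = TRUE)
print(res)
```

Because the split relies on the 'name' prefix, the positional approaches in the other answers are likely the safer choice when the values are arbitrary.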
Answer 2 (score: 3)
data.frame(split(unlist(strsplit(example, ", ")), c(0, 1)))
# X0 X1
#1 namei1 namej1 surname1
#2 name2 surnamei2 surnamej2
#3 name3 surname3
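This one-liner relies on split recycling the grouping vector c(0, 1) along the pieces, so the odd positions end up in group 0 and the even positions in group 1. A step-by-step sketch, with the illustrative names parts, groups and res, and assuming the string always contains an even number of comma-separated fields:

```r
example <- "namei1 namej1, surname1, name2, surnamei2 surnamej2, name3, surname3"

# Six pieces, alternating name / surname.
parts <- unlist(strsplit(example, ", "))

# c(0, 1) is recycled to 0, 1, 0, 1, 0, 1, so the pieces are dealt
# alternately into group "0" (names) and group "1" (surnames).
groups <- split(parts, c(0, 1))

# data.frame() turns the two equal-length vectors into columns X0 and X1.
res <- data.frame(groups, stringsAsFactors = FALSE)
print(res)
```

If the number of fields were odd, the two groups would have different lengths and data.frame would fail, so checking length(parts) %% 2 == 0 first may be worthwhile.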