Split a string on a repeated delimiter

Posted: 2019-09-04 18:17:18

Tags: r regex string strsplit

In R I have strings of the following form:

example <- c("namei1 namej1, surname1, name2, surnamei2 surnamej2, name3, surname3")

I would like to get two columns:

namei1 namej1   | surname1
name2           | surnamei2 surnamej2
name3           | surname3

I tried splitting the string with str_split:

library(stringr)  # str_split() comes from stringr

example <- c("namei1 namej1, surname1, name2, surnamei2 surnamej2, name3, surname3")
pattern <- "\\,+[[:space:]]"
str_split(example, pattern)

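But I'm stuck here: str_split() gives me a list holding one flat vector of six pieces, and I don't know how to pair them up into the two columns.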

3 answers:

Answer 0 (score: 5)

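# Turn every second comma into a newline, then parse the text as CSV;
# strip.white = TRUE removes the space that follows each remaining comma
read.csv(text = gsub("([^,]+,[^,]+),", "\\1\n", example), 
         header = FALSE, stringsAsFactors = FALSE, strip.white = TRUE)
#              V1                  V2
# 1 namei1 namej1            surname1
# 2         name2 surnamei2 surnamej2
# 3         name3            surname3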
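To see why this works, it helps to look at the intermediate string. A minimal sketch, assuming every row has exactly two fields and that no name contains a comma: the pattern consumes two fields plus the comma after them, and the replacement puts a newline in place of that trailing comma. Without strip.white = TRUE, read.csv() would keep the space that follows each comma at the start of the field.

```r
example <- c("namei1 namej1, surname1, name2, surnamei2 surnamej2, name3, surname3")

# Each match is "field,field," (two fields and the comma after them).
# "\\1\n" keeps the two fields and replaces the trailing comma with a newline,
# so every pair of fields ends up on its own line. The last pair has no
# trailing comma, so it is left as it is.
csv_text <- gsub("([^,]+,[^,]+),", "\\1\n", example)
cat(csv_text, "\n")

# Parse the two-field lines; strip.white = TRUE trims the leading spaces
df <- read.csv(text = csv_text, header = FALSE,
               stringsAsFactors = FALSE, strip.white = TRUE)
print(df)
```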

Answer 1 (score: 4)

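We can split the string at "," followed by zero or more spaces (\\s*), then split that vector (v1) into a list, starting a new group at every element that begins with the word 'name' (cumsum(grepl('\\bname', v1)); the surnames contain "name" only after "sur", so \\b does not match them). Finally rbind the list elements and convert the result to a data.frame:

v1 <- strsplit(example, ",\\s*")[[1]]
setNames(do.call(rbind.data.frame, split(v1, cumsum(grepl('\\bname', v1)))),
         paste0("V", 1:2))
#             V1                  V2
#1 namei1 namej1            surname1
#2         name2 surnamei2 surnamej2
#3         name3            surname3

Or another option is to scan the fields and convert them into a two-column matrix:

as.data.frame(matrix(trimws(scan(text = example, sep = ",", what = "", quiet = TRUE)),
                     byrow = TRUE, ncol = 2))
#             V1                  V2
#1 namei1 namej1            surname1
#2         name2 surnamei2 surnamej2
#3         name3            surname3

Another option is gsub: replace the "," that is followed by spaces and the 'name' string with "\n" and 'name', then pass the result to read.csv, which splits each line on the "," delimiter.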
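The code for the gsub variant is missing from the answer as captured here. A minimal sketch of what the description suggests, assuming (as in the example data) that every first name begins with "name", so a comma followed by spaces and then "name" always marks the start of a new row. The pattern ",\\s+(name)" is a reconstruction, not the answerer's original code.

```r
example <- c("namei1 namej1, surname1, name2, surnamei2 surnamej2, name3, surname3")

# Replace ", name" with "\nname": the comma and spaces before a first name
# become a line break, and the captured "name" is put back by \\1.
# ", surname..." does not match because "s" follows the spaces, not "name".
csv_text <- gsub(",\\s+(name)", "\n\\1", example)

# Each line now holds "first name, surname"; read.csv splits on the comma
df <- read.csv(text = csv_text, header = FALSE,
               stringsAsFactors = FALSE, strip.white = TRUE)
print(df)
```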

Answer 2 (score: 3)

data.frame(split(unlist(strsplit(example, ", ")), c(0, 1)))
#             X0                  X1
#1 namei1 namej1            surname1
#2         name2 surnamei2 surnamej2
#3         name3            surname3
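The c(0, 1) trick in this answer relies on split() recycling a short grouping along the whole vector: odd positions (first names) land in group "0" and even positions (surnames) in group "1", which data.frame() then names X0 and X1. A small sketch making that explicit, assuming each record has exactly two fields. Two changes from the answer are choices made here for illustration: ",\\s*" instead of ", " so the number of spaces after a comma does not matter, and the column names name/surname.

```r
example <- c("namei1 namej1, surname1, name2, surnamei2 surnamej2, name3, surname3")

fields <- unlist(strsplit(example, ",\\s*"))

# split() recycles c(0, 1) to c(0, 1, 0, 1, 0, 1) along `fields`,
# giving a list with elements "0" (odd positions) and "1" (even positions)
groups <- split(fields, c(0, 1))

# Same result as the answer, but with explicit column names instead of X0/X1
people <- data.frame(name = groups[["0"]], surname = groups[["1"]],
                     stringsAsFactors = FALSE)
print(people)
```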