String splitting in R

Time: 2016-09-21 06:27:55

Tags: r regex split

The script below splits item codes.

Example:

MR32456 becomes MR324, MR325, MR326

MR3091011 becomes MR309, MR301, MR300, MR301, MR301

How should I modify the script so that MR3091011 is split into MR309, MR310, MR311 instead?

rule2 <- c("MR")
df_1 <- test[grep(paste("^", rule2, sep = "", collapse = "|"), test$Name.y), ]

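library(stringr)  # for str_extract()

SpaceName_1 <- function(s) {
  # Pull out the first run of digits, e.g. "32456" from "MR32456"
  num <- str_extract(s, "[0-9]+")
  if (nchar(num) > 3) {
    # Keep the first four characters as the stem, e.g. "MR32"
    former <- substring(s, 1, 4)
    # Split the remainder into single characters, e.g. "4", "5", "6"
    latter <- strsplit(substring(s, 5, nchar(s)), "")
    latter <- unlist(latter)
    # Append each character to the stem and join the codes with commas
    return(paste(former, latter, sep = "", collapse = ","))
  } else {
    return(s)
  }
}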

df_1$Name.y <- sapply(df_1$Name.y, SpaceName_1)

2 Answers:

Answer 0 (score: 2)

Borrowing the split function from this post and vectorizing it, we can do the following:

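x <- c("MR3091011", "MR32456")  # example input, inferred from the output below

fun1 <- function(x) {
  # Drop the three-character prefix, then cut the rest into two-character chunks
  rest <- substr(x, 4, nchar(x))
  sapply(seq(from = 1, to = nchar(rest), by = 2), function(i) substr(rest, i, i + 1))
}

fun1 <- Vectorize(fun1)

Map(paste0, substr(x, 1, 3), fun1(x))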

#$MR3
#[1] "MR309" "MR310" "MR311"

#$MR3
#[1] "MR324" "MR356"
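
Note that splitting in pairs turns MR32456 into MR324 and MR356, not the MR324, MR325, MR326 from the question, so the chunk width has to be known in advance. Below is a minimal sketch that takes the width as an argument. The names `split_code`, `width`, and `code_len` are hypothetical, and the assumption that every resulting code is five characters long is taken only from the two examples in the question.

```r
# Hypothetical helper: split a packed code into codes of length `code_len`,
# where the last `width` characters of each code vary and the rest is shared.
split_code <- function(s, width, code_len = 5) {
  # Nothing to split if the string is already a single code (or shorter)
  if (nchar(s) <= code_len) {
    return(s)
  }
  prefix_len <- code_len - width
  prefix <- substr(s, 1, prefix_len)      # shared stem, e.g. "MR3"
  rest <- substring(s, prefix_len + 1)    # packed suffixes, e.g. "091011"
  starts <- seq(1, nchar(rest), by = width)
  # substring() is vectorized over start/stop, so this yields one chunk per start
  paste0(prefix, substring(rest, starts, starts + width - 1))
}

split_code("MR3091011", width = 2)
# [1] "MR309" "MR310" "MR311"
split_code("MR32456", width = 1)
# [1] "MR324" "MR325" "MR326"
```

Whether a given code uses one-digit or two-digit suffixes cannot be told from the string alone, so the caller has to supply `width` from whatever rule applies to the data.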

Answer 1 (score: 0)

Try this:

str <- 'MR3091011'
paste(substring(str,1,4), strsplit(str,"")[[1]][-(1:4)], sep='')

[1] "MR309" "MR301" "MR300" "MR301" "MR301"

You can also apply it to a vector of strings:

strlst <- c("MR32456", "MR3091011")
lapply(strlst, function(str) paste(substring(str,1,4), 
                                   strsplit(str,"")[[1]][-(1:4)], sep=''))    
[[1]]
[1] "MR324" "MR325" "MR326"

[[2]]
[1] "MR309" "MR301" "MR300" "MR301" "MR301"
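
The question stores the result back into a data frame column as one comma-separated string per row. If that is the goal, the list returned by `lapply` can be collapsed in the same step. This is only a sketch: `join_codes` is a hypothetical name, and it keeps the single-character split shown above.

```r
# Hypothetical helper: split one code as above, then join the pieces with commas
join_codes <- function(str) {
  pieces <- paste0(substring(str, 1, 4), strsplit(str, "")[[1]][-(1:4)])
  paste(pieces, collapse = ",")
}

strlst <- c("MR32456", "MR3091011")
# vapply() guarantees a character vector with one element per input string
vapply(strlst, join_codes, character(1), USE.NAMES = FALSE)
# [1] "MR324,MR325,MR326"             "MR309,MR301,MR300,MR301,MR301"
```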

[Edit]

groups <- unlist(strsplit(sub('([[:alpha:]]+)(\\d)(\\d{2})(\\d{2})(\\d{2})', '\\1 \\2 \\3 \\4 \\5', 'MR3091011'), split=' '))
paste0(groups[1], groups[2], groups[3:5])
# [1] "MR309" "MR310" "MR311"
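
The pattern above is hard-coded for exactly three two-digit groups. A possible generalization to any number of groups, using only base R, is sketched below. `split_pairs` is a hypothetical name, and the layout it assumes (letters, one shared digit, then two-digit suffixes) is inferred from the MR3091011 example rather than stated in the question.

```r
# Hypothetical helper: letters + one digit form the stem, the remaining digits
# are read as consecutive two-digit suffixes.
split_pairs <- function(s) {
  # Capture group 1 is the stem, group 2 is the whole run of two-digit pairs
  m <- regmatches(s, regexec("^([[:alpha:]]+[0-9])(([0-9]{2})+)$", s))[[1]]
  if (length(m) == 0) {
    return(s)  # layout does not match, leave the string unchanged
  }
  pairs <- regmatches(m[3], gregexpr("[0-9]{2}", m[3]))[[1]]
  paste0(m[2], pairs)
}

split_pairs("MR3091011")
# [1] "MR309" "MR310" "MR311"
split_pairs("MR3091011121314")
# [1] "MR309" "MR310" "MR311" "MR312" "MR313" "MR314"
```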