Find the non-intersecting parts of a vector of strings

Time: 2017-08-15 20:10:14

Tags: r string intersect

I have a string vector,

for example:

vec <- c("aa.30.1","aa.40.1","aa.50.1")

But it could also be:

vec2 <- c("a2bsx","a2bsy","a2bsz")

Or even:

vec3 <- c("mean.ln.scaled.ST_mus.control.30.1","mean.ln.scaled.ST_mus.control.60.1","mean.ln.scaled.ST_mus.control.150.1","mean.ln.scaled.ST_mus.control.300.1","mean.ln.scaled.STN_mus.control.1440.1")

I am looking for a function that returns the non-overlapping right tails of the vector's elements.

For vec, the result would be:

c("30.1","40.1","50.1")

For vec2:

vec2 <- c("x","y","z")

For vec3:

vec3 <- c("30.1","60.1","150.1","300.1","1440.1")

Is there a function that handles all of these cases?

4 Answers:

Answer 0 (score: 2)

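Data

Whoops! I forgot to mention that you have a typo in the last element of vec3 (STN_mus where the other elements have ST_mus). The corrected vector:

vec3 <- c("mean.ln.scaled.ST_mus.control.30.1","mean.ln.scaled.ST_mus.control.60.1","mean.ln.scaled.ST_mus.control.150.1","mean.ln.scaled.ST_mus.control.300.1","mean.ln.scaled.ST_mus.control.1440.1")

Use this recursive function. It compares the first character of each entry in the vector (temp[[x]][1]) and tests whether those characters differ (length(unique(sapply(1:length(temp), function(x) temp[[x]][1])))>1). If they differ, it returns the remaining characters (to the right) as strings (sapply(1:length(temp), function(x) paste0(temp[[x]], collapse=""))); otherwise it calls itself again to check the next character.

Beware of bugs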

special <- function(v) {
  # Split every string into a vector of single characters
  temp <- strsplit(v, "")
  # TRUE when the first characters are not all the same
  is.unique <- length(unique(sapply(1:length(temp), function(x) temp[[x]][1])))>1
  if (is.unique) {
    # The shared prefix is exhausted: reassemble and return the strings
    ans <- sapply(1:length(temp), function(x) paste0(temp[[x]], collapse=""))
    return(ans)
  } else {
    # Drop the shared first character ([-1] also copes with one-character strings)
    tryagain <- sapply(1:length(temp), function(x) paste0(temp[[x]][-1], collapse=""))
    special(tryagain)
  }
}

special(vec)
#"30.1" "40.1" "50.1"

special(vec2)
#"x" "y" "z"

special(vec3)
#"30.1"   "60.1"   "150.1"  "300.1"  "1440.1"
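
Given the warning about bugs, here is a minimal non-recursive sketch of the same idea. It adds guards for inputs such as identical strings, where a character-by-character recursion has no differing position to stop at. The name strip_common_prefix is hypothetical, and the sketch assumes a plain character vector without NA values.

```r
# Hypothetical helper: strip the longest prefix shared by all elements.
strip_common_prefix <- function(v) {
  if (length(v) < 2) return(v)   # nothing to compare against
  # Only positions up to the shortest string can belong to a shared prefix
  n <- min(nchar(v))
  k <- 0
  # Advance while the character at position k + 1 is the same everywhere
  while (k < n && length(unique(substr(v, k + 1, k + 1))) == 1) {
    k <- k + 1
  }
  # Keep everything after the first k characters
  substring(v, k + 1)
}

strip_common_prefix(c("aa.30.1", "aa.40.1", "aa.50.1"))
strip_common_prefix(c("a2bsx", "a2bsy", "a2bsz"))
strip_common_prefix(c("same", "same"))
```

Because the loop is bounded by the length of the shortest string, identical inputs simply come back as empty strings instead of recursing further.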

Answer 1 (score: 1)

We can try

gsub("\\D+", "", gsub(paste(Reduce(intersect, strsplit(vec, "[.]")), collapse="|"), "", vec))
#[1] "30" "40" "50"
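
To see where the result comes from, the one-liner can be unpacked into steps. This is only a sketch for the vec example; the intermediate variable names are made up for illustration. Note that "1" is a token shared by every element, so it is removed as well, which is why the output is "30" rather than the "30.1" form asked for in the question.

```r
vec <- c("aa.30.1", "aa.40.1", "aa.50.1")

# Split each element on literal dots
parts <- strsplit(vec, "[.]")
# Tokens present in every element: "aa" and "1"
shared <- Reduce(intersect, parts)
# Alternation pattern built from the shared tokens: "aa|1"
pattern <- paste(shared, collapse = "|")
# Removing every match leaves ".30." ".40." ".50."
stripped <- gsub(pattern, "", vec)
# Removing the remaining non-digit characters gives "30" "40" "50"
result <- gsub("\\D+", "", stripped)
print(result)
```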

Answer 2 (score: 1)

I don't know whether this will help:

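 funfun <- function(x) {
    # Grow a candidate prefix of x[1] one character at a time
    for (i in 1:(nchar(x[1]) + 1)) {
      y <- substr(x[1], 1, i)
      # Stop at the first prefix that some element does not start with
      # (startsWith compares literally, so "." is not a regex wildcard)
      if (!all(startsWith(x, y))) break
    }
    # Drop the shared prefix, i.e. the first i - 1 characters
    substring(x, i)
 }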


vec1 <- c("aa.30.1","aa.40.1","aa.50.1")
vec2 <- c("a2bsx","a2bsy","a2bsz")
funfun(vec1)
[1] "30.1" "40.1" "50.1"
funfun(vec2)
[1] "x" "y" "z"

I realized that the vec3 given above has STN in its last element, unlike the other elements:

  vec3
 [1] "mean.ln.scaled.ST_mus.control.30.1"      "mean.ln.scaled.ST_mus.control.60.1"     
 [3] "mean.ln.scaled.ST_mus.control.150.1"     "mean.ln.scaled.ST_mus.control.300.1"    
 [5] "mean.ln.scaled.STN_mouse.control.1440.1"

 funfun(vec3)
[1] "_mus.control.30.1"      "_mus.control.60.1"      "_mus.control.150.1"    
[4] "_mus.control.300.1"     "N_mouse.control.1440.1"

If we use only the first four elements of vec3:

 funfun(vec3[-5])
[1] "30.1"  "60.1"  "150.1" "300.1"

Or, if we change the last element of vec3 by removing the N after ST and changing mouse to mus:

  vec3[5]
 [1] "mean.ln.scaled.STN_mouse.control.1440.1"
  vec3[5]="mean.ln.scaled.ST_mus.control.1440.1"
 funfun(vec3)
 [1] "30.1"   "60.1"   "150.1"  "300.1"  "1440.1"

Answer 3 (score: 1)

You can also do this:

library(stringr)
difftail = function(x) gsub(tail(Reduce(intersect,lapply(x,function(x) str_sub(x,1,1:nchar(x)))),1),"",x)

difftail(vec)
[1] "30.1" "40.1" "50.1"

difftail(vec2)
[1] "x" "y" "z"

difftail(vec3)
[1] "_mus.control.30.1"    "_mus.control.60.1"    "_mus.control.150.1"   "_mus.control.300.1"   "N_mus.control.1440.1"
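
One caveat with this approach: gsub treats the shared prefix as a regular expression and removes every match, not only the leading one. A possible variant, sketched here in base R under the assumption that all elements are non-empty strings without NA values, drops the prefix by position instead. The name difftail_pos is hypothetical.

```r
# Hypothetical variant: drop the shared prefix by position, not by pattern.
difftail_pos <- function(x) {
  # All prefixes of each element, shortest first
  prefixes <- lapply(x, function(s) substring(s, 1, seq_len(nchar(s))))
  # Prefixes common to every element; the last one is the longest
  common <- Reduce(intersect, prefixes)
  # Length of the longest shared prefix (0 when nothing is shared)
  k <- if (length(common) > 0) nchar(common[length(common)]) else 0
  # Keep everything after the first k characters
  substring(x, k + 1)
}

difftail_pos(c("aa.30.1", "aa.40.1", "aa.50.1"))
difftail_pos(c("a.1.a.1", "a.2.a.2"))
```

In the second call the shared prefix "a." also occurs later in each string; cutting by position leaves those later occurrences untouched.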