How do I remove characters from a string based on a second string?

Time: 2014-03-24 22:45:03

Tags: r

Consider these two strings:

string1 <- "GCTCCC...CTCCATGAAGTA...CTTCACATCCGTGT.CCGGCCTGGCCGCGGAGAGCCC"
string_reference <- "GCTCCC...CTCCATGAAGTATTTCTTCACATCCGTGT.CCGGCCTGGCCGCGGAGAGCCC"

How can I easily remove the dots in "string1", but only those dots that also sit at the same position in "string_reference"?

Expected output:

string1 = "GCTCCCCTCCATGAAGTA...CTTCACATCCGTGTCCGGCCTGGCCGCGGAGAGCCC"

4 answers:

Answer 0 (score: 7)

I'd just use R's truly vectorised subsetting and logical comparison methods...

# Split the strings
x <- strsplit( c( string1 , string_reference ) , "" )
# Compare and remove dots from string1 when dots also appear in the reference string at the same position
paste( x[[1]][ ! (x[[2]]== "." & x[[1]] == ".") ] , collapse = "" )
#[1] "GCTCCCCTCCATGAAGTA...CTTCACATCCGTGTCCGGCCTGGCCGCGGAGAGCCC"

Answer 1 (score: 6)

Similar to Robert's, but a "vectorized" version:

s1 <- unlist(strsplit(string1, ""))
s2 <- unlist(strsplit(string_reference, ""))
paste0(Filter(Negate(is.na), ifelse(s1 == s2 & s1 == ".", NA, s1)), collapse="")
# [1] "GCTCCCCTCCATGAAGTA...CTTCACATCCGTGTCCGGCCTGGCCGCGGAGAGCCC"

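I put "vectorized" in quotes because the vectorization happens over the characters of a single string. This assumes that each of your character vectors has only one element. If they hold more than one string, you have to loop over the results of strsplit.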
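As a rough sketch of that loop, the same comparison can be applied to each pair of strings with Map. The helper name remove_shared_dots and the sample vectors below are made up for illustration, and the sketch assumes each string has the same number of characters as its reference.

```r
# Hypothetical helper (not from the original answers): drops a dot from each
# element of `s` only where the matching element of `ref` also has a dot at
# the same position.
remove_shared_dots <- function(s, ref) {
  s_chars   <- strsplit(s, "", fixed = TRUE)
  ref_chars <- strsplit(ref, "", fixed = TRUE)
  # Map() walks the two lists of character vectors in parallel.
  out <- Map(function(a, b) {
    stopifnot(length(a) == length(b))  # assumption: aligned strings
    paste(a[!(a == "." & b == ".")], collapse = "")
  }, s_chars, ref_chars)
  unlist(out, use.names = FALSE)
}

# Illustrative input, not taken from the question
strings    <- c("AC..GT", "A.C.G.")
references <- c("AC.TGT", "A.CTG.")
remove_shared_dots(strings, references)
# [1] "AC.GT" "AC.G"
```

Called with the single strings from the question, this returns the same result as the one-element versions above.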

Answer 2 (score: 5)

Use intersect to find the overlapping dots:

cutpos <- do.call(intersect, 
        sapply(list(string_reference,string1), gregexpr, pattern=".", fixed=TRUE)
          )
paste(strsplit(string1,"",fixed=TRUE)[[1]][-cutpos],collapse="")
#[1] "GCTCCCCTCCATGAAGTA...CTTCACATCCGTGTCCGGCCTGGCCGCGGAGAGCCC"

A slight variation on the above (courtesy of @Arun):

attr(cutpos, 'match.length') <- rep(1L, length(cutpos))
attr(cutpos, 'useBytes') <- TRUE

do.call(paste0, c(regmatches(string1, list(cutpos), invert=TRUE), collapse=""))
## [1] "GCTCCCCTCCATGAAGTA...CTTCACATCCGTGTCCGGCCTGGCCGCGGAGAGCCC"
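One caveat with the negative-index version ([-cutpos]): if the two strings share no dot positions, cutpos is empty, and indexing with an empty negative vector selects nothing, so the result is an empty string rather than the unchanged input. A minimal sketch of a guard follows; dot_positions, drop_positions and the sample strings are hypothetical names and data introduced here for illustration.

```r
# Hypothetical helpers for illustration; the names are not from the original answer.

# Positions of literal dots in a single string (empty if there are none).
dot_positions <- function(z) {
  pos <- as.integer(gregexpr(".", z, fixed = TRUE)[[1]])
  pos[pos > 0]  # gregexpr() signals "no match" with -1
}

# Drop the characters at `pos`, leaving the string untouched when `pos` is empty.
drop_positions <- function(s, pos) {
  chars <- strsplit(s, "", fixed = TRUE)[[1]]
  if (length(pos) > 0) chars <- chars[-pos]
  paste(chars, collapse = "")
}

s   <- "ACG.T"
ref <- "ACGTT"   # no dots, so nothing should be removed
cutpos <- intersect(dot_positions(s), dot_positions(ref))

# Unguarded negative indexing with an empty index loses the whole string
paste(strsplit(s, "", fixed = TRUE)[[1]][-cutpos], collapse = "")
# [1] ""

# Guarded version keeps the input as-is
drop_positions(s, cutpos)
# [1] "ACG.T"
```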

Answer 3 (score: 1)

Use:

string1v <- strsplit(string1, "")[[1]]
string_referencev <- strsplit(string_reference, "")[[1]]
stopifnot(length(string1v) == length(string_referencev))
finalstring <- paste(vapply(seq_along(string1v), function(ind) {
  if (string1v[ind] == '.' && string_referencev[ind] == '.') ''
  else string1v[ind] 
}, character(1)), collapse = "")