Consider these two strings:
string1 <- "GCTCCC...CTCCATGAAGTA...CTTCACATCCGTGT.CCGGCCTGGCCGCGGAGAGCCC"
string_reference <- "GCTCCC...CTCCATGAAGTATTTCTTCACATCCGTGT.CCGGCCTGGCCGCGGAGAGCCC"
How can I easily remove the dots in string1, but only those that sit at the same position as a dot in string_reference?
Expected output:
string1 = "GCTCCCCTCCATGAAGTA...CTTCACATCCGTGTCCGGCCTGGCCGCGGAGAGCCC"
Answer 0: (score: 7)
I would just use R's truly vectorised subsetting and logical comparison methods...
# Split the strings
x <- strsplit( c( string1 , string_reference ) , "" )
# Compare and remove dots from string1 when dots also appear in the reference string at the same position
paste( x[[1]][ ! (x[[2]]== "." & x[[1]] == ".") ] , collapse = "" )
#[1] "GCTCCCCTCCATGAAGTA...CTTCACATCCGTGTCCGGCCTGGCCGCGGAGAGCCC"
Answer 1: (score: 6)
Similar to Robert's, but a "vectorized" version:
s1 <- unlist(strsplit(string1, ""))
s2 <- unlist(strsplit(string_reference, ""))
paste0(Filter(Negate(is.na), ifelse(s1 == s2 & s1 == ".", NA, s1)), collapse="")
# [1] "GCTCCCCTCCATGAAGTA...CTTCACATCCGTGTCCGGCCTGGCCGCGGAGAGCCC"
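I put "vectorized" in quotes because the vectorization happens over the characters of a single string, not over the elements of the string vector. This assumes your string vector has only one element. If it has more than one, you will have to loop over the results of strsplit.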
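For a vector holding several strings, one way to write that loop is sketched below. The helper name remove_shared_dots and the sample inputs are made up for illustration, and the sketch assumes every string has the same number of characters as its reference.

```r
# Hypothetical helper: drop a "." from each string only where the
# reference string also has a "." at the same position.
remove_shared_dots <- function(strings, references) {
  stopifnot(length(strings) == length(references),
            all(nchar(strings) == nchar(references)))
  # strsplit() returns one character vector per input string
  s <- strsplit(strings, "", fixed = TRUE)
  r <- strsplit(references, "", fixed = TRUE)
  # Loop over the pairs; within each pair the comparison is vectorized
  mapply(function(a, b) paste(a[!(a == "." & b == ".")], collapse = ""),
         s, r, USE.NAMES = FALSE)
}

remove_shared_dots(c("A..C.G", "..TT"), c("A.TC.G", ".ATT"))
# [1] "A.CG" ".TT"
```

mapply walks the two lists in parallel, so each string is only ever compared against its own reference.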
Answer 2: (score: 5)
Use intersect to find the positions where both strings have a '.':
cutpos <- do.call(intersect,
sapply(list(string_reference,string1), gregexpr, pattern=".", fixed=TRUE)
)
paste(strsplit(string1,"",fixed=TRUE)[[1]][-cutpos],collapse="")
#[1] "GCTCCCCTCCATGAAGTA...CTTCACATCCGTGTCCGGCCTGGCCGCGGAGAGCCC"
A small variation on the above (courtesy of @Arun):
attr(cutpos, 'match.length') <- rep(1L, length(cutpos))
attr(cutpos, 'useBytes') <- TRUE
do.call(paste0, c(regmatches(string1, list(cutpos), invert=TRUE), collapse=""))
## [1] "GCTCCCCTCCATGAAGTA...CTTCACATCCGTGTCCGGCCTGGCCGCGGAGAGCCC"
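One caveat for the snippet that indexes with -cutpos: if both strings contain dots but none at the same position, cutpos is empty, and negative indexing with an empty vector selects nothing, so the result is an empty string. A minimal sketch of a guard that uses a logical mask instead; drop_at is a hypothetical helper name, not part of the original answer.

```r
# Hypothetical helper: remove the characters at positions `pos`.
# A logical mask stays correct when `pos` is empty.
drop_at <- function(string, pos) {
  chars <- strsplit(string, "", fixed = TRUE)[[1]]
  keep <- !(seq_along(chars) %in% pos)
  paste(chars[keep], collapse = "")
}

drop_at("AB.CD", integer(0))  # nothing to remove
# [1] "AB.CD"
drop_at("AB.CD", 3L)
# [1] "ABCD"
```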
Answer 3: (score: 1)
Use:
string1v <- strsplit(string1, "")[[1]]
string_referencev <- strsplit(string_reference, "")[[1]]
stopifnot(length(string1v) == length(string_referencev))
finalstring <- paste(vapply(seq_along(string1v), function(ind) {
if (string1v[ind] == '.' && string_referencev[ind] == '.') ''
else string1v[ind]
}, character(1)), collapse = "")