I am trying to collapse identical consecutive observations into a single string. A simple example:
a <- c("H", "H", "H", "N", "T", "N", "T", "H", "N", "T", "T")
[1] "H" "H" "H" "N" "T" "N" "T" "H" "N" "T" "T"
b <- c("HHH", "N", "T", "N", "T", "H", "N", "TT")
[1] "HHH" "N" "T" "N" "T" "H" "N" "TT"
c <- c("HHH", "HHH", "N", "T", "N", "T", "H", "N", "TT", "TT")
[1] "HHH" "HHH" "N" "T" "N" "T" "H" "N" "TT" "TT"
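Here I want to create a function that converts vector a into vector b or c. For example, since the first three observations are all H, they are joined into HHH. The same goes for the two T's, which become TT. Note that I want to preserve the overall order, and the number of times a given element can appear consecutively is not limited to three. For example, there could be ten A's in a row, which should be converted into a single AAAAAAAAAA.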
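I tried to build this up step by step from a for loop, but got stuck because the number of consecutive repeats is unbounded. I also tried the base rle function, but rle(a) gives something like
Run Length Encoding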
lengths: int [1:8] 3 1 1 1 1 1 1 2
values : chr [1:8] "H" "N" "T" "N" "T" "H" "N" "T"
where the eleven elements are reduced to 8, and the output does not record where the consecutive occurrences are.
Answer 0 (score: 1)
with(rle(a), sapply(1:length(values), function(i)
paste(rep(values[i], lengths[i]), collapse = "")))
#[1] "HHH" "N" "T" "N" "T" "H" "N" "TT"
OR
sapply(split(a, cumsum(c(TRUE, a[-1] != head(a, -1)))), paste, collapse = "")
# 1 2 3 4 5 6 7 8
#"HHH" "N" "T" "N" "T" "H" "N" "TT"
Answer 1 (score: 1)
You can use gregexpr with regmatches:
a <- c("H", "H", "H", "N", "T", "N", "T", "H", "N", "T", "T")
# collapse string
b <- paste(a, collapse = "")
# extract instances of repeated characters
regmatches(b, gregexpr("(.)\\1*", b))[[1]]
# [1] "HHH" "N" "T" "N" "T" "H" "N" "TT"
A stringi equivalent might be:
library(stringi)
stri_extract_all_regex(b, "(.)\\1*")[[1]]
# [1] "HHH" "N" "T" "N" "T" "H" "N" "TT"
Or with the ore package:
library(ore)
matches(ore.search("(.)\\1*", b, all = TRUE))
#[1] "HHH" "N" "T" "N" "T" "H" "N" "TT"
Answer 2 (score: 0)
We can use rleid from data.table
library(data.table)
unname(tapply(a, rleid(a), FUN = paste, collapse=""))
#[1] "HHH" "N" "T" "N" "T" "H" "N" "TT"
Or, in base R, rle and tapply
with(rle(a), unname(tapply(a, rep(seq_along(values), lengths), FUN = paste, collapse="")))
#[1] "HHH" "N" "T" "N" "T" "H" "N" "TT"
Or a base R option that pastes the elements into one string and then splits between differing characters using regex lookarounds
strsplit(paste(a, collapse=""), "(?<=(.))(?!\\1)", perl = TRUE)[[1]]
#[1] "HHH" "N" "T" "N" "T" "H" "N" "TT"
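Answer 3 (score: -1)
In addition to the solutions already given, I was interested in a generic algorithm that does not depend on any language-specific feature.
You said you tried, but I don't see the unbounded number of repeats as a real problem. What I wrote basically iterates over the original array and clones it. If the current value of the original array is the same as the previous one, then instead of adding it as a new item to the new array, it is concatenated onto the last value of the "clone" array.
Algorithm: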
Create empty array(w)
Iterate by index(i) of the original vector(v)
    If this is the first entry
        w[1] = v[1]
    Else
        If v[i] is the same as v[i-1]
            Last entry in w is concatenated with v[i]
        Else
            Add v[i] to the end of w
In Python:
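def collapseVector(v):
    # w holds one string per run of consecutive equal elements
    w = []
    for i in range(len(v)):
        if i == 0:
            # the first element always starts a new run
            w.append(v[i])
        elif v[i] == v[i-1]:
            # same as the previous element: extend the last run
            w[-1] = w[-1] + v[i]
        else:
            # different from the previous element: start a new run
            w.append(v[i])
    return w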
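The same idea can also be sketched with itertools.groupby from the Python standard library, which groups consecutive equal elements. This is only an illustrative sketch, not part of the answer above: the names collapse_runs and collapse_runs_expanded are made up here. The second function is one possible reading of the question's vector c, repeating each collapsed string once per original element. That reading is an assumption, since the c shown in the question lists HHH only twice.

```python
from itertools import groupby


def collapse_runs(v):
    """Join each run of consecutive equal strings into one string (form b)."""
    return ["".join(group) for _, group in groupby(v)]


def collapse_runs_expanded(v):
    """Assumed reading of form c: repeat each collapsed string once per
    original element, so the result has the same length as v."""
    out = []
    for _, group in groupby(v):
        run = list(group)  # materialise the run so it can be counted
        out.extend(["".join(run)] * len(run))
    return out


a = ["H", "H", "H", "N", "T", "N", "T", "H", "N", "T", "T"]
print(collapse_runs(a))
# ['HHH', 'N', 'T', 'N', 'T', 'H', 'N', 'TT']
print(collapse_runs_expanded(a))
# ['HHH', 'HHH', 'HHH', 'N', 'T', 'N', 'T', 'H', 'N', 'TT', 'TT']
```

Because groupby only merges adjacent equal elements, the overall order is preserved and a run can be of any length.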