Merging/collapsing consecutive identical values in a vector

Time: 2017-09-26 03:40:35

Tags: r

I am trying to merge identical consecutive observations into collapsed strings. A simple example:

a <- c("H", "H", "H", "N", "T", "N", "T", "H", "N", "T", "T")
[1] "H" "H" "H" "N" "T" "N" "T" "H" "N" "T" "T"

b <- c("HHH", "N", "T", "N", "T", "H", "N", "TT")
[1] "HHH" "N"   "T"   "N"   "T"   "H"   "N"   "TT"

c <- c("HHH", "HHH", "N", "T", "N", "T", "H", "N", "TT", "TT")
[1] "HHH" "HHH" "N"   "T"   "N"   "T"   "H"   "N"   "TT"  "TT" 

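Here, I want to create a function that turns vector a into either vector b or vector c. For example, since the first three observations are all H, they become HHH together. The same goes for the two Ts, which become TT. Note that I want to preserve the overall order, and the number of times a given element can occur consecutively is not limited to three. So, for example, there might be 10 As in a row, which should be converted into a single AAAAAAAAAA.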

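I tried building this up step by step starting from a for loop, but couldn't get any further because the number of consecutive repeats is unbounded. I also tried the base rle function, but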

rle(a)

gives something like:
Run Length Encoding
  lengths: int [1:8] 3 1 1 1 1 1 1 2
  values : chr [1:8] "H" "N" "T" "N" "T" "H" "N" "T"

where the 11 elements are reduced to 8, and the positions of the consecutive repeats are not recorded.

4 answers:

Answer 0 (score: 1)

with(rle(a), sapply(1:length(values), function(i)
    paste(rep(values[i], lengths[i]), collapse = "")))
#[1] "HHH" "N"   "T"   "N"   "T"   "H"   "N"   "TT" 

OR

sapply(split(a, cumsum(c(TRUE, a[-1] != head(a, -1)))), paste, collapse = "")
#    1     2     3     4     5     6     7     8 
#"HHH"   "N"   "T"   "N"   "T"   "H"   "N"  "TT" 

Answer 1 (score: 1)

You can use gregexpr together with regmatches:
a <- c("H", "H", "H", "N", "T", "N", "T", "H", "N", "T", "T")

# collapse string
b <- paste(a, collapse = "")

# extract instances of repeated characters
regmatches(b, gregexpr("(.)\\1*", b))[[1]]
# [1] "HHH" "N"   "T"   "N"   "T"   "H"   "N"   "TT"

The stringi equivalent would be:

library(stringi)
stri_extract_all_regex(b, "(.)\\1*")[[1]]
# [1] "HHH" "N"   "T"   "N"   "T"   "H"   "N"   "TT"

With the ore package:

library(ore)
matches(ore.search("(.)\\1*", b, all = TRUE))
#[1] "HHH" "N"   "T"   "N"   "T"   "H"   "N"   "TT"

Answer 2 (score: 0)

We can use rleid from data.table:
library(data.table)
unname(tapply(a, rleid(a), FUN = paste, collapse=""))
#[1] "HHH" "N"   "T"   "N"   "T"   "H"   "N"   "TT" 

Or with base R, using rle and tapply:

with(rle(a), unname(tapply(a, rep(seq_along(values), lengths), FUN = paste, collapse="")))
#[1] "HHH" "N"   "T"   "N"   "T"   "H"   "N"   "TT" 

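Another base R option is to paste the elements into a single string and then split it between runs of repeated characters using regex lookarounds: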

strsplit(paste(a, collapse=""), "(?<=(.))(?!\\1)", perl = TRUE)[[1]]
#[1] "HHH" "N"   "T"   "N"   "T"   "H"   "N"   "TT" 

Answer 3 (score: -1)

In addition to the solutions already given, I was interested in a general algorithm that doesn't depend on the specifics of any language.

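You say you tried this, but I don't see the unlimited number of repetitions as a real problem. What I wrote basically iterates over the original array and clones it. If a value in the original array is the same as the previous one, then instead of adding it as a new item to the new array, it is concatenated onto the last value of the "clone" array.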

Algorithm:

Create empty array(w)
Iterate by index(i) of the original vector(v)
   If this is the first entry
      w[1] = v[1]
   Else
      If v[i] is the same as v[i-1]
         Last entry in w is concatenated with v[i]
      Else
         Add v[i] to the end of w

In Python:

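def collapseVector(v):
    w = []
    for i in range(len(v)):
        if i == 0:
            # The first element always starts a new run.
            w.append(v[i])
        elif v[i] == v[i-1]:
            # Same as the previous element: extend the current run.
            w[-1] += v[i]
        else:
            # Different element: start a new run.
            w.append(v[i])
    return w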
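
For comparison, the same run-collapsing idea can be sketched with itertools.groupby from the Python standard library, which groups consecutive equal items much like rle does in R. This is only an illustrative sketch, not part of the answers above: the function name collapse_runs and the keep_length flag are hypothetical, and the elements are assumed to be strings. With keep_length=True each collapsed string is repeated once per element of its run, which gives a length-preserving result in the spirit of vector c from the question.

```python
from itertools import groupby


def collapse_runs(v, keep_length=False):
    """Collapse each run of consecutive equal strings in v into one string.

    keep_length is a hypothetical flag for illustration: when True, the
    collapsed string is repeated once per element of its run, so the output
    has the same length as the input.
    """
    out = []
    for _, run in groupby(v):
        items = list(run)            # all elements of this consecutive run
        joined = "".join(items)      # e.g. ["H", "H", "H"] -> "HHH"
        out.extend([joined] * (len(items) if keep_length else 1))
    return out


a = ["H", "H", "H", "N", "T", "N", "T", "H", "N", "T", "T"]
print(collapse_runs(a))
# ['HHH', 'N', 'T', 'N', 'T', 'H', 'N', 'TT']
print(collapse_runs(a, keep_length=True))
# ['HHH', 'HHH', 'HHH', 'N', 'T', 'N', 'T', 'H', 'N', 'TT', 'TT']
```

Because groupby only groups adjacent equal items, the overall order is preserved and a run of any length is handled without special cases.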