Question

I have a long vector in R in which consecutive values are often repeated. For example:

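x = c(rep(0.2, 1500), rep(0.1, 10007), 0.7, 0.9, rep(0.1, 9784))

I am trying to write a function that takes this vector as input and returns one of the following two strings:

s = "R 0.2 1500 R 0.1 10007 R 0.7 1 R 0.9 1 R 0.1 9784"

or, preferably,

s = "R 0.2 1500 R 0.1 10007 0.7 0.9 R 0.1 9784"

where runs of length 1 are written as the bare value, i.e. R 0.7 1 R 0.9 1 becomes 0.7 0.9.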
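Both target formats can be expressed in terms of base R's rle(), which collapses the vector into run values and run lengths in a single pass, so the remaining R-level work is over runs rather than over individual elements. This is a minimal sketch under that assumption; the function name encode_runs and its drop_singletons argument are hypothetical, chosen only for illustration.

```r
# Hypothetical helper: encode a vector as "R value count" tokens, one per run.
# Uses base R's rle() plus vectorized paste()/ifelse(), with no per-element loop.
encode_runs <- function(x, drop_singletons = TRUE) {
  r <- rle(x)
  tokens <- paste("R", r$values, r$lengths)
  if (drop_singletons) {
    # Runs of length 1 are written as the bare value instead of "R value 1"
    tokens <- ifelse(r$lengths == 1L, as.character(r$values), tokens)
  }
  paste(tokens, collapse = " ")
}

x <- c(rep(0.2, 1500), rep(0.1, 10007), 0.7, 0.9, rep(0.1, 9784))
encode_runs(x, drop_singletons = FALSE)
#[1] "R 0.2 1500 R 0.1 10007 R 0.7 1 R 0.9 1 R 0.1 9784"
encode_runs(x)
#[1] "R 0.2 1500 R 0.1 10007 0.7 0.9 R 0.1 9784"
```

Note that paste() formats doubles with at most 15 significant digits, so values needing more precision than that would be rounded in the string.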

For intuition: R stands for rep, i.e. repeat. The string therefore closely mirrors the way I constructed the vector with rep.
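To make the correspondence with rep concrete, here is a hypothetical decoder (decode_runs is an illustrative name, not an existing function) that reads either string format back into a numeric vector: each "R value count" triple becomes rep(value, count), and each bare value is treated as a run of length 1. It is a sketch that assumes tokens are separated by single spaces.

```r
# Hypothetical inverse of the encoding: turn a string such as
# "R 0.2 3 0.7" back into c(0.2, 0.2, 0.2, 0.7) using rep().
decode_runs <- function(s) {
  tokens <- strsplit(s, " ", fixed = TRUE)[[1]]
  out <- list()
  i <- 1L
  # Loops over tokens (one or three per run), not over vector elements
  while (i <= length(tokens)) {
    if (tokens[i] == "R") {
      # "R value count" is exactly rep(value, count)
      out[[length(out) + 1L]] <- rep(as.numeric(tokens[i + 1L]),
                                     as.integer(tokens[i + 2L]))
      i <- i + 3L
    } else {
      # A bare value is a run of length 1
      out[[length(out) + 1L]] <- as.numeric(tokens[i])
      i <- i + 1L
    }
  }
  unlist(out)
}

decode_runs("R 0.2 3 0.7")
#[1] 0.2 0.2 0.2 0.7
```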

I tried looping over each value, but that is too slow for my needs. Can you help me find a fast solution?

Answer 1

#Data
x = c(rep(0.2, 1500), rep(0.1, 10007), 0.7, 0.9, rep(0.1, 9784))

#Run rle once, then paste values and lengths together
r = rle(x)
y = paste("R", r$values, r$lengths)

#There may be an easier way to do this using regex
#But here is one solution using strsplit
#For runs of length 1, drop the "R" and the "1" and keep only the value
y = sapply(strsplit(y, " "), function(a)
    if (a[3] == "1") a[2] else a
)

#Collapse everything together
paste(unlist(y), collapse = " ")
#[1] "R 0.2 1500 R 0.1 10007 0.7 0.9 R 0.1 9784"
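Following up on the comment about regex: a sketch, assuming Perl-compatible regular expressions via perl = TRUE, that builds the long form with a single paste(..., collapse = " ") and then rewrites every length-1 run with one gsub() call instead of strsplit() and sapply().

```r
x = c(rep(0.2, 1500), rep(0.1, 10007), 0.7, 0.9, rep(0.1, 9784))

#Build the long form first: one "R value length" token per run
r = rle(x)
long = paste("R", r$values, r$lengths, collapse = " ")

#Replace each "R <value> 1" whose length is exactly 1 with the bare value.
#\\b keeps "R" from matching inside another token; the lookahead (?= |$)
#requires the length to end right after the "1", so 1500 or 10007 are untouched.
short = gsub("\\bR (\\S+) 1(?= |$)", "\\1", long, perl = TRUE)
short
#[1] "R 0.2 1500 R 0.1 10007 0.7 0.9 R 0.1 9784"
```

The lookahead is what stops R 0.1 10007 from being rewritten, since its length merely starts with 1; lookaheads are not supported by R's default regex engine, hence perl = TRUE.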

From [0.2, 0.2, 0.2] to "R 0.2 3"