From [0.2,0.2,0.2] to "R 0.2 3"

Time: 2017-02-20 19:34:56

Tags: r string parsing vector

I have a long vector in R in which consecutive values are often repeated. For example:

x = c(rep(0.2, 1500), rep(0.1, 10007), 0.7, 0.9, rep(0.1, 9784))

I am trying to write a function that takes this vector as input and returns one of the following two strings:

s = "R 0.2 1500 R 0.1 10007 R 0.7 1 R 0.9 1 R 0.1 9784"

or, preferably,

s = "R 0.2 1500 R 0.1 10007 0.7 0.9 R 0.1 9784"

where R 0.7 1 R 0.9 1 has become 0.7 0.9.

For intuition, R stands for "repeat", so the string closely mirrors the way I constructed the vector with rep.

I tried looping over every value, but that is too slow for my needs. Can you help me find a fast solution?

1 Answer:

Answer 0: (score: 1)

#Data
x = c(rep(0.2, 1500), rep(0.1, 10007), 0.7, 0.9, rep(0.1, 9784))

#Run rle once and paste values and lengths together
r = rle(x)
y = paste("R", r$values, r$lengths)

#There may be an easier way to do this using regex
#But here is one solution using strsplit
#For runs of length 1, drop the "R" and the count and keep only the value
y = sapply(strsplit(y, " "), function(a) {
    if (a[3] == "1") {
        a[2]
    } else {
        a
    }
})

#Collapse everything together
paste(unlist(y), collapse = " ")
#[1] "R 0.2 1500 R 0.1 10007 0.7 0.9 R 0.1 9784"
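
The strsplit step above can probably be avoided by working on the run lengths directly. The following is only a sketch of that idea: encode_runs and its compact argument are hypothetical names introduced here for illustration, not part of base R or of the original answer. It returns either of the two formats asked for in the question.

```r
# encode_runs is a hypothetical helper name (an assumption, not a base R function).
# compact = TRUE writes runs of length 1 as the bare value;
# compact = FALSE writes every run as "R value length".
encode_runs = function(x, compact = TRUE) {
  # paste() would turn zero-length input into "R  ", so handle it explicitly
  if (length(x) == 0) return("")

  r = rle(x)
  tokens = paste("R", r$values, r$lengths)

  if (compact) {
    # Overwrite the tokens of single-element runs with the value alone
    single = r$lengths == 1
    tokens[single] = as.character(r$values[single])
  }

  paste(tokens, collapse = " ")
}

x = c(rep(0.2, 1500), rep(0.1, 10007), 0.7, 0.9, rep(0.1, 9784))
s_compact = encode_runs(x)
s_full = encode_runs(x, compact = FALSE)
```

Both calls stay vectorized, so no explicit loop over the elements of x is needed. Note that rle compares neighbouring values for exact equality, so values that differ only by floating-point rounding would end up in separate runs.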