I have a long vector in R in which consecutive values are often repeated. For example:
x = c(rep(0.2, 1500), rep(0.1, 10007), 0.7, 0.9, rep(0.1, 9784))
I am trying to write a function that takes this vector as input and returns one of the following two strings:
s = "R 0.2 1500 R 0.1 10007 R 0.7 1 R 0.9 1 R 0.1 9784"
or, preferably,
s = "R 0.2 1500 R 0.1 10007 0.7 0.9 R 0.1 9784"
where R 0.7 1 R 0.9 1 has become 0.7 0.9.
For intuition, R stands for rep or repeat, so the string closely mirrors the way I constructed the vector.
I tried looping over each value, but that is too slow for my needs. Can you help me find a fast solution?
Answer 0 (score: 1)
#Data
x = c(rep(0.2, 1500), rep(0.1, 10007), 0.7, 0.9, rep(0.1, 9784))
#Run rle once and paste values and lengths together
r = rle(x)
y = paste("R", r$values, r$lengths)
#There may be an easier way to do this using regex
#But here is one solution using strsplit
#For runs of length 1, keep only the value (drop "R" and the count)
y = sapply(strsplit(y, " "), function(a) {
  if (a[3] == "1") a[2] else a
})
#Collapse everything together
paste(unlist(y), collapse = " ")
#[1] "R 0.2 1500 R 0.1 10007 0.7 0.9 R 0.1 9784"
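Since the question asks for a function, the same idea can also be sketched without the strsplit round trip by deciding per run whether to emit the bare value or the "R value count" form. This is only a sketch: the helper name encode_runs is hypothetical and not part of the answer above. It also relies on rle comparing neighbouring values for exact equality, so values that differ only by floating-point noise would count as separate runs.

```r
# Hypothetical helper (name is illustrative, not from the original answer)
encode_runs <- function(x) {
  r <- rle(x)                     # compute the runs only once
  single <- r$lengths == 1        # TRUE for runs made of a single value
  # Runs of length 1 become the bare value; longer runs become "R value count"
  parts <- ifelse(single,
                  as.character(r$values),
                  paste("R", r$values, r$lengths))
  paste(parts, collapse = " ")
}

x <- c(rep(0.2, 1500), rep(0.1, 10007), 0.7, 0.9, rep(0.1, 9784))
s <- encode_runs(x)
print(s)
#[1] "R 0.2 1500 R 0.1 10007 0.7 0.9 R 0.1 9784"
```

Both rle and ifelse are vectorized, so the work scales with the number of runs rather than requiring an explicit R-level loop over every element.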