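Create group names for consecutive values

Asked: 2016-06-14 10:12:03

Tags: r run-length-encoding

This seems simple, but I can't find a simpler way to do it. I have a vector x (below) and need to create group names for runs of consecutive equal values. My attempt uses rle; are there any better ideas?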

# data
x <- c(1,1,1,2,2,2,3,2,2,1,1)

# make groups
rep(paste0("Group_", 1:length(rle(x)$lengths)), rle(x)$lengths)
# [1] "Group_1" "Group_1" "Group_1" "Group_2" "Group_2" "Group_2" "Group_3" "Group_4"
# [9] "Group_4" "Group_5" "Group_5"
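
The attempt above calls rle(x) twice. A minimal sketch of the same idea that computes the run lengths only once is shown here; the variable names `runs` and `groups` are purely illustrative and not part of the original question.

```r
x <- c(1, 1, 1, 2, 2, 2, 3, 2, 2, 1, 1)

# Compute the run lengths once and reuse them
runs <- rle(x)$lengths

# One id per run, repeated for the length of that run
groups <- paste0("Group_", rep(seq_along(runs), runs))
groups
```

seq_along(runs) is used instead of 1:length(runs) so that an empty input does not produce the sequence 1:0.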

4 answers:

Answer 0 (score: 11)

Use rleid from data.table:

library(data.table)

paste0('Group_', rleid(x))
 #[1] "Group_1" "Group_1" "Group_1" "Group_2" "Group_2" "Group_2" "Group_3" "Group_4" "Group_4" "Group_5" "Group_5"

Answer 1 (score: 9)

Use diff and cumsum:

paste0("Group_", cumsum(c(1, diff(x) != 0)))
#[1] "Group_1" "Group_1" "Group_1" "Group_2" "Group_2" "Group_2" "Group_3" "Group_4" "Group_4" "Group_5" "Group_5"

(If your values are floating point, you may have to avoid != and use a tolerance instead.)
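
As a rough sketch of that tolerance idea: the example below treats two neighbouring values as equal when they differ by no more than `tol`. The sample vector and the value of `tol` are assumptions chosen for illustration, not something taken from the answer.

```r
# 0.1 + 0.2 is not exactly equal to 0.3 in floating point
x <- c(0.1 + 0.2, 0.3, 0.3, 1.5, 1.5)

# Assumed tolerance; pick one that suits the scale of your data
tol <- 1e-8

# Exact comparison splits the first two values into separate groups
exact <- paste0("Group_", cumsum(c(1, diff(x) != 0)))

# Tolerant comparison starts a new group only on a jump larger than tol
tolerant <- paste0("Group_", cumsum(c(TRUE, abs(diff(x)) > tol)))

exact
tolerant
```

With the exact test the first element ends up alone in its own group, while the tolerant version keeps the first three values together.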

Answer 2 (score: 3)

Using cumsum, but without relying on the data being numeric:

# Flag each position whose value differs from the previous one, then
# accumulate the flags; != also works on character vectors.
paste0("Group_", cumsum(c(TRUE, x[-1] != x[-length(x)])))
#[1] "Group_1" "Group_1" "Group_1" "Group_2" "Group_2" "Group_2" "Group_3" "Group_4" "Group_4" "Group_5" "Group_5"
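
To illustrate the point about non-numeric data, here is a minimal sketch with a hypothetical character vector `y`. diff() expects numeric input, so the neighbouring elements are compared directly with != instead; the names `y`, `changed`, and `groups` are illustrative only.

```r
# Hypothetical non-numeric input
y <- c("a", "a", "b", "b", "a", "c", "c")

# TRUE wherever an element differs from its predecessor;
# the first element always starts a group
changed <- c(TRUE, y[-1] != y[-length(y)])

# The running count of changes is the group id
groups <- paste0("Group_", cumsum(changed))
groups
```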

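Answer 3 (score: 2)

With groupdata2:

group() can create groups from a list of group starting points using the l_starts method. Setting n to "auto" makes it find the group starts automatically: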

x <- c(1,1,1,2,2,2,3,2,2,1,1)
groupdata2::group(x, n = "auto", method = "l_starts")

## # A tibble: 11 x 2
## # Groups:   .groups [5]
##     data .groups
##    <dbl> <fct>  
##  1     1 1      
##  2     1 1      
##  3     1 1      
##  4     2 2      
##  5     2 2      
##  6     2 2      
##  7     3 3      
##  8     2 4      
##  9     2 4      
## 10     1 5      
## 11     1 5     

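There is also a differs_from_previous() function, which finds the values, or the indices of the values, that differ from the previous value by some threshold.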

library(groupdata2)

# The values to start groups at
differs_from_previous(x, threshold = 1,
                      direction = "both")
## [1] 2 3 2 1

# The indices to start groups at
differs_from_previous(x, threshold = 1,
                      direction = "both",
                      return_index = TRUE)
## [1] 4 7 8 10
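
Start indices like these can be turned back into group names with base R alone. The sketch below hard-codes the indices from the output above in a variable named `starts`; that name, `is_start`, and `groups` are illustrative and not part of groupdata2.

```r
x <- c(1, 1, 1, 2, 2, 2, 3, 2, 2, 1, 1)

# Indices where a new group begins, copied from the output above
starts <- c(4, 7, 8, 10)

# Mark the first element and every start index, then count the marks
is_start <- seq_along(x) %in% c(1, starts)
groups <- paste0("Group_", cumsum(is_start))
groups
```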