Question

How can I get unique (rle-style unique) tuples from a df like this:

structure(c("M01", "M01", "M01", "M01", "M01", "M02", "M02", 
"M02", "M02", "M03", "M03", "F04", "F04", "F02", "F02", "F04", 
"F10", "F10", NA, "F10", "F01", "F01"), .Dim = c(11L, 2L), .Dimnames = list(
    NULL, c("a", "b")))

> sample
      a     b    
 [1,] "M01" "F04"
 [2,] "M01" "F04"
 [3,] "M01" "F02"
 [4,] "M01" "F02"
 [5,] "M01" "F04"
 [6,] "M02" "F10"
 [7,] "M02" "F10"
 [8,] "M02" NA   
 [9,] "M02" "F10"
[10,] "M03" "F01"
[11,] "M03" "F01"

to get this:

structure(c("M01", "M01", "M01", "M02", "M02", "M03", "F04", 
"F02", "F04", "F10", "F10", "F01"), .Dim = c(6L, 2L), .Dimnames = list(
    NULL, c("d", "c")))
> output
     d     c    
[1,] "M01" "F04"
[2,] "M01" "F02"
[3,] "M01" "F04"
[4,] "M02" "F10"
[5,] "M02" "F10"
[6,] "M03" "F01"

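So my idea is to get a df of the tuples that is unique row by row, but only relative to the previous element, so unique(sample) does not give me what I need. Can rle be run on this df so that it considers the tuples and keeps a df as output? Is there a better way?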

rle(sample[,2])$values

gives the desired result, but obviously I lose the valuable information in column 1.

Answer 1

How about this?

# dd is the matrix structure you posted in the question
dd <- as.data.frame(dd)                     ## convert to data.frame
dd[] <- lapply(dd, as.character)            ## change columns to character
na.omit(dd[cumsum(rle(dd$b)$lengths), ])    ## get indices by cumsum'ing rle-lengths 
                                            ## wrap with na.omit to remove NA rows
#      a   b
# 2  M01 F04
# 4  M01 F02
# 5  M01 F04
# 7  M02 F10
# 9  M02 F10
# 11 M03 F01
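
The approach above runs rle on column b only, so two adjacent rows with the same b but a different a would be merged into one. If the runs should be defined by the whole (a, b) tuple, as the question asks, one possible sketch is to build a per-row key and apply the same cumsum-of-run-lengths idea to it. The names key, run_ends and result are made up for this illustration, and the tab separator assumes that no value contains a tab.

```r
# Rebuild the matrix from the question (it is called `sample` there).
sample <- structure(c("M01", "M01", "M01", "M01", "M01", "M02", "M02",
"M02", "M02", "M03", "M03", "F04", "F04", "F02", "F02", "F04",
"F10", "F10", NA, "F10", "F01", "F01"), .Dim = c(11L, 2L), .Dimnames = list(
    NULL, c("a", "b")))

dd <- as.data.frame(sample, stringsAsFactors = FALSE)

# One key per row (hypothetical helper). paste() turns NA into the
# string "NA", so an NA row forms its own run and still separates
# the runs on either side of it.
key <- paste(dd$a, dd$b, sep = "\t")

# Index of the last row of each run of identical consecutive keys.
run_ends <- cumsum(rle(key)$lengths)

# Keep one row per run, then drop the rows that contain NA.
result <- na.omit(dd[run_ends, ])
print(result)
```

On the data from the question this should select the same rows as the answer above (2, 4, 5, 7, 9, 11), because column a never changes inside a run of b here. The two approaches only differ when a changes while b stays the same.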

Unique tuples in R with rle
