How to get rle-style unique tuples (rows) in a df like this:
structure(c("M01", "M01", "M01", "M01", "M01", "M02", "M02",
"M02", "M02", "M03", "M03", "F04", "F04", "F02", "F02", "F04",
"F10", "F10", NA, "F10", "F01", "F01"), .Dim = c(11L, 2L), .Dimnames = list(
NULL, c("a", "b")))
> sample
a b
[1,] "M01" "F04"
[2,] "M01" "F04"
[3,] "M01" "F02"
[4,] "M01" "F02"
[5,] "M01" "F04"
[6,] "M02" "F10"
[7,] "M02" "F10"
[8,] "M02" NA
[9,] "M02" "F10"
[10,] "M03" "F01"
[11,] "M03" "F01"
to get this:
structure(c("M01", "M01", "M01", "M02", "M02", "M03", "F04",
"F02", "F04", "F10", "F10", "F01"), .Dim = c(6L, 2L), .Dimnames = list(
NULL, c("d", "c")))
> output
d c
[1,] "M01" "F04"
[2,] "M01" "F02"
[3,] "M01" "F04"
[4,] "M02" "F10"
[5,] "M02" "F10"
[6,] "M03" "F01"
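So my idea is to get a df of tuples that is unique row-wise, but only with respect to the previous row. That is why unique(sample) doesn't give me what I need. Can rle be run on this df so that it considers the tuples and keeps a df as output? Is there a better way?
rle(sample[,2])$values
gives the desired result, but obviously I lose the valuable information in column 1.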
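For reference, a minimal base-R sketch of the "compare with the previous row" idea: keep a row when it differs from the row above it in at least one column, then drop rows that contain NA. The helper `differs_from_prev` is hypothetical, and treating NA as a value that starts its own run is an assumption chosen so the result matches the expected `output` above (the NA row splits the two M02 F10 runs).

```r
# The matrix from the question
sample <- structure(c("M01", "M01", "M01", "M01", "M01", "M02", "M02",
  "M02", "M02", "M03", "M03", "F04", "F04", "F02", "F02", "F04",
  "F10", "F10", NA, "F10", "F01", "F01"), .Dim = c(11L, 2L), .Dimnames = list(
  NULL, c("a", "b")))

# Hypothetical helper: TRUE for row 1 and for every row that differs from
# the previous row in at least one column. Two NAs count as equal; NA vs.
# a value counts as different, so an NA row starts its own run.
differs_from_prev <- function(m) {
  n <- nrow(m)
  if (n <= 1L) return(rep(TRUE, n))
  cur  <- m[-1L, , drop = FALSE]
  prev <- m[-n,  , drop = FALSE]
  same <- (cur == prev) | (is.na(cur) & is.na(prev))
  same[is.na(same)] <- FALSE   # exactly one side NA: treat as different
  c(TRUE, rowSums(!same) > 0)
}

# Keep the first row of each run, then drop rows that contain NA
out <- sample[differs_from_prev(sample), , drop = FALSE]
out <- out[complete.cases(out), , drop = FALSE]
colnames(out) <- c("d", "c")
out
```

Since every row inside a run is the same tuple, keeping the first row of each run (as here) or the last one gives the same values; only the retained row indices differ.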
Answer 0 (score: 6)
How about this?
# dd is the matrix structure you posted in the question
dd <- as.data.frame(dd) ## convert to data.frame
dd[] <- lapply(dd, as.character) ## change columns to character
na.omit(dd[cumsum(rle(dd$b)$lengths), ]) ## get indices by cumsum'ing rle-lengths
## wrap with na.omit to remove NA rows
# a b
# 2 M01 F04
# 4 M01 F02
# 5 M01 F04
# 7 M02 F10
# 9 M02 F10
# 11 M03 F01
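The code above runs `rle()` on column `b` only, so a change in `a` alone (for example `M01 F04` followed by `M02 F04`) would be merged into a single run. A minimal sketch of a tuple-aware variant, assuming the codes never contain the character `|`: paste all columns into one key per row and run `rle()` on that key; the rest is the same cumsum-of-lengths trick.

```r
# Data from the question as a data.frame of character columns
dd <- data.frame(
  a = c("M01", "M01", "M01", "M01", "M01", "M02", "M02", "M02", "M02", "M03", "M03"),
  b = c("F04", "F04", "F02", "F02", "F04", "F10", "F10", NA, "F10", "F01", "F01"),
  stringsAsFactors = FALSE
)

# One key per row built from every column, so a run breaks when EITHER
# column changes. Assumes "|" never occurs inside the codes.
# paste() turns NA into "NA", so the NA row still forms its own run.
key <- do.call(paste, c(dd, sep = "|"))

# The last row index of each run is the cumulative sum of the run lengths
last_of_run <- cumsum(rle(key)$lengths)

# Keep those rows, then drop the rows that contain NA
out <- na.omit(dd[last_of_run, ])
out
```

On this data the result is the same as above (rows 2, 4, 5, 7, 9, 11), but for two rows `M01 F04` and `M02 F04`, `rle(dd$b)` sees one run and keeps only the `M02` row, while the pasted key keeps both.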