Question

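I have some data with a factor variable (apple or banana), and I would like to identify the places in my dataset where the same value appears in two consecutive rows (i.e. rows 4 and 5 below for banana, and rows 8 and 9 for apple). I know the duplicated function would be useful here (i.e. Index out the subsequent row with an identical value in R), but I don't know how to get my desired output with a categorical variable.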

Example data:

  test =  structure(list(cnt = c(87L, 51L, 24L, 69L, 210L, 21L, 15L, 9L, 
    12L), type = c("apple", "banana", "apple", "banana", "banana", 
    "apple", "banana", "apple", "apple")), .Names = c("cnt", "type"
    ), class = c("tbl_df", "tbl", "data.frame"), row.names = c(NA, 
    -9L), spec = structure(list(cols = structure(list(cnt = structure(list(), class = c("collector_integer", 
    "collector")), type = structure(list(), class = c("collector_character", 
    "collector"))), .Names = c("cnt", "type")), default = structure(list(), class = c("collector_guess", 
    "collector"))), .Names = c("cols", "default"), class = "col_spec"))

Desired output:

    cnt   type  output
1    87  apple FALSE
2    51 banana FALSE
3    24  apple FALSE
4    69 banana TRUE
5   210 banana TRUE
6    21  apple FALSE
7    15 banana FALSE
8     9  apple TRUE
9    12  apple TRUE

When I use the code below, I get a summary telling me that both apple and banana are duplicated:

    test[!duplicated(test[, "type"], fromLast = TRUE), ]

Any help is much appreciated.

Answer 1

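We can do this in several ways. One option is rleid from data.table, which creates a grouping variable from runs of identical adjacent elements. We then create the 'output' column by assigning (:=) the result of a logical condition, i.e. whether the number of elements in the group is greater than 1 (.N > 1):

    library(data.table)
    setDT(test)[, output := .N > 1, rleid(type)]
    test
    #    cnt   type output
    # 1:  87  apple  FALSE
    # 2:  51 banana  FALSE
    # 3:  24  apple  FALSE
    # 4:  69 banana   TRUE
    # 5: 210 banana   TRUE
    # 6:  21  apple  FALSE
    # 7:  15 banana  FALSE
    # 8:   9  apple   TRUE
    # 9:  12  apple   TRUE

Following the OP's clarification, a tidyverse approach is another option.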
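A tidyverse version could look like the sketch below. It is not the answerer's original code. It is a minimal illustration that assumes dplyr is installed and rebuilds the example data as a plain data frame. Each row is compared with its neighbours using lag() and lead(), and a row is flagged when it matches either one.

```r
library(dplyr)

# Example data from the question, rebuilt as a plain data frame (assumption:
# the readr "spec" attribute is not needed for this illustration)
test <- data.frame(
  cnt  = c(87L, 51L, 24L, 69L, 210L, 21L, 15L, 9L, 12L),
  type = c("apple", "banana", "apple", "banana", "banana",
           "apple", "banana", "apple", "apple"),
  stringsAsFactors = FALSE
)

result <- test %>%
  mutate(
    # TRUE when the row has the same type as the previous row;
    # the first row has no predecessor, so the NA is turned into FALSE
    same_as_prev = coalesce(type == lag(type), FALSE),
    # TRUE when the row has the same type as the next row;
    # the last row has no successor, so the NA is turned into FALSE
    same_as_next = coalesce(type == lead(type), FALSE),
    # A row belongs to a run of repeated values if it matches either neighbour
    output = same_as_prev | same_as_next
  ) %>%
  select(cnt, type, output)

result
```

Unlike a test for runs of exactly two rows, this flags every row in a run of two or more identical adjacent values, which matches the .N > 1 condition used in the data.table version.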

Answer 2

We can try run-length encoding:

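# Run-length encode the type column: one entry per run of identical adjacent values
x <- rle(test$type)
# Flag runs of exactly two rows (use x$lengths >= 2 to flag longer runs as well)
x$values <- x$lengths == 2

# Expand the per-run flags back to one value per row
test$output <- inverse.rle(x)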
# > test
#   cnt   type output
# 1  87  apple  FALSE
# 2  51 banana  FALSE
# 3  24  apple  FALSE
# 4  69 banana   TRUE
# 5 210 banana   TRUE
# 6  21  apple  FALSE
# 7  15 banana  FALSE
# 8   9  apple   TRUE
# 9  12  apple   TRUE
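
To make the intermediate steps easier to follow, here is a small base R sketch on a plain character vector with the same values as the type column. The variable names are only illustrative, and rep() is used in place of inverse.rle() to expand the per-run flags; for this purpose the two should behave the same.

```r
# Same values as test$type in the question
type <- c("apple", "banana", "apple", "banana", "banana",
          "apple", "banana", "apple", "apple")

# rle() collapses the vector into runs of identical adjacent values
r <- rle(type)
r$lengths  # length of each run: 1 1 1 2 1 1 2
r$values   # value of each run

# One flag per run (TRUE for runs of two or more), repeated once per row in the run
output <- rep(r$lengths >= 2, times = r$lengths)
output
```

The run lengths sum to the number of rows, so the expanded vector lines up with the original data and can be assigned directly as a new column.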

Identify rows in a data frame where the next row has the same character value in R
