How to remove N rows of a data frame based on criteria

Time: 2016-02-22 00:56:01

Tags: r

My question follows on from "How to find tail rows of a data frame that satisfy set criteria?", so my (updated) sample data is structured as follows:

Individ <- data.frame(Participant = c("Bill", "Bill", "Bill", "Bill", "Bill", "Jane", "Jane", "Jane", "Jane", "Jane", "Jane", "Jane", "Jane", "Jane", "Jane", "Jane", "Jane", "Bill", "Bill", "Bill", "Bill"),  
                      Time = c(1, 2, 3, 4, 5, 1, 2, 3, 4, 5, 6, 1, 2, 3, 4, 5, 6, 1, 2, 3, 4),
                      Condition = c("Placebo", "Placebo", "Placebo", "Placebo", "Placebo", "Expr", "Expr", "Expr", "Expr", "Expr", "Expr", "Placebo", "Placebo", "Placebo", "Placebo", "Placebo", "Placebo", "Expr", "Expr", "Expr", "Expr"),
                      Location = c("Home", "Home", "Home", "Home", "Home", "Home", "Home", "Home", "Home", "Home", "Home", "Home", "Home", "Home", "Home", "Home", "Home", "Away", "Away", "Away", "Away"),
                      Power = c(400, 250, 180, 500, 300, 600, 512, 300, 500, 450, 200, 402, 210, 130, 520, 310, 451, 608, 582, 390, 570))

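I have learned how to find the tail row for each Participant in their different Condition and Location, based on the last occurrence of Power. I now want to remove the last 3 rows of each Location for every Participant and Condition. However, the Time collected for each Participant and Condition will differ, so I cannot remove rows based purely on a Time criterion.

How can I quickly go through each Participant and their respective Condition and Location and remove the last 3 rows? My actual data frame has 4 million+ rows across more than 50 participants, so ideally I would like a solution that iterates over each Participant and Condition.

My expected output is: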

Output <- data.frame(Participant = c("Bill", "Bill", "Jane", "Jane", "Jane", "Jane", "Jane", "Jane", "Bill"),
                     Time = c(1, 2, 1, 2, 3, 1, 2, 3, 1),
                     Condition = c("Placebo", "Placebo", "Expr", "Expr", "Expr", "Placebo", "Placebo", "Placebo", "Expr"),
                     Location = c("Home", "Home", "Home", "Home", "Home", "Home", "Home", "Home", "Away"),
                     Power = c(400, 250, 600, 512, 300, 402, 210, 130, 608))

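2 answers:

Answer 0: (score: 2)

If you use dplyr, you can combine row_number() and n(). Within each group, keep only the rows whose position is less than n() - 2, which drops the last three rows of that group: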

library(dplyr)
Individ %>%
  group_by(Participant, Condition, Location) %>%
  filter(row_number() < n() - 2)

which returns

Source: local data frame [9 x 5]
Groups: Participant, Condition, Location [4]

  Participant  Time Condition Location Power
       (fctr) (dbl)    (fctr)   (fctr) (dbl)
1        Bill     1   Placebo     Home   400
2        Bill     2   Placebo     Home   250
3        Jane     1      Expr     Home   600
4        Jane     2      Expr     Home   512
5        Jane     3      Expr     Home   300
6        Jane     1   Placebo     Home   402
7        Jane     2   Placebo     Home   210
8        Jane     3   Placebo     Home   130
9        Bill     1      Expr     Away   608
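
The same "position within the group versus group size" idea can be sketched in base R without any packages. This is an illustrative alternative, not part of the answer: drop_last_n is a hypothetical helper name, and toy is made-up data in the same shape as Individ.

```r
# Hypothetical helper (not from the answer): drop the last `n` rows of each group.
drop_last_n <- function(df, group_cols, n = 3) {
  g    <- interaction(df[group_cols], drop = TRUE)  # one factor level per group
  idx  <- seq_len(nrow(df))
  pos  <- ave(idx, g, FUN = seq_along)  # row position within its group
  size <- ave(idx, g, FUN = length)     # number of rows in that group
  df[pos <= size - n, , drop = FALSE]   # keep rows that are not among the last n
}

# Made-up sample data, same shape as Individ (assumption for illustration)
toy <- data.frame(
  Participant = c(rep("Bill", 5), rep("Jane", 4)),
  Condition   = "Placebo",
  Location    = "Home",
  Time        = c(1:5, 1:4),
  stringsAsFactors = FALSE
)

kept <- drop_last_n(toy, c("Participant", "Condition", "Location"))
print(kept)
```

The condition pos <= size - n is the same test as row_number() < n() - 2 when n is 3. Groups with n or fewer rows are dropped entirely.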

Answer 1: (score: 2)

An option using data.table. We convert the 'data.frame' to a 'data.table' (setDT(Individ)), group by 'Participant', 'Condition' and 'Location', and use head to remove the last 3 observations of each combination.

library(data.table)
setDT(Individ)[, head(.SD, -3), .(Participant, Condition, Location)]
#   Participant Condition Location Time Power
#1:        Bill   Placebo     Home    1   400
#2:        Bill   Placebo     Home    2   250
#3:        Jane      Expr     Home    1   600
#4:        Jane      Expr     Home    2   512
#5:        Jane      Expr     Home    3   300
#6:        Jane   Placebo     Home    1   402
#7:        Jane   Placebo     Home    2   210
#8:        Jane   Placebo     Home    3   130
#9:        Bill      Expr     Away    1   608

There is an equivalent option in dplyr.
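
As a sketch of what a head-style dplyr version could look like (an assumption, not code from the answer), slice_head() with a negative n keeps all but the last rows of each group. This assumes dplyr 1.0.0 or later; toy is made-up data in the same shape as Individ.

```r
library(dplyr)

# Made-up sample data, same shape as Individ (assumption for illustration)
toy <- data.frame(
  Participant = c(rep("Bill", 5), rep("Jane", 4)),
  Condition   = "Placebo",
  Location    = "Home",
  Time        = c(1:5, 1:4),
  stringsAsFactors = FALSE
)

trimmed <- toy %>%
  group_by(Participant, Condition, Location) %>%
  slice_head(n = -3) %>%  # negative n: keep all but the last 3 rows per group
  ungroup()

print(trimmed)
```

Like head(.SD, -3) in data.table, this drops rows by position within each group, so it does not depend on the values in Time.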