My question follows on from "How to find tail rows of a data frame that satisfy set criteria?". My (updated) sample data is structured as follows:
Individ <- data.frame(Participant = c("Bill", "Bill", "Bill", "Bill", "Bill", "Jane", "Jane", "Jane", "Jane", "Jane", "Jane", "Jane", "Jane", "Jane", "Jane", "Jane", "Jane", "Bill", "Bill", "Bill", "Bill"),
Time = c(1, 2, 3, 4, 5, 1, 2, 3, 4, 5, 6, 1, 2, 3, 4, 5, 6, 1, 2, 3, 4),
Condition = c("Placebo", "Placebo", "Placebo", "Placebo", "Placebo", "Expr", "Expr", "Expr", "Expr", "Expr", "Expr", "Placebo", "Placebo", "Placebo", "Placebo", "Placebo", "Placebo", "Expr", "Expr", "Expr", "Expr"),
Location = c("Home", "Home", "Home", "Home", "Home", "Home", "Home", "Home", "Home", "Home", "Home", "Home", "Home", "Home", "Home", "Home", "Home", "Away", "Away", "Away", "Away"),
Power = c(400, 250, 180, 500, 300, 600, 512, 300, 500, 450, 200, 402, 210, 130, 520, 310, 451, 608, 582, 390, 570))
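I have learned how to find the tail rows for each Participant in their different Condition plus Location, based on the last occurrence of Power. I now want to remove the last 3 rows for each Participant and Condition in each Location. However, the Time collected for each Participant and Condition differs, so I cannot remove rows based purely on a fixed Time.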
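How can I quickly go through each Participant and their respective Condition plus Location and remove the last 3 rows? My actual data frame has 4 million rows and over 50 participants, so ideally the solution would iterate over each Participant and Condition.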
My expected output is:
Output <- data.frame(Participant = c("Bill", "Bill", "Jane", "Jane", "Jane", "Jane", "Jane", "Jane", "Bill"),
Time = c(1, 2, 1, 2, 3, 1, 2, 3, 1),
Condition = c("Placebo", "Placebo", "Expr", "Expr", "Expr", "Placebo", "Placebo", "Placebo", "Expr"),
Location = c("Home", "Home", "Home", "Home", "Home", "Home", "Home", "Home", "Away"),
Power = c(400, 250, 600, 512, 300, 402, 210, 130, 608))
Answer 0 (score: 2)
If you use dplyr, you can combine row_number() and n() inside filter():
library(dplyr)
Individ %>%
group_by(Participant, Condition, Location) %>%
filter(row_number() < n() - 2)
which returns
Source: local data frame [9 x 5]
Groups: Participant, Condition, Location [4]
Participant Time Condition Location Power
(fctr) (dbl) (fctr) (fctr) (dbl)
1 Bill 1 Placebo Home 400
2 Bill 2 Placebo Home 250
3 Jane 1 Expr Home 600
4 Jane 2 Expr Home 512
5 Jane 3 Expr Home 300
6 Jane 1 Placebo Home 402
7 Jane 2 Placebo Home 210
8 Jane 3 Placebo Home 130
9 Bill 1 Expr Away 608
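For readers without dplyr, the same keep-all-but-the-last-three logic can be sketched in base R. This is only an illustration and not part of the original answer: the names grp, pos, size and result are made up here, and the sample data is rebuilt with rep() so the block runs on its own. Note that row_number() < n() - 2 is the same condition as row_number() <= n() - 3.

```r
# Same sample data as in the question, built compactly with rep().
Individ <- data.frame(
  Participant = rep(c("Bill", "Jane", "Jane", "Bill"), times = c(5, 6, 6, 4)),
  Time        = as.numeric(c(1:5, 1:6, 1:6, 1:4)),
  Condition   = rep(c("Placebo", "Expr", "Placebo", "Expr"), times = c(5, 6, 6, 4)),
  Location    = rep(c("Home", "Away"), times = c(17, 4)),
  Power       = c(400, 250, 180, 500, 300, 600, 512, 300, 500, 450, 200,
                  402, 210, 130, 520, 310, 451, 608, 582, 390, 570)
)

# One grouping factor per Participant/Condition/Location combination
# (illustrative name; drop = TRUE discards combinations that never occur).
grp <- interaction(Individ$Participant, Individ$Condition, Individ$Location,
                   drop = TRUE)

# Position of each row within its group, and the size of that group.
pos  <- ave(seq_along(grp), grp, FUN = seq_along)
size <- ave(seq_along(grp), grp, FUN = length)

# Keep every row except the last 3 of its group.
result <- Individ[pos <= size - 3, ]
print(result)
```

Groups with three or fewer rows disappear entirely under this condition, which is also what the dplyr filter above does.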
Answer 1 (score: 2)
An option using data.table. We convert the 'data.frame' to a 'data.table' (setDT(Individ)), group by 'Participant', 'Condition' and 'Location', and remove the last 3 observations of each combination using head().
library(data.table)
setDT(Individ)[, head(.SD, -3) , .(Participant, Condition, Location)]
# Participant Condition Location Time Power
#1: Bill Placebo Home 1 400
#2: Bill Placebo Home 2 250
#3: Jane Expr Home 1 600
#4: Jane Expr Home 2 512
#5: Jane Expr Home 3 300
#6: Jane Placebo Home 1 402
#7: Jane Placebo Home 2 210
#8: Jane Placebo Home 3 130
#9: Bill Expr Away 1 608
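With around 4 million rows, subsetting .SD for every group may be slower than necessary. A common data.table idiom, shown here as a sketch and not as part of the original answer, collects the row numbers to keep via .I and then subsets the table once. The names keep, idx and result are illustrative, and the sample data is rebuilt so the block runs on its own.

```r
library(data.table)

# Same sample data as in the question, built compactly with rep().
Individ <- data.table(
  Participant = rep(c("Bill", "Jane", "Jane", "Bill"), times = c(5, 6, 6, 4)),
  Time        = as.numeric(c(1:5, 1:6, 1:6, 1:4)),
  Condition   = rep(c("Placebo", "Expr", "Placebo", "Expr"), times = c(5, 6, 6, 4)),
  Location    = rep(c("Home", "Away"), times = c(17, 4)),
  Power       = c(400, 250, 180, 500, 300, 600, 512, 300, 500, 450, 200,
                  402, 210, 130, 520, 310, 451, 608, 582, 390, 570)
)

# .I holds the row numbers of the current group in the full table;
# head(.I, -3) drops the last 3 of them.
keep <- Individ[, .(idx = head(.I, -3)),
                by = .(Participant, Condition, Location)]$idx

# Subset once with the collected row numbers; column order is unchanged.
result <- Individ[keep]
print(result)
```

Unlike the head(.SD, -3) form, this keeps the columns in their original order, because the grouping columns are not moved to the front.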
An equivalent option in dplyr is
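The dplyr code for this option is not shown above. One plausible way to write it, offered as an assumption and not as the answerer's actual code, mirrors head(.SD, -3) by slicing on head(row_number(), -3) within each group. The sample data is rebuilt here so the block runs on its own, and result is an illustrative name.

```r
library(dplyr)

# Same sample data as in the question, built compactly with rep().
Individ <- data.frame(
  Participant = rep(c("Bill", "Jane", "Jane", "Bill"), times = c(5, 6, 6, 4)),
  Time        = as.numeric(c(1:5, 1:6, 1:6, 1:4)),
  Condition   = rep(c("Placebo", "Expr", "Placebo", "Expr"), times = c(5, 6, 6, 4)),
  Location    = rep(c("Home", "Away"), times = c(17, 4)),
  Power       = c(400, 250, 180, 500, 300, 600, 512, 300, 500, 450, 200,
                  402, 210, 130, 520, 310, 451, 608, 582, 390, 570)
)

# Assumed reconstruction: within each group, keep the row positions that
# remain after dropping the last 3.
result <- Individ %>%
  group_by(Participant, Condition, Location) %>%
  slice(head(row_number(), -3)) %>%
  ungroup()

print(result)
```

In dplyr 1.0.0 and later, slice_head(n = -3) expresses the same thing.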