How do I find the tail row of a data frame that meets set criteria?

Time: 2016-02-15 07:08:23

Tags: r dplyr

An example of my data is structured as follows:

Individ <- data.frame(Participant = c("Bill", "Bill", "Bill", "Bill", "Bill", "Bill", "Jane", "Jane", "Jane", "Jane", 
                                      "Jane", "Jane", "Jane", "Jane", "Jane", "Jane", "Jane", "Jane", "Bill", "Bill", "Bill", "Bill", "Bill", "Bill"),  
                      Time = c(1, 2, 3, 4, 5, 6, 1, 2, 3, 4, 5, 6, 1, 2, 3, 4, 5, 6, 1, 2, 3, 4, 5, 6),
                      Condition = c("Placebo", "Placebo", "Placebo", "Placebo", "Placebo", "Placebo", "Expr", "Expr", "Expr", "Expr", "Expr", "Expr", 
                                    "Placebo", "Placebo", "Placebo", "Placebo", "Placebo", "Placebo", "Expr", "Expr", "Expr", "Expr", "Expr", "Expr"),
                      Location = c("Home", "Home", "Home", "Home", "Home", "Home", "Home", "Home", "Home", "Home", "Home", "Home", "Home", "Home", "Home", "Home", "Home", "Home", 
                                   "Away", "Away", "Away", "Away", "Away", "Away"),
                      Power = c(400, 250, 180, 500, 300, 450, 600, 512, 300, 500, 450, 200, 402, 210, 130, 520, 310, 451, 608, 582, 390, 570, 456, 205))

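I want to find the tail row for each Participant where Condition equals Placebo and Location equals Home. This will be used to inspect the Power at the last time point, so that I can then examine the 10 rows before it. Finding the row number is therefore important.

I know I can find the last row for each Participant using the following: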

library(plyr)  # provides ddply() and .()
ddply(Individ,.(Participant, Time, Condition),function(x) tail(x,1))

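However, my actual data frame is 4 million rows long, with more than 50 participants, and Power is collected at differing Time points. Is there a quick way to do this that is not computationally expensive?

Cheers!

2 Answers:

Answer 0: (score: 1)

You've already laid out the logic: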

library(dplyr)
Individ %>% mutate(Row = seq_along(Time)) %>% 
    filter(Condition == 'Placebo', Location == 'Home') %>% 
    group_by(Participant) %>% slice(n())

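Add a Row number column with mutate and seq_along(); filter down to the rows that fit your conditions; group_by Participant; then slice pulls, for each participant, the row at position n(), the number of rows in that group, which is the last one.

For your sample data, it returns: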

Source: local data frame [2 x 6]
Groups: Participant [2]

  Participant  Time Condition Location Power   Row
       (fctr) (dbl)    (fctr)   (fctr) (dbl) (int)
1        Bill     6   Placebo     Home   450     6
2        Jane     6   Placebo     Home   451    18

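It isn't doing a lot of heavy lifting and is reasonably well optimized, so it should be fairly fast. Still, 4 million rows is a lot, so you may want to look at data.table, or at storing the data in SQL, for which you can use dplyr as an interface.

Answer 1: (score: 1)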
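Once Row is known, the rows just before each tail row can be pulled by position. The following is only a sketch: the data frame is a small stand-in with the same columns as the question's data, and n_prior and prior_rows are hypothetical names (the question mentions 10 prior rows; 3 is used here because the stand-in is short). It takes the preceding rows by raw position in the whole data frame, so it assumes those rows belong to the same participant and condition.

```r
library(dplyr)

# Small stand-in for the question's data (same columns, fewer rows).
Individ <- data.frame(
  Participant = rep(c("Bill", "Jane"), each = 6),
  Time        = rep(1:6, times = 2),
  Condition   = "Placebo",
  Location    = "Home",
  Power       = c(400, 250, 180, 500, 300, 450, 402, 210, 130, 520, 310, 451)
)

n_prior <- 3  # hypothetical window size; the question asks about 10

# Same pipeline as above: last matching row per participant, with its row number.
last_rows <- Individ %>%
  mutate(Row = seq_along(Time)) %>%
  filter(Condition == "Placebo", Location == "Home") %>%
  group_by(Participant) %>%
  slice(n()) %>%
  ungroup()

# For each tail row, take the n_prior rows immediately before it,
# dropping any positions that would fall before the first row.
prior_rows <- lapply(last_rows$Row, function(r) {
  idx <- (r - n_prior):(r - 1)
  Individ[idx[idx >= 1], ]
})
names(prior_rows) <- as.character(last_rows$Participant)
```

prior_rows is then a list with one small data frame per participant, which can be inspected or bound together as needed.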

Using data.table, we convert the 'data.frame' to a 'data.table' (setDT(Individ)), use the logical condition (Condition == 'Placebo' & Location == 'Home') in 'i', group by 'Participant', and subset the last observation (tail(.SD, 1L) or .SD[.N]).

library(data.table)
setDT(Individ)[Condition=='Placebo' & Location=='Home', 
                             tail(.SD, 1L) ,.(Participant)]
#   Participant Time Condition Location Power
#1:        Bill    6   Placebo     Home   450
#2:        Jane    6   Placebo     Home   451
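The .SD[.N] form mentioned above is not shown in the answer, so here is a minimal sketch of it next to the tail() form. The table DT is a hypothetical, shortened stand-in for the question's data (three time points per block), and res1 and res2 are illustrative names.

```r
library(data.table)

# Shortened stand-in for the question's data: four blocks of three rows.
DT <- data.table(
  Participant = rep(c("Bill", "Jane", "Jane", "Bill"), each = 3),
  Time        = rep(1:3, times = 4),
  Condition   = rep(c("Placebo", "Expr", "Placebo", "Expr"), each = 3),
  Location    = rep(c("Home", "Home", "Home", "Away"), each = 3),
  Power       = c(400, 250, 180, 600, 512, 300, 402, 210, 130, 608, 582, 390)
)

# Last matching row per participant, indexing .SD at .N (the group size).
res1 <- DT[Condition == "Placebo" & Location == "Home", .SD[.N], by = Participant]

# The same selection written with tail(), as in the answer.
res2 <- DT[Condition == "Placebo" & Location == "Home", tail(.SD, 1L), by = Participant]
```

Both calls should select the same rows; which one is faster on 4 million rows is worth measuring on the real data rather than assuming.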

If we need the row number, we can get it with .I:
setDT(Individ)[Condition=='Placebo' & Location=='Home',
        c(rn = .I[.N],tail(.SD, 1L)) ,.(Participant)]
#    Participant rn Time Condition Location Power
#1:        Bill  6    6   Placebo     Home   450
#2:        Jane 18    6   Placebo     Home   451
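Since the question goes on to examine the rows before each tail row, a possible variation is to store the row number as a column first and then keep a window of trailing rows per group. This is a sketch under assumptions: DT is a hypothetical, shortened stand-in for the question's data, and n_prior and window are illustrative names (the question mentions 10 prior rows; 1 is used here because each block has only three rows). Unlike a purely positional lookup, the window here is taken within the filtered rows of each participant.

```r
library(data.table)

# Shortened stand-in for the question's data: four blocks of three rows.
DT <- data.table(
  Participant = rep(c("Bill", "Jane", "Jane", "Bill"), each = 3),
  Time        = rep(1:3, times = 4),
  Condition   = rep(c("Placebo", "Expr", "Placebo", "Expr"), each = 3),
  Location    = rep(c("Home", "Home", "Home", "Away"), each = 3),
  Power       = c(400, 250, 180, 600, 512, 300, 402, 210, 130, 608, 582, 390)
)

# Record the original row numbers before any filtering.
DT[, rn := .I]

n_prior <- 1L  # hypothetical window size; the question asks about 10

# Per participant: the tail row plus the n_prior matching rows before it.
window <- DT[Condition == "Placebo" & Location == "Home",
             tail(.SD, n_prior + 1L), by = Participant]
```

The rn column in window then points back to positions in the full table, so the same rows can be located again later.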