Selecting rows based on occurrences in another column in R

Time: 2015-08-31 15:24:27

Tags: r

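I'm not sure the title really reflects what I'm trying to do. Ultimately, I want to select rows, by group, that follow a specific pattern in the ActionType column. The grouping variable is email. For each email, if the first row of ActionType is "won", I want to drop it and look at the second row. If the second row of ActionType is also "won", I want to drop that one too and move on to the next row, and so on.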

Basically, condition 1 is that the first row kept for each email must not be "won".

Next, once that is satisfied, I want to select everything from that first row (which is not "won") up to the next "won".

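The process then repeats until all rows in the group have been checked. I don't care about rows that occur after a "won" unless they come before another "won". Also, if two "won" rows are back to back, I want to select the rows up to and including the first "won", drop the "won" that follows it, and then keep checking rows, retaining those that come before another "won".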

I tried using cumsum with dplyr and data.table, but I may need to do this in several steps.

This is what my data looks like:

email   Action  ActionType  Date
wwww    Company won         1/17/14
wwww    Company trial       1/22/14
wwww    Event   Meeting     1/24/14
wwww    Event   Meeting     2/24/14
wwww    Gmail   Email       9/10/14
wwww    Company won         9/11/14
wwww    Company won         9/25/14
wwww    Event   Support     10/7/14
wwww    Company won         10/22/14
wwww    Company won         12/31/14
wwww    Gmail   Email       2/13/15
wwww    Gmail   Email       2/27/15
wwww    Gmail   Email       3/6/15
wwww    Gmail   Email       3/26/15
wwww    Gmail   Email       4/20/15
wwww    Gmail   Email       4/24/15
wwww    Gmail   Email       5/13/15
xxxx    Company trial       1/17/14
xxxx    Gmail   Email       1/22/14
xxxx    Event   Meeting     1/24/14
xxxx    Company won         2/24/14
xxxx    Gmail   Email       9/10/14
xxxx    Gmail   Email       9/11/14
xxxx    Gmail   Email       9/25/14
xxxx    Gmail   Email       10/7/14
xxxx    Gmail   Email       10/22/14
yyyy    Company won         1/24/14
yyyy    Company trial       2/24/14
yyyy    Task    Call        9/10/14
yyyy    Task    Call        9/11/14
yyyy    Task    Call        9/25/14
yyyy    Company won         10/7/14
yyyy    Gmail   Email       10/22/14
yyyy    Gmail   Email       12/31/14
zzzz    Company won         9/11/14
zzzz    Company won         9/25/14
zzzz    Task    Call        10/7/14
zzzz    Task    Call        10/22/14
zzzz    Company trial       12/31/14
zzzz    Gmail   Email       2/13/15
zzzz    Company won         2/27/15
zzzz    Gmail   Email       3/6/15
zzzz    Gmail   Email       3/26/15
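
For reproducibility, the table above could be loaded into R roughly as follows. This is only a sketch: the object name DT is chosen to match the name used in the answer, and parsing with read.table is an assumption, not something stated in the original post. Date is left as a character column.

```r
library(data.table)

# Sample data copied verbatim from the table above (whitespace-separated).
DT <- as.data.table(read.table(header = TRUE, stringsAsFactors = FALSE, text = "
email   Action  ActionType  Date
wwww    Company won         1/17/14
wwww    Company trial       1/22/14
wwww    Event   Meeting     1/24/14
wwww    Event   Meeting     2/24/14
wwww    Gmail   Email       9/10/14
wwww    Company won         9/11/14
wwww    Company won         9/25/14
wwww    Event   Support     10/7/14
wwww    Company won         10/22/14
wwww    Company won         12/31/14
wwww    Gmail   Email       2/13/15
wwww    Gmail   Email       2/27/15
wwww    Gmail   Email       3/6/15
wwww    Gmail   Email       3/26/15
wwww    Gmail   Email       4/20/15
wwww    Gmail   Email       4/24/15
wwww    Gmail   Email       5/13/15
xxxx    Company trial       1/17/14
xxxx    Gmail   Email       1/22/14
xxxx    Event   Meeting     1/24/14
xxxx    Company won         2/24/14
xxxx    Gmail   Email       9/10/14
xxxx    Gmail   Email       9/11/14
xxxx    Gmail   Email       9/25/14
xxxx    Gmail   Email       10/7/14
xxxx    Gmail   Email       10/22/14
yyyy    Company won         1/24/14
yyyy    Company trial       2/24/14
yyyy    Task    Call        9/10/14
yyyy    Task    Call        9/11/14
yyyy    Task    Call        9/25/14
yyyy    Company won         10/7/14
yyyy    Gmail   Email       10/22/14
yyyy    Gmail   Email       12/31/14
zzzz    Company won         9/11/14
zzzz    Company won         9/25/14
zzzz    Task    Call        10/7/14
zzzz    Task    Call        10/22/14
zzzz    Company trial       12/31/14
zzzz    Gmail   Email       2/13/15
zzzz    Company won         2/27/15
zzzz    Gmail   Email       3/6/15
zzzz    Gmail   Email       3/26/15
"))
```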

So I would like the final result to look like this:

email   Action  ActionType  Date
wwww    Company trial       1/22/14
wwww    Event   Meeting     1/24/14
wwww    Event   Meeting     2/24/14
wwww    Gmail   Email       9/10/14
wwww    Company won         9/11/14
wwww    Event   Support     10/7/14
wwww    Company won         10/22/14
xxxx    Company trial       1/17/14
xxxx    Gmail   Email       1/22/14
xxxx    Event   Meeting     1/24/14
xxxx    Company won         2/24/14
yyyy    Company trial       2/24/14
yyyy    Task    Call        9/10/14
yyyy    Task    Call        9/11/14
yyyy    Task    Call        9/25/14
yyyy    Company won         10/7/14
zzzz    Task    Call        10/7/14
zzzz    Task    Call        10/22/14
zzzz    Company trial       12/31/14
zzzz    Gmail   Email       2/13/15
zzzz    Company won         2/27/15

1 answer:

Answer 0: (score: 2)

Here is one way:

library(data.table)

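# DT is assumed to be the question's data, already stored as a data.table

# cut off leading wins and trailing nonwins:
# keep a row only if a "won" occurs at or after it and a non-"won" occurs at or before it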
goodi = DT[, .I[
    rev(cumsum(rev(ActionType=="won"))) > 0L &
    cumsum(ActionType!="won") > 0L
], by=email]$V1

# take the first win when there's a succession of 'em
DT[goodi, r := rleid(ActionType=="won"), by=email]
badi = DT[!is.na(r), .I[ ActionType=="won" & 1:.N > 1], by=.(email,r)]$V1
DT[, r := NULL]

DT[setdiff(goodi,badi)]

which gives the desired output:

    email  Action ActionType     Date
 1:  wwww Company      trial  1/22/14
 2:  wwww   Event    Meeting  1/24/14
 3:  wwww   Event    Meeting  2/24/14
 4:  wwww   Gmail      Email  9/10/14
 5:  wwww Company        won  9/11/14
 6:  wwww   Event    Support  10/7/14
 7:  wwww Company        won 10/22/14
 8:  xxxx Company      trial  1/17/14
 9:  xxxx   Gmail      Email  1/22/14
10:  xxxx   Event    Meeting  1/24/14
11:  xxxx Company        won  2/24/14
12:  yyyy Company      trial  2/24/14
13:  yyyy    Task       Call  9/10/14
14:  yyyy    Task       Call  9/11/14
15:  yyyy    Task       Call  9/25/14
16:  yyyy Company        won  10/7/14
17:  zzzz    Task       Call  10/7/14
18:  zzzz    Task       Call 10/22/14
19:  zzzz Company      trial 12/31/14
20:  zzzz   Gmail      Email  2/13/15
21:  zzzz Company        won  2/27/15
    email  Action ActionType     Date
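
To see why the two cumsum conditions work, it may help to trace them on a single small vector. The following is a minimal base R sketch, not the answer's code: keep_rows is a hypothetical helper name, and the "won directly after won" check is a swapped-in replacement for the rleid step. It is intended to select the same rows within one email group.

```r
# Hypothetical helper: for the ActionType values of ONE email (in row order),
# return TRUE for the rows to keep.
keep_rows <- function(action_type) {
  is_won <- action_type == "won"
  # TRUE while at least one "won" lies at or after this row
  # (this drops trailing non-wins)
  won_ahead <- rev(cumsum(rev(is_won))) > 0L
  # TRUE once at least one non-"won" row has been seen
  # (this drops leading wins)
  nonwon_seen <- cumsum(!is_won) > 0L
  # TRUE for a "won" that directly follows another "won"
  repeat_won <- is_won & c(FALSE, head(is_won, -1L))
  won_ahead & nonwon_seen & !repeat_won
}

x <- c("won", "trial", "Email", "won", "won", "Email")
keep_rows(x)
# [1] FALSE  TRUE  TRUE  TRUE FALSE FALSE
x[keep_rows(x)]
# [1] "trial" "Email" "won"
```

Applied per group with the same .I idiom the answer uses, something like DT[DT[, .I[keep_rows(ActionType)], by=email]$V1] should return the same rows for the sample data, though it has not been checked beyond that.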