Calculating column sums in R for the rows before a specific value in another column, per ID

Time: 2015-03-25 13:44:08

Tags: r function for-loop

This is a continuation of a question I asked a few days ago. The original data looks like this:

E_Add  Action  ActionType  Call  Callback  Email
xxxx   Task    Call        1     0         0
xxxx   Task    Call        1     0         0
xxxx   Event   Start       0     0         0
xxxx   Task    Call        1     0         0
xxxx   Event   Trial       0     0         0
yyyy   Task    Call        1     0         0
yyyy   Task    Callback    0     1         0
yyyy   Task    Email       0     0         1
yyyy   Task    Call        1     0         0
yyyy   Event   Start       0     0         0    

This is how I would like the data to look:

   Email Action ActionType Call Callback Emails CallSum CallBackSum EmailSum
1   xxxx   Task       Call    1        0      0                             
2   xxxx   Task       Call    1        0      0                             
3   xxxx  Event      Start    0        0      0       2           0        0
4   xxxx   Task       Call    1        0      0                             
5   xxxx  Event      Trial    0        0      0       1           0        0
6   yyyy   Task       Call    1        0      0                             
7   yyyy   Task   Callback    0        1      0                             
8   yyyy   Task      Email    0        0      1                             
9   yyyy   Task       Call    1        0      0                             
10  yyyy  Event      Start    0        0      0       2           1        1

Here is some code that generates the original data set and the output above. It is not my own code; I did not write it.

df = structure(list(Email = structure(c(1L, 1L, 1L, 1L, 1L, 2L, 2L, 
2L, 2L, 2L), .Label = c("xxxx", "yyyy"), class = "factor"), Action =     
structure(c(2L, 
2L, 1L, 2L, 1L, 2L, 2L, 2L, 2L, 1L), .Label = c("Event", "Task"
), class = "factor"), ActionType = structure(c(1L, 1L, 4L, 1L, 
5L, 1L, 2L, 3L, 1L, 4L), .Label = c("Call", "Callback", "Email", 
"Start", "Trial"), class = "factor"), Call = c(1L, 1L, 0L, 1L, 
0L, 1L, 0L, 0L, 1L, 0L), Callback = c(0L, 0L, 0L, 0L, 0L, 0L, 
1L, 0L, 0L, 0L), Emails = c(0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 0L, 
0L)), .Names = c("Email", "Action", "ActionType", "Call", "Callback", 
"Emails"), class = "data.frame", row.names = c(NA, -10L))

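# Result columns start out as empty strings; rows that are not Events stay blank
df$CallSum <- ''
df$CallBackSum <- ''
df$EmailSum <- ''

# Running totals since the last Event
CSum <- 0
CBSum <- 0
ESum <- 0
for (i in 1:nrow(df)) {
  # Columns 4, 5 and 6 are Call, Callback and Emails
  CSum <- CSum + df[[4]][i]
  CBSum <- CBSum + df[[5]][i]
  ESum <- ESum + df[[6]][i]

  # Column 2 is Action
  if (df[[2]][i] == 'Event') {
    # Write the totals into columns 7 to 9 (CallSum, CallBackSum, EmailSum)
    df[[7]][i] <- CSum
    df[[8]][i] <- CBSum
    df[[9]][i] <- ESum

    # clear out vars
    CSum <- 0
    CBSum <- 0
    ESum <- 0
  }
}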



   Email Action ActionType Call Callback Emails CallSum CallBackSum EmailSum
1   xxxx   Task       Call    1        0      0                             
2   xxxx   Task       Call    1        0      0                             
3   xxxx  Event      Start    0        0      0       2           0        0
4   xxxx   Task       Call    1        0      0                             
5   xxxx  Event      Trial    0        0      0       1           0        0
6   yyyy   Task       Call    1        0      0                             
7   yyyy   Task   Callback    0        1      0                             
8   yyyy   Task      Email    0        0      1                             
9   yyyy   Task       Call    1        0      0                             
10  yyyy  Event      Start    0        0      0       2           1        1

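Basically the code does what I want, but on my large data set it does not work correctly. I need a way to wrap this code in a function and apply it to blocks of rows that share the same email address (something like an apply by email address). As it stands, the code sums the calls, emails and callbacks and resets the running sums every time it reaches an Event. I also need it to reset every time it reaches a new email address. I would like it to walk through an email address, hit an Event, count the calls, callbacks and emails before that Event, and put those sums in the new columns.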

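There may also be cases where an Event occurs, some activity follows it, and then another Event occurs. In that case I need to count the activity before the second Event, because there were no calls, emails or callbacks before the first Event for that email address.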
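The requirement described above (reset the sums after every Event and also at every new email address) can be sketched without an explicit loop. This is only a minimal sketch, not code from the original post: it assumes the rows are already sorted by email address and in chronological order, it rebuilds the example data with the column names from the `structure()` call, and the names `is_event`, `block`, `key` and `sum_before_event` are made up for illustration.

```r
# Example data copied from the table in the question.
df <- data.frame(
  Email    = c(rep("xxxx", 5), rep("yyyy", 5)),
  Action   = c("Task", "Task", "Event", "Task", "Event",
               "Task", "Task", "Task", "Task", "Event"),
  Call     = c(1L, 1L, 0L, 1L, 0L, 1L, 0L, 0L, 1L, 0L),
  Callback = c(0L, 0L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 0L),
  Emails   = c(0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 0L, 0L),
  stringsAsFactors = FALSE
)

is_event <- df$Action == "Event"

# A new block starts on the row after each Event, so every Event row
# closes the block of activity that precedes it.
block <- cumsum(c(0L, head(is_event, -1)))

# Combining the email address with the block number means a change of
# address also starts a new block (assumes rows are sorted by Email).
key <- paste(df$Email, block)

# Sum a column within each block and keep the total on Event rows only.
sum_before_event <- function(x) {
  totals <- ave(x, key, FUN = sum)
  ifelse(is_event, totals, NA)
}

df$CallSum     <- sum_before_event(df$Call)
df$CallBackSum <- sum_before_event(df$Callback)
df$EmailSum    <- sum_before_event(df$Emails)

print(df)
```

Rows that are not Events get `NA` here instead of an empty string, which keeps the new columns numeric. Activity that follows the last Event of an address is never reported, because no Event row closes that block.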

I tried this:

e_all_copy$CallSum=''
e_all_copy$CallBackSum=''
e_all_copy$EmailSum=''


get.total <- function(x) { for(i in 1:nrow(x)) {

  CSum =0
  CBSum =0
  ESum =0

  CSum = CSum+ x[[4]][i]
  CBSum = CBSum+ x[[5]][i]
  ESum = ESum+ x[[6]][i]

  if(x[[2]][i] == 'Event'){

    #
    x[[7]][i] = CSum
    x[[8]][i] = CBSum
    x[[9]][i] = ESum

    #clear out vars
    CSum =0
    CBSum =0
    ESum =0
  }
  CSum
  CBSum
  ESum
}}


e_all_summed <- ddply(e_all_copy,.(email),get.total)

But the output is just a list of all the unique email addresses with NULL next to them. Any help is greatly appreciated! Thanks!
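A likely reason for the NULL output: the body of `get.total` is a single `for` loop, and a `for` loop in R evaluates to `NULL`, so the function returns `NULL` for every group instead of the modified data frame. The counters are also set back to zero inside the loop on every row, so they could never accumulate. The following is a hedged sketch of a per-address function that avoids both problems. It uses the example data rather than `e_all_copy` (whose exact column names are not shown), refers to columns by name instead of position, and uses base R `split`/`lapply` in place of `ddply`; `get_total` is a hypothetical name.

```r
# Example data copied from the table in the question.
df <- data.frame(
  Email    = c(rep("xxxx", 5), rep("yyyy", 5)),
  Action   = c("Task", "Task", "Event", "Task", "Event",
               "Task", "Task", "Task", "Task", "Event"),
  Call     = c(1L, 1L, 0L, 1L, 0L, 1L, 0L, 0L, 1L, 0L),
  Callback = c(0L, 0L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 0L),
  Emails   = c(0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 0L, 0L),
  stringsAsFactors = FALSE
)

# Works on the rows of ONE email address and returns them with the
# three sum columns filled in on Event rows.
get_total <- function(x) {
  x$CallSum     <- NA_integer_
  x$CallBackSum <- NA_integer_
  x$EmailSum    <- NA_integer_

  # Counters start at zero once per address, outside the loop.
  c_sum  <- 0L
  cb_sum <- 0L
  e_sum  <- 0L

  for (i in seq_len(nrow(x))) {
    c_sum  <- c_sum  + x$Call[i]
    cb_sum <- cb_sum + x$Callback[i]
    e_sum  <- e_sum  + x$Emails[i]

    if (x$Action[i] == "Event") {
      x$CallSum[i]     <- c_sum
      x$CallBackSum[i] <- cb_sum
      x$EmailSum[i]    <- e_sum

      # Reset after each Event so the next Event only counts later rows.
      c_sum  <- 0L
      cb_sum <- 0L
      e_sum  <- 0L
    }
  }

  x  # return the data frame explicitly; the loop itself yields NULL
}

# Apply the function to each address separately and stack the pieces.
pieces <- lapply(split(df, df$Email), get_total)
result <- do.call(rbind, pieces)
rownames(result) <- NULL

print(result)
```

With plyr loaded, `ddply(df, .(Email), get_total)` should give the same rows, since `ddply` expects the function to return a data frame for each group. The grouping variable has to match the column name exactly, including case.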

0 answers:

No answers yet