This is a follow-up to a question I asked a few days ago. The raw data looks like this:
E_Add Action ActionType Call Callback Email
xxxx Task Call 1 0 0
xxxx Task Call 1 0 0
xxxx Event Start 0 0 0
xxxx Task Call 1 0 0
xxxx Event Trial 0 0 0
yyyy Task Call 1 0 0
yyyy Task Callback 0 1 0
yyyy Task Email 0 0 1
yyyy Task Call 1 0 0
yyyy Event Start 0 0 0
This is how I would like the data to look:
Email Action ActionType Call Callback Emails CallSum CallBackSum EmailSum
1 xxxx Task Call 1 0 0
2 xxxx Task Call 1 0 0
3 xxxx Event Start 0 0 0 2 0 0
4 xxxx Task Call 1 0 0
5 xxxx Event Trial 0 0 0 1 0 0
6 yyyy Task Call 1 0 0
7 yyyy Task Callback 0 1 0
8 yyyy Task Email 0 0 1
9 yyyy Task Call 1 0 0
10 yyyy Event Start 0 0 0 2 1 1
Here is some code that generates the original dataset and the sums. It is not my own code; I did not write it.
df = structure(list(Email = structure(c(1L, 1L, 1L, 1L, 1L, 2L, 2L,
2L, 2L, 2L), .Label = c("xxxx", "yyyy"), class = "factor"), Action =
structure(c(2L,
2L, 1L, 2L, 1L, 2L, 2L, 2L, 2L, 1L), .Label = c("Event", "Task"
), class = "factor"), ActionType = structure(c(1L, 1L, 4L, 1L,
5L, 1L, 2L, 3L, 1L, 4L), .Label = c("Call", "Callback", "Email",
"Start", "Trial"), class = "factor"), Call = c(1L, 1L, 0L, 1L,
0L, 1L, 0L, 0L, 1L, 0L), Callback = c(0L, 0L, 0L, 0L, 0L, 0L,
1L, 0L, 0L, 0L), Emails = c(0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 0L,
0L)), .Names = c("Email", "Action", "ActionType", "Call", "Callback",
"Emails"), class = "data.frame", row.names = c(NA, -10L))
df$CallSum = ''
df$CallBackSum = ''
df$EmailSum = ''
CSum = 0
CBSum = 0
ESum = 0
for (i in 1:nrow(df)) {
  # accumulate running totals (columns 4-6 are Call, Callback, Emails)
  CSum = CSum + df[[4]][i]
  CBSum = CBSum + df[[5]][i]
  ESum = ESum + df[[6]][i]
  if (df[[2]][i] == 'Event') {
    # write the running totals onto the Event row (columns 7-9)
    df[[7]][i] = CSum
    df[[8]][i] = CBSum
    df[[9]][i] = ESum
    # clear out vars
    CSum = 0
    CBSum = 0
    ESum = 0
  }
}
Email Action ActionType Call Callback Emails CallSum CallBackSum EmailSum
1 xxxx Task Call 1 0 0
2 xxxx Task Call 1 0 0
3 xxxx Event Start 0 0 0 2 0 0
4 xxxx Task Call 1 0 0
5 xxxx Event Trial 0 0 0 1 0 0
6 yyyy Task Call 1 0 0
7 yyyy Task Callback 0 1 0
8 yyyy Task Email 0 0 1
9 yyyy Task Call 1 0 0
10 yyyy Event Start 0 0 0 2 1 1
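Basically the code does what I want, but it does not work properly on my large dataset. I need a way to wrap this code in a function and apply it to blocks of rows that share the same email address (something like an apply by email address). As it stands, the code sums the calls, emails, and callbacks, and resets the sums every time it reaches an Event. I also need it to reset every time it reaches a new email address. I want it to go through each email address, hit an Event, count the calls, callbacks, and emails that came before that Event, and put those totals in the new columns.
There may also be cases where an Event occurs, some activity follows, and then another Event occurs. In that case I need to count the activity before the second Event, because there were no calls, emails, or callbacks before the first Event for that email address.
I tried this: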
e_all_copy$CallSum = ''
e_all_copy$CallBackSum = ''
e_all_copy$EmailSum = ''
get.total <- function(x) {
  for (i in 1:nrow(x)) {
    CSum = 0
    CBSum = 0
    ESum = 0
    CSum = CSum + x[[4]][i]
    CBSum = CBSum + x[[5]][i]
    ESum = ESum + x[[6]][i]
    if (x[[2]][i] == 'Event') {
      #
      x[[7]][i] = CSum
      x[[8]][i] = CBSum
      x[[9]][i] = ESum
      #clear out vars
      CSum = 0
      CBSum = 0
      ESum = 0
    }
    CSum
    CBSum
    ESum
  }
}
e_all_summed <- ddply(e_all_copy, .(email), get.total)
But the output is just a list of all the unique email addresses with NULL next to each one. Any help is greatly appreciated! Thanks!
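One likely reason for the NULL output: in R a for loop evaluates to NULL, and because the loop is the last expression in get.total, NULL is what the function returns for each group. The running totals are also reset to 0 at the top of every iteration, so they never accumulate. Below is a minimal base-R sketch of one way to get per-email, per-Event totals. It is not a definitive fix. The function name sum_before_events and the small data frame are illustrative assumptions. The column names follow the dput output above, and non-Event rows get NA rather than an empty string.

```r
# Illustrative data mirroring the question's example (ActionType omitted).
df <- data.frame(
  Email    = c(rep("xxxx", 5), rep("yyyy", 5)),
  Action   = c("Task", "Task", "Event", "Task", "Event",
               "Task", "Task", "Task", "Task", "Event"),
  Call     = c(1L, 1L, 0L, 1L, 0L, 1L, 0L, 0L, 1L, 0L),
  Callback = c(0L, 0L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 0L),
  Emails   = c(0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 0L, 0L),
  stringsAsFactors = FALSE
)

# Hypothetical helper: takes the rows for ONE email address and returns
# them with the three sum columns filled in on Event rows only.
sum_before_events <- function(x) {
  is_event <- x$Action == "Event"
  # Block id: every row up to and including an Event shares one id;
  # the id increases on the row that follows an Event.
  block <- cumsum(c(0L, head(is_event, -1)))
  # New column name -> source column (names assumed from the question).
  cols <- c(CallSum = "Call", CallBackSum = "Callback", EmailSum = "Emails")
  for (new_col in names(cols)) {
    totals <- ave(x[[cols[[new_col]]]], block, FUN = sum)
    x[[new_col]] <- ifelse(is_event, totals, NA_integer_)
  }
  x  # return the data frame explicitly, not the loop's NULL
}

# Apply per email address, then stitch the pieces back together.
result <- do.call(rbind, lapply(split(df, df$Email), sum_before_events))
rownames(result) <- NULL
print(result)
```

If plyr is loaded, the same function should also work as ddply(df, .(Email), sum_before_events), since it returns the whole data frame for each group. Note that split() returns the groups in sorted order of Email, so the rows come back regrouped by address if the input was not already ordered that way.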