I have a dataset that looks like this:
User <- c("User1", "User1", "User1", "User1", "User1", "User1", "User1",
          "User2", "User2", "User2", "User2", "User2", "User2", "User2")
Touchpoints <- c("A", "B", "C", "F", "D", "E", "H", "A", "B", "K", "D", "E", "F", "M")
Conversion <- c(0, 0, 0, 1, 0, 0, 1, 0, 0, 1, 1, 0, 0, 1)
Frequency <- c(1, 2, 3, 0, 4, 5, 0, 1, 2, 0, 0, 3, 4, 5)
df <- data.frame(User, Touchpoints, Conversion, Frequency)
df$Exponential <- ifelse(df$Frequency > 0, exp(df$Frequency), 0)
df
    User Touchpoints Conversion Frequency Exponential
1  User1           A          0         1    2.718282
2  User1           B          0         2    7.389056
3  User1           C          0         3   20.085537
4  User1           F          1         0    0.000000
5  User1           D          0         4   54.598150
6  User1           E          0         5  148.413159
7  User1           H          1         0    0.000000
8  User2           A          0         1    2.718282
9  User2           B          0         2    7.389056
10 User2           K          1         0    0.000000
11 User2           D          1         0    0.000000
12 User2           E          0         3   20.085537
13 User2           F          0         4   54.598150
14 User2           M          1         5  148.413159
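Here is what I am trying to do:
For each `User`, I want to add `_Conv` columns that hold each row's `Exponential` value as a share of the sum of the `Exponential` column up to each `Conversion` value. Here is an example: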
    User Touchpoints Conversion Frequency Exponential Sum_of_Exp 1st_Conv Sum_Exp_for_Conv2 2nd_Conv
1  User1           A          0         1    2.718282     30.192   0.0900           233.204   0.0116
2  User1           B          0         2    7.389056     30.192   0.2447           233.204   0.0317
3  User1           C          0         3   20.085537     30.192   0.6652           233.204   0.0861
4  User1           F          1         0    0.000000          0   0.0000           233.204        0
5  User1           D          0         4   54.598150          0   0.0000           233.204   0.2341
6  User1           E          0         5  148.413159          0   0.0000           233.204   0.6364
7  User1           H          1         0    0.000000          0   0.0000                 0        0
8  User2           A          0         1    2.718282     10.107   0.2689            10.107   0.2689
9  User2           B          0         2    7.389056     10.107   0.7311            10.107   0.7311
10 User2           K          1         0    0.000000          0   0.0000                 0        0
11 User2           D          1         0    0.000000          0   0.0000                 0        0
12 User2           E          0         3   20.085537          0   0.0000                 0        0
13 User2           F          0         4   54.598150          0   0.0000                 0        0
14 User2           M          0         5  148.413159          0   0.0000                 0        0
In some cases each user will have more than 100 conversions, and creating thousands of columns this way does not seem to scale.
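My final output would add all the `_Conv` columns together into one last column called `Final_Conv`. For this example, the final output would look like this: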
    User Touchpoints Conversion Frequency Final_Conv
1  User1           A          0         1     0.1017
2  User1           B          0         2     0.2764
3  User1           C          0         3     0.7514
4  User1           F          1         0          0
5  User1           D          0         4     0.2341
6  User1           E          0         5     0.6364
7  User1           H          1         0          0
8  User2           A          0         1     0.5379
9  User2           B          0         2     1.4621
10 User2           K          1         0          0
11 User2           D          1         0          0
12 User2           E          0         3          0
13 User2           F          0         4          0
14 User2           M          0         5          0
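One way to read the two tables above as a formula (the symbols are introduced here for illustration and are not from the original post): within one user, let E_i be the `Exponential` value of row i, p_k the row position of the k-th conversion, and S_k the sum of `Exponential` over all rows before that conversion. Each `_Conv` column is then E_i / S_k for the rows before conversion k, and `Final_Conv` adds those shares up. Where S_k is zero, the share is taken as 0, which is an assumption.

```latex
\text{Final\_Conv}_i \;=\; \sum_{k \,:\, p_k > i} \frac{E_i}{S_k},
\qquad
S_k \;=\; \sum_{j < p_k} E_j
```

For row 1 of User1 this gives 2.718282 / 30.192 + 2.718282 / 233.204 ≈ 0.1017, which agrees with the first row of the `Final_Conv` table.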
Any help would be great, thanks!
Answer 0: (score: 1)
This is probably not the simplest code, but we can do the following:
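library(dplyr)
library(tidyr)

df %>%
  group_by(User) %>%
  # Per user: row order, conversion block id, running total of Exponential
  mutate(row_id = row_number(),
         conv_id = cumsum(Conversion),
         exp_cumsum = cumsum(Exponential)) %>%
  group_by(conv_id, .add = TRUE) %>%  # `.add` needs dplyr >= 1.0.0 (older versions: `add`)
  # Last running total per User + conv_id block; NA for one-row blocks
  mutate(sum_of_exp = ifelse(n() == 1, NA, last(exp_cumsum))) %>%
  # One conv_id_* column per conversion block
  spread(conv_id, sum_of_exp, sep = "_") %>%
  arrange(User, row_id) %>%
  fill(starts_with("conv_id"), .direction = "up") %>%
  # Share of each row's Exponential in each block total
  mutate_at(vars(starts_with("conv_id")), ~ Exponential / .x) %>%
  ungroup() %>%
  # Columns 1:7 are the non-conv_id_* columns
  mutate(Final_Conv = rowSums(.[-(1:7)], na.rm = TRUE)) %>%
  select(1:4, Final_Conv)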
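Notes:
I first created the cumulative sums of `Conversion` and `Exponential`, added `conv_id` as an additional grouping variable, and replaced all values within each `User` + `conv_id` combination with the last value of `exp_cumsum`. Then I spread the `conv_id` and `sum_of_exp` columns and filled each `conv_id_` column upward. Finally, I used `mutate_at` to divide `Exponential` by each `conv_id_` column, and created `Final_Conv` by summing all the resulting `conv_id_` columns with `rowSums`.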
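To make the first step concrete, here is a small base R sketch of the intermediate columns for User1 only. The vectors are typed in by hand from the question's data, and `ave()` stands in for the grouped `mutate()`; it is an illustration, not the answer's code.

```r
# User1's rows from the question, entered by hand for illustration
conversion  <- c(0, 0, 0, 1, 0, 0, 1)
exponential <- c(exp(1:3), 0, exp(4:5), 0)

conv_id    <- cumsum(conversion)    # conversion block each row falls in
exp_cumsum <- cumsum(exponential)   # running total of Exponential

# Last running total within each conv_id block; NA for one-row blocks,
# mirroring ifelse(n() == 1, NA, last(exp_cumsum)) in the pipeline
sum_of_exp <- ave(exp_cumsum, conv_id, FUN = function(x) {
  if (length(x) == 1) NA_real_ else x[length(x)]
})

print(data.frame(conversion, conv_id, exp_cumsum, sum_of_exp))
```

Here `conv_id` comes out as 0 0 0 1 1 1 2, so rows A to C share the block total of about 30.19 and rows F to E share about 233.20. These are the two denominators from the question's example.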
This solution works for any number of `Conversion`s per `User`.
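If the wide `conv_id_` columns become a concern with hundreds of conversions per user, the same total can also be sketched with two cumulative sums per user and no intermediate columns. This is a different technique from the pipeline, written in base R under stated assumptions: rows are already in touchpoint order within each user, a conversion with nothing before it contributes 0, and the last `Conversion` value is set to 0 to match the question's expected-output tables (the question's code has 1 there). `final_conv_one_user()` is a hypothetical helper name.

```r
# Data from the question; last Conversion set to 0 (assumption, see above)
df <- data.frame(
  User        = rep(c("User1", "User2"), each = 7),
  Touchpoints = c("A", "B", "C", "F", "D", "E", "H",
                  "A", "B", "K", "D", "E", "F", "M"),
  Conversion  = c(0, 0, 0, 1, 0, 0, 1, 0, 0, 1, 1, 0, 0, 0),
  Frequency   = c(1, 2, 3, 0, 4, 5, 0, 1, 2, 0, 0, 3, 4, 5)
)
df$Exponential <- ifelse(df$Frequency > 0, exp(df$Frequency), 0)

# Hypothetical helper: Final_Conv for one user's rows, in order
final_conv_one_user <- function(e, conv) {
  before  <- cumsum(e) - e                 # sum of Exponential strictly before each row
  is_conv <- conv == 1 & before > 0        # conversions with a non-zero denominator
  inv     <- ifelse(is_conv, 1 / before, 0)
  later   <- rev(cumsum(rev(inv))) - inv   # sum of 1/denominator over later conversions
  e * later
}

df$Final_Conv <- 0
for (u in unique(df$User)) {
  rows <- df$User == u
  df$Final_Conv[rows] <- final_conv_one_user(df$Exponential[rows],
                                             df$Conversion[rows])
}

print(round(df$Final_Conv, 4))
```

Under those assumptions the values should agree with the `Final_Conv` table in the question to four decimals, for example 0.1017 for the first row. The work is linear in the number of rows per user, since each row is touched by a fixed number of cumulative sums.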
Result of the dplyr/tidyr pipeline:
User