Question

I'm working on an attribution modeling problem at work and am having trouble formatting the data. I'm using the following package:

install.packages("ChannelAttribution")
library(ChannelAttribution)

To use the functions in this package, I need the data in a specific format. For example:

GooglePaid>Direct>GoogleOrganic>BingPaid>Converted

The output I want looks like this:

path                                               total_conversions          
Direct>GooglePaid>Converted                           504
GoogleOrganic>Direct>Direct>Direct                    689
YahooPaid>Converted                                   1,900
GoogleOrganic>BingPaid>Direct>Converted               785

total_conversions is the number of times that unique path was taken. So, in the example above, Direct>GooglePaid>Converted was observed 504 times in the dataset.

However, this is the format in which I currently get the data from my dev team:

 custID   custChannel            custDate
1  151        Direct        2015-10-10 00:15:32
2  151    GooglePaid        2015-10-10 00:16:45
3  151     Converted        2015-10-10 00:17:01
4  5655      BingPaid       2015-10-11 00:20:12
5  7855 GoogleOrganic       2015-10-12 00:05:32
6  7862  YahooOrganic       2015-10-13 00:18:20
7  9655    GooglePaid       2015-10-13 00:08:35
8  9655    GooglePaid       2015-10-13 00:11:11
9  9655     Converted       2015-10-13 00:11:35

In the data above, each unique path should have a total of 1, because each path is recorded only once. But if we add this custID:

custID   custChannel            custDate
1  9666    GooglePaid        2015-10-14 00:15:32
2  9666    GooglePaid        2015-10-14 00:16:45
3  9666     Converted        2015-10-14 00:17:01

then GooglePaid>GooglePaid>Converted would have a total of 2.

Thanks!

Answer 1

(Is this format "provided by" or "needed as input by" that package? Sounds like the latter.)

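It sounds like you want to 1) paste together the sequential values of custChannel using ">" as the separator, and 2) also count them, with both operations done within each distinct value of custID. I can't work out what the basis for "total_conversion_value" or "total_null" would be.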

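I added a custTime variable, although you may have a fixed-width or tab-separated input format. (Given the odd offsets in the second column, I would worry about stray spaces, since I suspect you did not read the data with stringsAsFactors=FALSE and whitespace as the implicit separator.)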

dat <- read.table(text=" custID   custChannel            custDate custTime
1  151        Direct        2015-10-10 00:15:32
2  151    GooglePaid        2015-10-10 00:16:45
3  151     Converted        2015-10-10 00:17:01
4  5655      BingPaid       2015-10-11 00:20:12
5  7855 GoogleOrganic       2015-10-12 00:05:32
6  7862  YahooOrganic       2015-10-13 00:18:20
7  9655    GooglePaid       2015-10-13 00:08:35
8  9655    GooglePaid       2015-10-13 00:11:11
9  9655     Converted       2015-10-13 00:11:35", header=TRUE, 
stringsAsFactors=FALSE)
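The whitespace concern mentioned above can be checked directly once the data is loaded. This is only a minimal sketch: `raw` is a hypothetical data frame with deliberately padded channel names, not the asker's real data, and it assumes the channel column was read as character rather than factor.

```r
# Hypothetical example: channel names padded with stray spaces,
# as might happen when fixed-width input is read naively.
raw <- data.frame(custID = c(151, 151, 151),
                  custChannel = c("  Direct", "GooglePaid ", " Converted"),
                  stringsAsFactors = FALSE)

# trimws() (base R >= 3.2.0) removes leading and trailing whitespace.
raw$custChannel <- trimws(raw$custChannel)

# TRUE here would mean some value still starts or ends with whitespace.
any(grepl("^\\s|\\s$", raw$custChannel))
```

Trimming before pasting the channels together matters because "Direct" and " Direct" would otherwise end up in different path strings.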

If you don't get this from your dev team already sorted by customer, date, and time, you can sort it with dat <- with(dat, dat[ order(custID, custDate, custTime), ] )
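Sorting on the date and time as text works here because both are zero-padded, but it may be safer to sort on a real timestamp. A possible sketch, where `ev` is a hypothetical, deliberately shuffled data frame, `custStamp` is a made-up column name, and the UTC time zone is an assumption:

```r
# Hypothetical, deliberately shuffled rows for one customer.
ev <- data.frame(custID = c(151, 151, 151),
                 custChannel = c("Converted", "Direct", "GooglePaid"),
                 custDate = c("2015-10-10", "2015-10-10", "2015-10-10"),
                 custTime = c("00:17:01", "00:15:32", "00:16:45"),
                 stringsAsFactors = FALSE)

# Combine date and time into a single POSIXct value (UTC is an assumption).
ev$custStamp <- as.POSIXct(paste(ev$custDate, ev$custTime),
                           format = "%Y-%m-%d %H:%M:%S", tz = "UTC")

# Sort by customer first, then chronologically within each customer.
ev <- ev[order(ev$custID, ev$custStamp), ]
ev$custChannel
```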

Then finish with:

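# Build one entry per customer: the ">"-joined channel path, plus the
# number of rows (steps) that customer has. Note that total_conversions
# here is therefore the path length, not a count of customers per path.
dat2 <- lapply(unique(dat$custID),
  function(x) list(path = paste(dat[dat$custID == x, 'custChannel'], collapse = ">"),
                   total_conversions = length(dat[dat$custID == x, 'custChannel'])))

# Bind the per-customer lists into a matrix, one row per customer.
do.call('rbind', dat2)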
#--------------
     path                              total_conversions
[1,] "Direct>GooglePaid>Converted"     3                
[2,] "BingPaid"                        1                
[3,] "GoogleOrganic"                   1                
[4,] "YahooOrganic"                    1                
[5,] "GooglePaid>GooglePaid>Converted" 3
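In the output above, total_conversions is the number of steps in each customer's path (3 for a three-step path), and identical paths from different customers stay on separate rows. The question instead asks for the number of customers who took each unique path. One possible way to get that with base R is sketched below; `ev`, `paths`, and `counts` are illustrative names, and the rows are assumed to be in time order within each customer already.

```r
# Rows from the question, plus the extra custID 9666, already in time order.
ev <- data.frame(
  custID = c(151, 151, 151, 5655, 7855, 7862,
             9655, 9655, 9655, 9666, 9666, 9666),
  custChannel = c("Direct", "GooglePaid", "Converted",
                  "BingPaid", "GoogleOrganic", "YahooOrganic",
                  "GooglePaid", "GooglePaid", "Converted",
                  "GooglePaid", "GooglePaid", "Converted"),
  stringsAsFactors = FALSE)

# One ">"-joined path string per customer.
paths <- tapply(ev$custChannel, ev$custID, paste, collapse = ">")

# Count how many customers share each distinct path.
counts <- as.data.frame(table(path = paths),
                        responseName = "total_conversions",
                        stringsAsFactors = FALSE)
counts
```

With these rows, GooglePaid>GooglePaid>Converted is counted twice (custID 9655 and 9666) and every other path once, which matches the totals described in the question.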

Formatting data for attribution modeling in R
