Calculating with a condition

Time: 2018-01-18 09:19:34

Tags: r

I want to compute the last non-direct attribution for each path. The input data frame looks like this:

path = c("path1","path2","path3","path4","path5","path6","path7") 
c1 = c("channel1","direct_app","direct","channel45","channel33","direct_web","direct_web") 
c2 = c("channel2",NA,"channel23",NA,"channel11","channel5", "direct_app") 
c3 = c("direct_app",NA,"direct_app",NA, NA,"direct_app",NA)
c4 = c(NA,NA,"direct_app",NA,NA,NA,NA)
c5 = c(NA,NA,"direct_web",NA,NA,NA,NA)
df_input <- data.frame(path,c1,c2,c3,c4,c5)

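What I want to do is add a new column that holds the last non-direct value for each path. Note: direct can be either direct_web or direct_app.

The output data frame looks like this: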

path = c("path1","path2","path3","path4","path5","path6","path7") 
c1 = c("channel1","direct_app","direct","channel45","channel33","direct_web","direct_web") 
c2 = c("channel2",NA,"channel23",NA,"channel11","channel5", "direct_app") 
c3 = c("direct_app",NA,"direct_app",NA,NA,"direct_app",NA)
c4 = c(NA,NA,"direct_app",NA,NA,NA,NA)
c5 = c(NA,NA,"direct_web",NA,NA,NA,NA)
last_non_direct <- c("channel2","direct_app","channel23","channel45","channel11","channel5","direct_app")
df_output <- data.frame(path,c1,c2,c3,c4,c5,last_non_direct)

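If a path contains only direct values (i.e. direct_web / direct_app), the last direct value is used (as shown in the output data frame). If a path contains no direct values at all, the last channel is used.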
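Stated for a single path, the rule might be sketched as below. `last_non_direct_one` is a hypothetical helper name, not something from the post, and the sketch assumes that any value starting with "direct" counts as direct (including the bare "direct" in path3, which the expected output also skips).

```r
# Hypothetical helper: apply the rule to the channel values of one path.
last_non_direct_one <- function(x) {
  x <- x[!is.na(x)]                      # drop empty steps
  non_direct <- x[!grepl("^direct", x)]  # assumption: "direct*" means direct
  # Prefer the last non-direct value; otherwise fall back to the last value.
  if (length(non_direct) > 0) tail(non_direct, 1) else tail(x, 1)
}

last_non_direct_one(c("channel1", "channel2", "direct_app"))  # "channel2"
last_non_direct_one(c("direct_web", "direct_app", NA))        # "direct_app"
```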

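I implemented this with a for loop, but because my data is very large (I have 1 million paths), it takes nearly 30 minutes to run. Any help using dplyr or a similarly fast approach would be greatly appreciated.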

1 Answer:

Answer 0: (score: 2)

Using base R you could do something like this...

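#find the last value in each row that is neither NA nor direct
#(the path column is part of the row, so rows with only direct values return their path name)
out1 <- apply(df_input,1,function(x) tail(x[!is.na(x) & !grepl("direct",x)],1))
#find the last non-NA value in each row
out2 <- apply(df_input,1,function(x) tail(x[!is.na(x)],1))
#where out1 fell back to the path name, use the last non-NA value instead
out1[grepl("path",out1)] <- out2[grepl("path",out1)]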

out1 
[1] "channel2"   "direct_app" "channel23"  "channel45"  "channel11"  "channel5"   "direct_app"
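
Since `apply` still calls an R function once per row, a fully vectorized variant may be worth trying on 1 million paths. The following is only a sketch, not part of the original answer and not benchmarked: it swaps the row-wise `tail` for base R's `max.col` with `ties.method = "last"` on logical matrices, and it assumes the channel columns are every column except `path`. The names `m`, `present`, `non_direct`, and `idx` are introduced here for illustration.

```r
# Rebuild the example input from the question.
df_input <- data.frame(
  path = paste0("path", 1:7),
  c1 = c("channel1", "direct_app", "direct", "channel45", "channel33", "direct_web", "direct_web"),
  c2 = c("channel2", NA, "channel23", NA, "channel11", "channel5", "direct_app"),
  c3 = c("direct_app", NA, "direct_app", NA, NA, "direct_app", NA),
  c4 = c(NA, NA, "direct_app", NA, NA, NA, NA),
  c5 = c(NA, NA, "direct_web", NA, NA, NA, NA),
  stringsAsFactors = FALSE
)

m <- as.matrix(df_input[, -1])               # channel columns as a character matrix
present    <- !is.na(m)                      # TRUE where a step exists
non_direct <- present & !grepl("direct", m)  # TRUE where the step is not direct

# Column index of the last TRUE in each row. For rows with no TRUE at all,
# max.col returns the last column, so idx_nd is only used when a TRUE exists.
idx_nd  <- max.col(non_direct, ties.method = "last")
idx_any <- max.col(present, ties.method = "last")
idx <- ifelse(rowSums(non_direct) > 0, idx_nd, idx_any)

# Pick one cell per row using (row, column) matrix indexing.
df_input$last_non_direct <- m[cbind(seq_len(nrow(m)), idx)]
```

On the example data this gives the same `last_non_direct` column as the expected output in the question. Unlike the `apply` version, it does not depend on the `path` values containing the string "path".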