I am new to sparklyr, and I am trying to use spark_apply to run a function from a CRAN package (ChannelAttribution) on a Spark DataFrame. The output I get via spark_apply differs from the output of calling the same function directly on an in-memory data frame.
library(sparklyr)
library(dplyr)
library(tibble)
library(ChannelAttribution)
sc <- spark_connect(master = "local")
# Define some sample paths which lead to conversion.
my_paths <- tibble(path = c("A > B > C",
                            "A > A",
                            "C > B > C",
                            "B > A > B > B"),
                   conversion = 1)
# Calculate markov conversion values normally.
ChannelAttribution::markov_model(my_paths,
                                 var_path = "path",
                                 var_conv = "conversion",
                                 order = 3)
# Copy to a Spark DataFrame as a single partition, and use spark_apply to
# calculate the markov conversion values.
my_paths %>%
  sdf_copy_to(sc, ., "my_paths", repartition = 1) %>%
  spark_apply(function(df) {
    ChannelAttribution::markov_model(df,
                                     var_path = "path",
                                     var_conv = "conversion",
                                     order = 3)
  }) %>%
  collect()
The first (in-memory) output is:
channel_name total_conversions
A 1.5011965
B 1.4990816
C 0.9997219
The spark_apply output is:
channel_name total_conversions
A 1.33
B 1.33
C 1.33
Any insight into what is happening here would be greatly appreciated.
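In case it is useful, here is a debugging sketch I am considering (an assumption on my part, not a known cause): it inspects the partition exactly as the spark_apply closure receives it, to check whether column names or types are altered during serialization, e.g. if the path column arrived as something other than a character vector, markov_model's path parsing might behave differently. The table name "my_paths_debug" is just a placeholder.

```r
library(sparklyr)
library(dplyr)
library(tibble)

sc <- spark_connect(master = "local")

my_paths <- tibble(path = c("A > B > C",
                            "A > A",
                            "C > B > C",
                            "B > A > B > B"),
                   conversion = 1)

# Sketch: instead of running markov_model, return the column names and
# classes as seen *inside* the spark_apply closure, to compare with the
# in-memory data frame. spark_apply expects the function to return a
# data frame, so we build one describing the input.
my_paths %>%
  sdf_copy_to(sc, ., "my_paths_debug", repartition = 1, overwrite = TRUE) %>%
  spark_apply(function(df) {
    data.frame(col_name  = names(df),
               col_class = vapply(df, function(x) class(x)[1], character(1)),
               stringsAsFactors = FALSE)
  }) %>%
  collect()
```

If the names or classes reported here differ from `sapply(my_paths, class)` locally, that would narrow down whether the discrepancy comes from serialization rather than from markov_model itself.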