When I try to use ml_decision_tree or ml_logistic_regression with the sparklyr package, I get the following error. I am using Spark 2.1.0 on a Cloudera cluster.
> No rows dropped by 'na.omit' call. Error in
> stop(simpleError(sprintf(fmt, ...), if (call.)
> sys.call(sys.parent()))) : bad error message
Below is a snippet of the code I ran:
at <- data_select
for (col in colnames(data_select)) {
  data_ft <- at %>%
    ft_string_indexer(input.col = col, output.col = paste0(col, "_in")) %>%
    ft_one_hot_encoder(input.col = paste0(col, "_in"), output.col = paste0(col, "_ohe"))
  at <- data_ft
}
# collect the names of the one-hot-encoded feature columns
data_col <- colnames(data_ft) # get the column names of data_ft
gp <- grep("*ohe", data_col) # select only the columns ending with "ohe"
features <- c(data_col[gp]) # get the names of those columns
features <- features[features != "target_ohe"] # remove the target variable from the feature columns
# assemble the feature columns into a single vector column
data_feac <- ft_vector_assembler(data_ft, input.col = features, output.col = 'FeacturesVectors')
# partition the table into training and test sets
partitions <- data_feac %>%
  sdf_partition(training = 0.6, test = 0.4, seed = 10099)
fit.dec <- partitions$training %>%
  ml_decision_tree(data_feac, response = 'target_ohe', features = 'FeacturesVectors', type = "classification",
                   ml_options(na.action = getOption("na.action", "na.pass")))
I have tried both ml_options(na.action = getOption("na.action", "na.pass")) and ml_options(na.action = getOption("na.action", "na.omit")), and I get the same error message in both cases.