sparklyr error: No rows dropped by 'na.omit' call

Time: 2017-06-08 15:10:57

Tags: r hadoop cloudera apache-spark-mllib sparklyr

When I try to use ml_decision_tree or ml_logistic_regression with the sparklyr package, I get the following error. I am using Spark 2.1.0 on a Cloudera cluster.

> No rows dropped by 'na.omit' call.  Error in
> stop(simpleError(sprintf(fmt, ...), if (call.)
> sys.call(sys.parent()))) :   bad error message
Below is a snippet of the code I ran:

library(sparklyr)
library(dplyr)

# data_select is an existing Spark table
at <- data_select

# index and one-hot encode every column
for (col in colnames(data_select)) {
  data_ft <- at %>%
    ft_string_indexer(input.col = col, output.col = paste0(col, "_in")) %>%
    ft_one_hot_encoder(input.col = paste0(col, "_in"), output.col = paste0(col, "_ohe"))
  at <- data_ft
}

# collect the feature columns
data_col <- colnames(data_ft)                   # column names of the data_ft table
gp <- grep("_ohe$", data_col)                   # select only columns ending in "_ohe"
features <- data_col[gp]                        # names of those columns
features <- features[features != "target_ohe"]  # remove the target variable from the feature columns

# create the features vector
data_feac <- ft_vector_assembler(data_ft, input.col = features, output.col = "FeacturesVectors")

# partition the table
partitions <- data_feac %>%
  sdf_partition(training = 0.6, test = 0.4, seed = 10099)

# fit the decision tree on the training partition only
fit.dec <- partitions$training %>%
  ml_decision_tree(response = "target_ohe", features = "FeacturesVectors", type = "classification",
                   ml.options = ml_options(na.action = getOption("na.action", "na.pass")))

I have tried both ml_options(na.action = getOption("na.action", "na.pass")) and ml_options(na.action = getOption("na.action", "na.omit")), and I get the same error message either way.
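
The column-selection step in the snippet can be checked locally without a Spark connection. The following is a minimal base R sketch, assuming the encoded columns carry an "_ohe" suffix as in the loop above; the column names in `cols` are hypothetical and only illustrate how an end-anchored pattern behaves.

```r
# Hypothetical column names, standing in for colnames(data_ft).
cols <- c("age_in", "age_ohe", "target_in", "target_ohe", "ohe_flag")

# "_ohe$" is anchored to the end of the string, so "ohe_flag" is not selected.
features <- grep("_ohe$", cols, value = TRUE)

# Drop the target column from the feature list.
features <- setdiff(features, "target_ohe")

print(features)
```

Using `value = TRUE` returns the matching names directly, which avoids the separate indexing step.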

1 answer:

Answer 0 (score: 0)

This is an open issue with sparklyr. See the corresponding issue on GitHub.
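
For context on the first line of the output, here is a minimal base R sketch of what na.omit does when there is nothing to remove. It is only an analogy: sparklyr applies its own NA handling to Spark tables, and the data frame `df` below is hypothetical, not the asker's data. The sketch assumes the "No rows dropped" line simply reports that no missing values were found, which would make the subsequent "bad error message" the actual failure.

```r
# Hypothetical local data frame with no missing values.
df <- data.frame(
  feature = c("a", "b", "c"),
  target  = c("yes", "no", "yes"),
  stringsAsFactors = FALSE
)

# na.omit removes rows containing NA; here there are none to remove.
cleaned <- na.omit(df)
rows_dropped <- nrow(df) - nrow(cleaned)

# The same frame with one NA introduced, for comparison.
df_na <- df
df_na$feature[2] <- NA
rows_dropped_na <- nrow(df_na) - nrow(na.omit(df_na))

print(c(rows_dropped, rows_dropped_na))
```

If the count of dropped rows is zero, the na.action setting has no effect on the data, which is consistent with the question's observation that switching between na.pass and na.omit changes nothing.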