Spark ML - StringType is not supported

Asked: 2017-11-24 11:32:55

Tags: apache-spark machine-learning apache-spark-ml

Sample code:

import org.apache.spark.sql.SparkSession
import org.apache.log4j._

Logger.getLogger("org").setLevel(Level.ERROR)

val spark = SparkSession.builder().getOrCreate()

import org.apache.spark.ml.clustering.KMeans

val dataset = spark.read.option("header","true").option("inferSchema","true").csv("Online_Retail.csv")

val feature_data = dataset.select($"InvoiceNo", $"StockCode", $"CustomerID")

import org.apache.spark.ml.feature.{VectorAssembler,StringIndexer,VectorIndexer,OneHotEncoder}
import org.apache.spark.ml.linalg.Vectors

val assembler = new VectorAssembler().setInputCols(Array("InvoiceNo", "StockCode", "CustomerID")).setOutputCol("features")

val training_data = assembler.transform(feature_data).select("features")

Running this code produces the following error:

java.lang.IllegalArgumentException: Data type StringType is not supported

Does anyone know how to resolve this error?

When I try to use StringIndexer, I get the following error:

scala> val invoiceNoIndexer = new StringIndexer().setInputCols("InvoiceNo").setOutputCol("invoiceIndexer")
<console>:30: error: value setInputCols is not a member of org.apache.spark.ml.feature.StringIndexer
val invoiceNoIndexer = new StringIndexer().setInputCols("InvoiceNo").setOutputCol("invoiceIndexer")
            ^
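The compiler message above says that StringIndexer in this Spark version has no setInputCols; the single-column setter setInputCol is the one to use, with one indexer per string column. VectorAssembler accepts numeric, boolean, and vector columns, so any string column has to be converted before it is assembled. The following is a minimal sketch, not a definitive fix. It assumes that InvoiceNo and StockCode are the columns inferred as strings (dataset.printSchema() would confirm this), that CustomerID is numeric with no nulls, and it uses a few made-up rows in place of Online_Retail.csv. The output column names InvoiceNoIndex and StockCodeIndex and the local master setting are chosen only for illustration.

```scala
import org.apache.spark.ml.Pipeline
import org.apache.spark.ml.feature.{StringIndexer, VectorAssembler}
import org.apache.spark.sql.SparkSession

// Local session for a standalone run; in spark-shell the existing `spark` can be reused.
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("string-indexer-sketch")
  .getOrCreate()
import spark.implicits._

// Hypothetical stand-in rows for Online_Retail.csv (assumption: two string columns, one numeric).
val featureData = Seq(
  ("536365", "85123A", 17850),
  ("536365", "71053", 17850),
  ("C536379", "D", 14527)
).toDF("InvoiceNo", "StockCode", "CustomerID")

// One StringIndexer per string column, using setInputCol (singular).
val invoiceNoIndexer = new StringIndexer()
  .setInputCol("InvoiceNo")
  .setOutputCol("InvoiceNoIndex")

val stockCodeIndexer = new StringIndexer()
  .setInputCol("StockCode")
  .setOutputCol("StockCodeIndex")

// Assemble only numeric columns: the two index columns plus CustomerID.
val assembler = new VectorAssembler()
  .setInputCols(Array("InvoiceNoIndex", "StockCodeIndex", "CustomerID"))
  .setOutputCol("features")

// Chain the stages so the indexers are fitted before the assembler runs.
val pipeline = new Pipeline()
  .setStages(Array(invoiceNoIndexer, stockCodeIndexer, assembler))

val trainingData = pipeline.fit(featureData).transform(featureData).select("features")
trainingData.show(truncate = false)
```

StringIndexer assigns indices by label frequency, so the resulting numbers are category labels rather than meaningful magnitudes. For a distance-based model such as KMeans, it may be worth passing the indexed columns through OneHotEncoder before assembling them.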

0 answers:

No answers yet.