Sample code:
import org.apache.spark.sql.SparkSession
import org.apache.log4j._
Logger.getLogger("org").setLevel(Level.ERROR)
val spark = SparkSession.builder().getOrCreate()
import spark.implicits._ // needed for the $"column" syntax below (spark-shell imports this automatically)
import org.apache.spark.ml.clustering.KMeans
val dataset = spark.read.option("header","true").option("inferSchema","true").csv("Online_Retail.csv")
val feature_data = dataset.select($"InvoiceNo", $"StockCode", $"CustomerID")
import org.apache.spark.ml.feature.{VectorAssembler,StringIndexer,VectorIndexer,OneHotEncoder}
import org.apache.spark.ml.linalg.Vectors
val assembler = new VectorAssembler().setInputCols(Array("InvoiceNo", "StockCode", "CustomerID")).setOutputCol("features")
val training_data = assembler.transform(feature_data).select("features")
Running this code produces the following error:
java.lang.IllegalArgumentException: Data type StringType is not supported
Does anyone know how to resolve this error?
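For context, VectorAssembler only accepts numeric, boolean, and vector input columns, so any of the three selected columns that the CSV reader inferred as string will trigger exactly this exception. The inferred types can be checked directly on the objects defined above:

feature_data.printSchema() // columns reported here as "string" are the ones VectorAssembler rejects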
When I try to use StringIndexer, the following error is triggered:
scala> val invoiceNoIndexer = new StringIndexer().setInputCols("InvoiceNo").setOutputCol("invoiceIndexer")
<console>:30: error: value setInputCols is not a member of org.apache.spark.ml.feature.StringIndexer
val invoiceNoIndexer = new StringIndexer().setInputCols("InvoiceNo").setOutputCol("invoiceIndexer")
^
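For reference, the setInputCols error usually indicates a Spark version before 3.0: StringIndexer only gained the multi-column setInputCols/setOutputCols setters in Spark 3.0, and earlier releases expose only the singular setInputCol/setOutputCol. Below is a minimal sketch of how the two string columns could be indexed and then assembled, assuming Spark 2.x, assuming CustomerID was inferred as a numeric type, and with the *Index output column names chosen purely for illustration:

import org.apache.spark.ml.Pipeline
import org.apache.spark.ml.feature.{StringIndexer, VectorAssembler}

// One StringIndexer per string column (Spark 2.x only supports a single input column per indexer).
val invoiceNoIndexer = new StringIndexer()
  .setInputCol("InvoiceNo")        // singular setter in Spark 2.x
  .setOutputCol("InvoiceNoIndex")  // illustrative output column name

val stockCodeIndexer = new StringIndexer()
  .setInputCol("StockCode")
  .setOutputCol("StockCodeIndex")  // illustrative output column name

// Assemble the numeric index columns plus CustomerID into the feature vector.
val assembler = new VectorAssembler()
  .setInputCols(Array("InvoiceNoIndex", "StockCodeIndex", "CustomerID"))
  .setOutputCol("features")

// Chain the stages so both indexers are fitted and applied before assembling.
val pipeline = new Pipeline().setStages(Array(invoiceNoIndexer, stockCodeIndexer, assembler))
val training_data = pipeline.fit(feature_data).transform(feature_data).select("features")

The resulting training_data could then be passed to KMeans as before; if CustomerID also comes back as a string, it would need the same indexing (or a cast) as the other two columns.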