OneHotEncoder with an array as input

Time: 2018-07-18 17:17:04

Tags: scala apache-spark

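I am trying to extract features from raw data.

My raw data is a Seq[String].

I want to convert it into a {{1}} encoding that sets several positions to 1 (one per string in the sequence) rather than a single 1, but it seems that Spark ML (https://spark.apache.org/docs/latest/ml-features.html#onehotencoderestimator) only accepts a single String as input.

Maybe I am blind, but I can't seem to find one that accepts a list of strings.

Thanks.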

1 answer:

Answer 0: (score: 0)

Thanks @user8371915

After reading https://spark.apache.org/docs/2.2.0/ml-features.html#countvectorizer, CountVectorizer seems to be exactly what I need.
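For anyone landing here later, this is a minimal sketch of how CountVectorizer might be used for this, not necessarily what the asker ended up with. It assumes Spark ML 2.x or later and a local SparkSession; the sample rows and the column names `id`, `tokens`, and `features` are made up for illustration. Setting `setBinary(true)` caps every non-zero count at 1.0, so each row becomes a multi-hot vector over the learned vocabulary instead of a term-count vector.

```scala
import org.apache.spark.ml.feature.{CountVectorizer, CountVectorizerModel}
import org.apache.spark.ml.linalg.Vector
import org.apache.spark.sql.SparkSession

// Assumption: a local session for experimenting; in spark-shell `spark` already exists.
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("multi-hot-sketch")
  .getOrCreate()
import spark.implicits._

// Hypothetical sample data: each row holds a Seq[String].
val df = Seq(
  (0, Seq("red", "green")),
  (1, Seq("green", "green", "blue")),
  (2, Seq("blue"))
).toDF("id", "tokens")

// binary = true turns every non-zero count into 1.0,
// so repeated strings still produce a single 1 in their slot.
val model: CountVectorizerModel = new CountVectorizer()
  .setInputCol("tokens")
  .setOutputCol("features")
  .setBinary(true)
  .fit(df)

val encoded = model.transform(df)

// model.vocabulary maps vector indices back to the original strings.
println(model.vocabulary.mkString(", "))
encoded.show(truncate = false)
```

The output column holds sparse vectors whose length equals the vocabulary size. The index order in `model.vocabulary` is chosen by the fitted model (by frequency), so it should be read from the model rather than assumed.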

More information: How to get word details from TF Vector RDD in Spark ML Lib?