Spark Python: how to use StopWordsRemover

Time: 2018-12-30 13:55:30

Tags: python-3.x apache-spark

I am getting the following error message:

from pyspark.ml.feature import StopWordsRemover
remover = StopWordsRemover(inputCol="summary", outputCol="filtered")
remover.transform(dfcrashes_red).show(truncate=False)

IllegalArgumentException: 'requirement failed: Input type must be ArrayType(StringType) but got StringType.'

My data looks like this:

sqlContext.sql('select * from crashestablered').show(5) 

+-----------------+---------+----+--------------------+
|             date|crashyear| dek|             summary|
+-----------------+---------+----+--------------------+
| January 06, 1960|     1960|1960|The plane disinte...|
| January 18, 1960|     1960|1960|The aircraft cras...|

How do I remove stop words from the "summary" column? Thanks in advance for any help.

0 answers:

No answers yet