I'm getting the following error message:
from pyspark.ml.feature import StopWordsRemover
remover = StopWordsRemover(inputCol="summary", outputCol="filtered")
remover.transform(dfcrashes_red).show(truncate=False)
IllegalArgumentException: 'requirement failed: Input type must be ArrayType(StringType) but got StringType.'
My data looks like this:
sqlContext.sql('select * from crashestablered').show(5)
+-----------------+---------+----+--------------------+
| date|crashyear| dek| summary|
+-----------------+---------+----+--------------------+
| January 06, 1960| 1960|1960|The plane disinte...|
| January 18, 1960| 1960|1960|The aircraft cras...|
How can I remove stop words from the "summary" column? Thanks in advance for any help.