The schema is as follows:
root
|-- reviewText: string (nullable = true)
Selecting the rows to operate on:
val extracted_reviews = sql("select reviewText from book").collect
AFINN is loaded here.
val reviewSenti = extracted_reviews.map(reviewText => {
  val reviewWordsSentiment = reviewText(1).toString.split(" ").map(word => {
    var senti: Int = 0
    if (AFINN.lookup(word.toLowerCase()).length > 0) {
      senti = AFINN.lookup(word.toLowerCase())(0)
    }
    senti
  })
  val reviewSentiment = reviewWordsSentiment.sum
  (reviewSentiment, reviewText.toString)
})
I have reviewText marked as nullable in the schema, so why am I getting this error:
java.lang.ArrayIndexOutOfBoundsException: 1
at org.apache.spark.sql.catalyst.expressions.GenericRow.get(rows.scala:200)
at org.apache.spark.sql.Row$class.apply(Row.scala:157)
at
at scala.collection.mutable.ArrayOps$ofRef.map(ArrayOps.scala:186)
... 52 elided
Answer 0 (score: 1)
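collect() returns an Array[Row], so each reviewText inside map is a Row, not a String. Row indices start at 0 and the query selects only one column, so 0 is the only valid index: reviewText(1) asks for a second column that does not exist, which is why the exception reports index 1. Declaring the column nullable = true does not affect this; it only means the value at index 0 may be null. To get the value, use reviewText.getString(0) instead.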
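To see why index 1 fails even though the column is nullable, here is a minimal plain-Scala sketch that runs without Spark. FakeRow and RowIndexDemo are hypothetical names, not Spark's API; the sketch assumes that, like Spark's GenericRow (the class named in the stack trace), a row simply indexes into an array of its column values.

```scala
import scala.util.{Failure, Success, Try}

// Hypothetical stand-in for a Spark Row, NOT Spark's API. Assumption: like
// GenericRow, it just indexes into an array holding the column values.
final class FakeRow(values: Array[Any]) {
  // An out-of-range index throws java.lang.ArrayIndexOutOfBoundsException.
  def get(i: Int): Any = values(i)
  // A null value comes back as null rather than throwing.
  def getString(i: Int): String = get(i).asInstanceOf[String]
}

object RowIndexDemo {
  // Right(value) if the read succeeds, Left(exception class name) otherwise.
  def tryGetString(row: FakeRow, i: Int): Either[String, String] =
    Try(row.getString(i)) match {
      case Success(v) => Right(v)
      case Failure(e) => Left(e.getClass.getSimpleName)
    }
}
```

With a one-column row such as new FakeRow(Array("nice read")), tryGetString(row, 0) gives Right("nice read") while tryGetString(row, 1) gives Left("ArrayIndexOutOfBoundsException"). A null value, new FakeRow(Array(null)), gives Right(null) at index 0: nullability concerns the value, not how many columns the row has.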
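val reviewSenti = extracted_reviews.map { reviewText =>
  // The query selected a single column, so its value is at index 0
  val reviewWordsSentiment = reviewText.getString(0).split(" ").map(...)
  // getString(0) yields the text itself; Row.toString would wrap it in [...]
  (reviewWordsSentiment.sum, reviewText.getString(0))
}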
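Because reviewText is nullable, getString(0) can return null, and calling split on it would then throw a NullPointerException. A minimal null-safe sketch of the scoring step follows. SentimentSketch is a hypothetical helper, and it assumes the AFINN word list is already available on the driver as a word-to-score map (for example, if AFINN is a pair RDD, AFINN.collectAsMap() returns a scala.collection.Map[String, Int]).

```scala
// Hypothetical helper, not part of the original code. Assumption: the AFINN
// list has been turned into a word -> score map available on the driver.
object SentimentSketch {
  // Sums per-word scores; words missing from the lexicon count as 0,
  // matching the question's `senti = 0` default.
  def score(text: String, lexicon: scala.collection.Map[String, Int]): Int =
    text.split(" ").map(word => lexicon.getOrElse(word.toLowerCase, 0)).sum

  // row.getString(0) returns null for a null reviewText; wrapping it in Option
  // yields None instead of a NullPointerException from split.
  def scoreNullable(text: String, lexicon: scala.collection.Map[String, Int]): Option[Int] =
    Option(text).map(t => score(t, lexicon))
}
```

Applied to the collected rows, this could look like extracted_reviews.map(row => (SentimentSketch.scoreNullable(row.getString(0), afinnMap), row.getString(0))), where afinnMap is the hypothetical collected map. A local map lookup is also cheaper than calling AFINN.lookup for every word: if AFINN is an RDD, each lookup call runs a separate Spark job.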