My schema is nullable, but I still get ArrayIndexOutOfBoundsException: 1

Time: 2018-03-27 06:36:37

Tags: scala apache-spark apache-spark-sql

The schema is as follows:

root
 |-- reviewText: string (nullable = true)

Selecting the rows to operate on:

val extracted_reviews = sql("select reviewText from book").collect

AFINN has already been loaded at this point:

val reviewSenti = extracted_reviews.map(reviewText => {
  val reviewWordsSentiment = reviewText(1).toString.split(" ").map(word => {
    var senti: Int = 0
    if (AFINN.lookup(word.toLowerCase()).length > 0) {
      senti = AFINN.lookup(word.toLowerCase())(0)
    }
    senti
  })
  val reviewSentiment = reviewWordsSentiment.sum
  (reviewSentiment, reviewText.toString)
})

reviewText is already marked as nullable in the schema, so why do I get this error:

java.lang.ArrayIndexOutOfBoundsException: 1
  at org.apache.spark.sql.catalyst.expressions.GenericRow.get(rows.scala:200)
  at org.apache.spark.sql.Row$class.apply(Row.scala:157)
  at 
  at scala.collection.mutable.ArrayOps$ofRef.map(ArrayOps.scala:186)
  ... 52 elided

1 Answer:

Answer 0: (score: 1)

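collect() returns an Array[Row], and each Row here holds only the single selected column, reviewText. Row positions are zero-based, so reviewText(1) asks for a second column that does not exist, which is what throws ArrayIndexOutOfBoundsException: 1; the column's nullability has nothing to do with it. To read the value, use reviewText.getString(0):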

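val reviewSenti = extracted_reviews.map { reviewText =>
  // The row has one column, so read it at index 0; map(...) stands for the per-word lookup from the question
  val reviewWordsSentiment = reviewText.getString(0).split(" ").map(...)
}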
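
Because the column is nullable, getString(0) can itself return null, and calling split on that would fail with a NullPointerException. A minimal sketch of a null-safe version follows, assuming spark-sql is on the classpath. The object name ReviewSentimentSketch, the score helper, and the tiny afinn map are hypothetical stand-ins for illustration only; the AFINN object in the question exposes a different lookup that is not reproduced here.

```scala
import org.apache.spark.sql.Row

object ReviewSentimentSketch {
  // Hypothetical stand-in for the AFINN lexicon (assumption, not the question's AFINN object)
  val afinn: Map[String, Int] = Map("good" -> 3, "bad" -> -3)

  // Scores one single-column row, treating a null reviewText as an empty review
  def score(row: Row): (Int, String) = {
    // Index 0 is the only valid position in a one-column row
    val text = if (row.isNullAt(0)) "" else row.getString(0)
    // Words missing from the lexicon contribute 0 to the total
    val total = text.split("\\s+").map(w => afinn.getOrElse(w.toLowerCase, 0)).sum
    (total, text)
  }

  def main(args: Array[String]): Unit = {
    // Rows shaped like the collected result of "select reviewText from book"
    val rows = Array(Row("Good book"), Row(null), Row("bad BAD plot"))
    rows.map(score).foreach(println)
  }
}
```

With the question's data, the same function could be applied as extracted_reviews.map(ReviewSentimentSketch.score), once the stand-in map is replaced by the real lexicon lookup.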