Spark SQL over Streaming - ArrayIndexOutOfBoundsException

Date: 2016-03-23 16:07:01

Tags: apache-spark apache-spark-sql spark-streaming spark-dataframe

I have the following code, which runs SQL queries on a stream. My problem is that one of the results throws an ArrayIndexOutOfBoundsException. Why does this happen?


Here is my code:

import org.apache.spark._
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.sql.SQLContext

object StreamingSQL {

  case class Persons(name: String, age: Int)

  def main(args: Array[String]) {

    val sparkConf = new SparkConf().setMaster("local").setAppName("HdfsWordCount")
    val sc = new SparkContext(sparkConf)
    // Create the streaming context with a 2-second batch interval
    val ssc = new StreamingContext(sc, Seconds(2))

    // Monitor the directory and read new text files as they are created
    val lines = ssc.textFileStream("/home/cloudera/Smartcare/stream/")
    lines.foreachRDD(rdd => rdd.foreach(println))

    val sqc = new SQLContext(sc)
    import sqc.implicits._

    // For each batch, parse "name,age" lines into Persons and query them with SQL
    lines.foreachRDD { rdd =>
      val persons = rdd.map(_.split(",")).map(p => Persons(p(0), p(1).trim.toInt)).toDF()
      persons.registerTempTable("data")
      val teenagers = sqc.sql("SELECT name FROM data WHERE age >= 13 AND age <= 19")
      teenagers.foreach(println)
    }

    ssc.start()
    ssc.awaitTermination()
  }
}

This is the output I get. The error is thrown right after the correct result is printed:

16/03/23 16:58:56 INFO GenerateUnsafeProjection: Code generated in 131.828141 ms
[Edgar]
16/03/23 16:58:56 ERROR Executor: Exception in task 0.0 in stage 1.0 (TID 1)
java.lang.ArrayIndexOutOfBoundsException: 1

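1 Answer:

Answer 0 (score: 3)

This happens because the line Isabel50 contains no comma. For that line, split(",") returns an array with only one element, so accessing p(1) throws ArrayIndexOutOfBoundsException: 1.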
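One way to avoid the crash is to parse each line defensively and drop malformed ones instead of letting the whole batch fail. The sketch below assumes that silently skipping bad lines is acceptable; `SafeParse` and `parsePerson` are hypothetical names, and `Persons` mirrors the question's case class. Wrapping the conversion in `Try` also covers a line whose age is not numeric, where `toInt` would otherwise throw a NumberFormatException.

```scala
import scala.util.Try

// Mirrors the question's case class (top level, so Spark's toDF could also use it).
case class Persons(name: String, age: Int)

object SafeParse {
  // Hypothetical helper: parse one "name,age" line, returning None for malformed input.
  def parsePerson(line: String): Option[Persons] = {
    val p = line.split(",")
    // A line without a comma, such as "Isabel50", splits into a single element,
    // so p(1) would throw ArrayIndexOutOfBoundsException; skip the line instead.
    if (p.length < 2) None
    // toInt throws NumberFormatException for a non-numeric age; Try turns that into None.
    else Try(Persons(p(0).trim, p(1).trim.toInt)).toOption
  }

  def main(args: Array[String]): Unit = {
    // Made-up sample lines: well-formed, missing comma, extra space, non-numeric age.
    val lines = Seq("Alice,17", "Isabel50", "Bob, 42", "Carol,abc")
    val people = lines.flatMap(line => parsePerson(line))
    people.foreach(println) // prints Persons(Alice,17) and Persons(Bob,42)
  }
}
```

Inside the question's foreachRDD, the parsing step could then become something like rdd.flatMap(line => SafeParse.parsePerson(line).toList).toDF(), leaving the rest of the query unchanged. Whether to drop, log, or count rejected lines is a design choice that depends on how much you trust the input files.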