ArrayIndexOutOfBoundsException when iterating over a DataFrame in Spark SQL

Asked: 2017-07-28 06:50:22

Tags: apache-spark apache-spark-sql

I have a dataset named people.json:

{"name":"Michael"}
{"name":"Andy", "age":30}
{"name":"Justin", "age":19}

The following code gives me an ArrayIndexOutOfBoundsException.

  import org.apache.spark.sql.SparkSession

  // defined outside the method so Spark can derive an encoder for it
  case class Person(name: String, age: Int)

  val sparkSession = SparkSession.builder
    .master("local")
    .appName("my-spark-app")
    .config("spark.some.config.option", "config-value")
    .getOrCreate()

  // required for the implicit RDD-to-DataFrame conversion used by toDF()
  import sparkSession.implicits._

  val peopleDF = sparkSession.sparkContext.
    textFile("C:/Users/Desktop/Spark/people.json").
    map(_.split(",")).
    map(attributes => Person(attributes(0), attributes(1).trim.toInt)).
    toDF()

  peopleDF.createOrReplaceTempView("person")

  val teenagersDF = sparkSession.sql("SELECT name, age FROM person")

  teenagersDF.show()

It looks like I am trying to process an empty DataFrame. Can anyone tell me why it is empty?

1 Answer:

Answer 0 (score: 0)

Your hand-rolled parsing fails because the first line, {"name":"Michael"}, contains no comma, so split(",") returns a single-element array and attributes(1) throws an ArrayIndexOutOfBoundsException; the DataFrame is never built at all (it is not merely empty). If you have a valid JSON file, you should read it into a DataFrame with Spark's built-in JSON reader instead:

  import org.apache.spark.sql.SparkSession

  val sparkSession = SparkSession.builder
    .master("local")
    .appName("my-spark-app")
    .config("spark.some.config.option", "config-value")
    .getOrCreate()

  // the JSON reader expects one JSON object per line, which is
  // exactly the format of people.json
  val peopleDF = sparkSession.read.json("C:/Users/Desktop/Spark/people.json")

  peopleDF.createOrReplaceTempView("person")

  val teenagersDF = sparkSession.sql("SELECT name, age FROM person")

  teenagersDF.show()

This should work.
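
With the three sample records, Spark infers name as a string and age as a nullable long, so show() should print something like this (Michael's missing age comes back as null):

  +-------+----+
  |   name| age|
  +-------+----+
  |Michael|null|
  |   Andy|  30|
  | Justin|  19|
  +-------+----+

Also note that the query as written returns every person despite the variable name teenagersDF. A minimal sketch, assuming you actually want only teenagers (ages 13 to 19), would add a WHERE clause; the row with a null age drops out automatically because null BETWEEN 13 AND 19 does not evaluate to true:

  // only Justin (19) matches in the sample data
  val teenagersDF = sparkSession.sql(
    "SELECT name, age FROM person WHERE age BETWEEN 13 AND 19")

  teenagersDF.show()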