I have a dataset named people.json:
{"name":"Michael"}
{"name":"Andy", "age":30}
{"name":"Justin", "age":19}
The following code gives me an ArrayIndexOutOfBoundsException.
import org.apache.spark.sql.SparkSession
val sparkSession = SparkSession.builder
  .master("local")
  .appName("my-spark-app")
  .config("spark.some.config.option", "config-value")
  .getOrCreate()
val peopleDF = sparkSession.sparkContext.
  textFile("C:/Users/Desktop/Spark/people.json").
  map(_.split(",")).
  map(attributes => Person(attributes(0), attributes(1).trim.toInt)).
  toDF()
peopleDF.createOrReplaceTempView("person")
val teenagersDF = sparkSession.sql("SELECT name, age FROM person")
teenagersDF.show()
It looks like I am trying to process an empty DataFrame. Can anyone tell me why it is empty?
Answer 0 (score: 0)
If you have a valid JSON file, you should use sqlContext to read it into a DataFrame:
import org.apache.spark.sql.SparkSession
val sparkSession = SparkSession.builder
  .master("local")
  .appName("my-spark-app")
  .config("spark.some.config.option", "config-value")
  .getOrCreate()
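// Read the JSON Lines file directly and let Spark infer the schema.
// On Spark 2.x, sparkSession.read.json is the more direct equivalent of sqlContext.read.json.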
val peopleDF = sparkSession.sqlContext.read.json("C:/Users/Desktop/Spark/people.json")
peopleDF.createOrReplaceTempView("person")
val teenagersDF = sparkSession.sql("SELECT name, age FROM person")
teenagersDF.show()
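With the three sample records, schema inference types age as a nullable long, and teenagersDF.show() should print roughly:

+-------+----+
|   name| age|
+-------+----+
|Michael|null|
|   Andy|  30|
| Justin|  19|
+-------+----+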
This should work.
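As for the original error: sparkContext.textFile reads each line as a plain string, and splitting the first record {"name":"Michael"} on "," produces a one-element array, so attributes(1) throws the ArrayIndexOutOfBoundsException before any row is built. If you prefer the typed Dataset API over SQL, here is a minimal sketch using the same file path; the Person case class is illustrative, with age modeled as Option[Long] because the first record has no age field:

case class Person(name: String, age: Option[Long])

import sparkSession.implicits._

// read the file with the built-in JSON reader and map each row to the case class
val peopleDS = sparkSession.read
  .json("C:/Users/Desktop/Spark/people.json")
  .as[Person]

// keep only the teenagers (ages 13 to 19 inclusive)
peopleDS.filter(p => p.age.exists(a => a >= 13 && a <= 19)).show()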