Reading multiple JSON schemas with Spark

Time: 2018-06-08 02:30:14

Tags: json scala amazon-web-services hadoop2 amazon-emr

Software configuration:

Hadoop distribution: Amazon 2.8.3
Applications: Hive 2.3.2, Pig 0.17.0, Hue 4.1.0, Spark 2.3.0

I am trying to read JSON files that have multiple schemas:

val df = spark.read.option("mergeSchema", "true").json("s3a://s3bucket/2018/01/01/*")

It throws this error:

org.apache.spark.sql.AnalysisException: Unable to infer schema for JSON. It must be specified manually.;
  at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$9.apply(DataSource.scala:207)
  at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$9.apply(DataSource.scala:207)
  at scala.Option.getOrElse(Option.scala:121)
  at org.apache.spark.sql.execution.datasources.DataSource.getOrInferFileFormatSchema(DataSource.scala:206)
  at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:392)
  at org.apache.spark.sql.DataFrameReader.loadV1Source(DataFrameReader.scala:239)
  at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:227)
  at org.apache.spark.sql.DataFrameReader.json(DataFrameReader.scala:397)
  at org.apache.spark.sql.DataFrameReader.json(DataFrameReader.scala:340)

How can I read JSON with multiple schemas using Spark?

1 answer:

Answer 0: (score: 0)

This sometimes happens when you point to the wrong path, that is, when no data exists at the location you are reading from.
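
One way to test whether the path is the problem is to list what the glob actually matches before asking Spark to infer a schema. The following is a minimal sketch, not a definitive fix: it assumes the s3a connector is already configured on the cluster, and the object name and application name are hypothetical. It also leaves out `mergeSchema`, which is documented as a Parquet option; for JSON, schema inference already combines the fields found across the input files.

```scala
import org.apache.hadoop.fs.Path
import org.apache.spark.sql.SparkSession

// Hypothetical object name, used only for this illustration.
object ReadJsonWithPathCheck {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("read-json-with-path-check") // hypothetical application name
      .getOrCreate()

    // Path copied from the question; replace it with your own location.
    val input = "s3a://s3bucket/2018/01/01/*"

    val pattern = new Path(input)
    val fs = pattern.getFileSystem(spark.sparkContext.hadoopConfiguration)

    // globStatus may return null (a non-glob path that does not exist)
    // or an empty array (a glob that matches nothing), so handle both.
    val matches = Option(fs.globStatus(pattern)).map(_.toSeq).getOrElse(Seq.empty)

    if (matches.isEmpty) {
      println(s"Nothing matches $input; check the bucket, prefix, and date folders.")
    } else {
      // Print a few of the matched paths as a sanity check.
      matches.take(10).foreach(status => println(status.getPath))

      // No mergeSchema here: the JSON reader infers one schema over all files.
      val df = spark.read.json(input)
      df.printSchema()
    }

    spark.stop()
  }
}
```

In spark-shell, where `spark` already exists, the body of `main` can be pasted without the builder and `spark.stop()` calls. If the listing prints paths but the same exception still appears, the matched files may be empty, which is worth checking next.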