Spark 1.6: Exception when saving a DataFrame to Hive

Date: 2017-03-22 17:25:05

Tags: scala apache-spark hive

I loaded a DataFrame from a Hive table. After appending some data I want to write it back to Hive, but I get this exception:

Exception in thread "main" java.lang.RuntimeException: [1.5] failure: ``.'' expected but `:' found

Here is the relevant part of my code:

var _resultsDF: DataFrame = _hiveContext.read.format("orc").load(_masterHDFS + "/apps/hive/warehouse/mytable")

def getValueList(): List[Any] = {
  List(
    "entry field1"
    "entry field2"
  )
}

def addRow(): Unit = {
  val rdd     = _sparkContext.parallelize(Seq(getValueList()))
  val rowRdd  = rdd.map(value => Row(value: _*))
  val rowDF   = _sqlContext.createDataFrame(rowRdd, _resultsDF.schema)

  // append the new row to the results
  _resultsDF = _resultsDF.unionAll(rowDF)

  _resultsDF.write.format("orc").saveAsTable(_masterHDFS + "/apps/hive/warehouse/mytable")

  // => Exception in thread "main" java.lang.RuntimeException: [1.5] failure: ``.'' expected but `:' found
}

1 Answer:

Answer 0 (score: 0)

saveAsTable only seems to accept a table identifier as its argument, not a path: the string is run through the SQL table-name parser, which is what produces the [1.5] failure: ``.'' expected but `:' found (column 5 is presumably the colon after hdfs in the URI). The base URI therefore has to be passed via the path option instead. This works for me:

import org.apache.spark.sql.SaveMode

var options: Map[String, String] = Map()
options += ("path" -> (_masterHDFS + "/apps/hive/warehouse/"))
_resultsDF.write.format("orc").mode(SaveMode.Append).options(options).saveAsTable("mytable")
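
For completeness, a minimal check (assuming the same _hiveContext and table name as above) that reads the table back through its identifier rather than its path, to confirm the appended row arrived:

// hypothetical verification: load the table by its identifier, not its path
val checkDF = _hiveContext.table("mytable")
checkDF.show()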

Or you just use save instead of saveAsTable:

_resultsDF.write.format("orc").mode(SaveMode.Append).save(_masterHDFS + "/apps/hive/warehouse/mytable")
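
Note that save only writes the ORC files to the given path and does not register or update anything in the Hive metastore, so this variant assumes the table definition for that location already exists; otherwise Hive will not see the new data.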