I wrote the code below and ran it on the cluster with spark-submit: SUCCESS.
But when I run the same job through the scheduler (Hue/Oozie):
[Create Job](https://yadi.sk/i/GyOIRRsv_jVs-Q)
I get a NoSuchTableException:
2018-12-04 12:48:19,244 [main] WARN org.apache.hadoop.hive.metastore.ObjectStore - Version information not found in metastore. hive.metastore.schema.verification is not enabled so recording the schema version 1.1.0-cdh5.15.1
2018-12-04 12:48:19,408 [main] WARN org.apache.hadoop.hive.metastore.ObjectStore - Failed to get database default, returning NoSuchObjectException
Failing Oozie Launcher, Main class [org.apache.oozie.action.hadoop.SparkMain], main() threw exception, null
org.apache.spark.sql.catalyst.analysis.NoSuchTableException
I think this is because the job does not connect to the Hive metastore.
So I added:
hiveContext.setConf("hive.metastore.uris", "thrift://***.***.***.***:9083")
The error still occurs...
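One thing I have considered trying: since the job works with spark-submit but fails under Oozie, maybe the Oozie launcher simply cannot see hive-site.xml, so the HiveContext falls back to a local metastore where cdr.subs_cdr_nrm does not exist (the "recording the schema version" warning above hints at that). A sketch of a Spark action that ships the Hive client configuration with the job; all paths and names here are placeholders, not taken from my actual workflow:

```xml
<!-- Sketch only: placeholder paths/names; assumes hive-site.xml is
     staged in the workflow directory on HDFS. -->
<action name="spark-total-calls">
    <spark xmlns="uri:oozie:spark-action:0.1">
        <job-tracker>${jobTracker}</job-tracker>
        <name-node>${nameNode}</name-node>
        <master>yarn-cluster</master>
        <name>TotalCalls</name>
        <class>ShowCase.TotalCalls</class>
        <jar>${nameNode}/user/app/total-calls.jar</jar>
        <!-- Distribute the Hive client config so the executors and
             driver can locate the remote metastore -->
        <spark-opts>--files hive-site.xml</spark-opts>
    </spark>
    <ok to="end"/>
    <error to="fail"/>
</action>
```

I am not sure whether this is the right fix, or whether setConf("hive.metastore.uris", ...) after the HiveContext is already created is simply too late.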
Here is the code I use:
package ShowCase

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SaveMode
import org.apache.spark.sql.functions._
import org.apache.spark.sql.hive.HiveContext

object TotalCalls {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("Spark Program")
    val sc = new SparkContext(conf)
    val hiveContext = new HiveContext(sc)
    hiveContext.setConf("hive.metastore.uris", "thrift://***.***.***.***:9083")
    import hiveContext.implicits._

    // Read the source table registered in the Hive metastore
    val source = hiveContext.table("cdr.subs_cdr_nrm")

    // Count calls per cell per 15-minute interval
    val result = source
      .filter("call_type = 1 or call_type = 2 or call_type = 29 or call_type = 43")
      .select(
        from_unixtime(floor(unix_timestamp($"start_time") / 900) * 900, "yyyy-MM-dd HH:mm:ss").as("interval"),
        $"cellid",
        $"duration")
      .groupBy("interval", "cellid")
      .agg(count("*").as("total_calls"))

    result.write.mode(SaveMode.Overwrite).saveAsTable("cdr.call_result")
    sc.stop()
  }
}
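For context, the interval expression in the select buckets start_time into 15-minute (900-second) windows. The same arithmetic in plain Scala, as a standalone sketch separate from the Spark job (the object name is mine, just for illustration):

```scala
object IntervalBucket {
  // Truncate an epoch-seconds timestamp to the start of its 15-minute
  // window, mirroring floor(unix_timestamp(start_time) / 900) * 900
  // in the query. Assumes non-negative timestamps, where integer
  // division and floor agree.
  def bucket(epochSeconds: Long): Long = (epochSeconds / 900L) * 900L
}
```

So every call whose start_time falls inside the same 900-second window gets the same "interval" value before the groupBy.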