I am trying to access a Hive table from an Eclipse Maven project using Scala.
I tried using a HiveContext to fetch the Hive database details as shown below, but I am facing the error that follows. I can run this code in the spark-shell CLI, but I cannot run the same code from the Scala IDE in Eclipse, even with the Maven dependencies added.
Here is my code:
import org.apache.spark.SparkConf
import org.apache.spark.SparkContext
import org.apache.spark.sql.hive._

object readHiveTable {
  def main(args: Array[String]) {
    // Run locally; use a non-default UI port so it does not clash with spark-shell
    val conf = new SparkConf().setAppName("Read Hive Table").setMaster("local")
    conf.set("spark.ui.port", "4041")
    val sc = new SparkContext(conf)
    //val sqlContext = new org.apache.spark.sql.SQLContext(sc)
    val hc = new HiveContext(sc)
    // Point the HiveContext at the remote metastore
    hc.setConf("hive.metastore.uris", "thrift://127.0.0.1:9083")
    hc.sql("use default")
    val a = hc.sql("show tables")
    a.show
  }
}
Here is the error I get in the console window:
18/02/04 19:58:15 INFO SparkUI: Started SparkUI at http://192.168.0.10:4041
18/02/04 19:58:15 INFO Executor: Starting executor ID driver on host localhost
18/02/04 19:58:15 INFO Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 36099.
18/02/04 19:58:15 INFO NettyBlockTransferService: Server created on 36099
18/02/04 19:58:15 INFO BlockManagerMaster: Trying to register BlockManager
18/02/04 19:58:15 INFO BlockManagerMasterEndpoint: Registering block manager localhost:36099 with 744.4 MB RAM, BlockManagerId(driver, localhost, 36099)
18/02/04 19:58:15 INFO BlockManagerMaster: Registered BlockManager
18/02/04 19:58:17 INFO HiveContext: Initializing execution hive, version 1.2.1
18/02/04 19:58:17 INFO ClientWrapper: Inspected Hadoop version: 2.2.0
18/02/04 19:58:17 INFO ClientWrapper: Loaded org.apache.hadoop.hive.shims.Hadoop23Shims for Hadoop version 2.2.0
18/02/04 19:58:17 INFO deprecation: mapred.max.split.size is deprecated. Instead, use mapreduce.input.fileinputformat.split.maxsize
18/02/04 19:58:17 INFO deprecation: mapred.reduce.tasks.speculative.execution is deprecated. Instead, use mapreduce.reduce.speculative
18/02/04 19:58:17 INFO deprecation: mapred.committer.job.setup.cleanup.needed is deprecated. Instead, use mapreduce.job.committer.setup.cleanup.needed
18/02/04 19:58:17 INFO deprecation: mapred.min.split.size.per.rack is deprecated. Instead, use mapreduce.input.fileinputformat.split.minsize.per.rack
18/02/04 19:58:17 INFO deprecation: mapred.min.split.size is deprecated. Instead, use mapreduce.input.fileinputformat.split.minsize
18/02/04 19:58:17 INFO deprecation: mapred.min.split.size.per.node is deprecated. Instead, use mapreduce.input.fileinputformat.split.minsize.per.node
18/02/04 19:58:17 INFO deprecation: mapred.reduce.tasks is deprecated. Instead, use mapreduce.job.reduces
18/02/04 19:58:17 INFO deprecation: mapred.input.dir.recursive is deprecated. Instead, use mapreduce.input.fileinputformat.input.dir.recursive
18/02/04 19:58:17 INFO HiveMetaStore: 0: Opening raw store with implemenation class:org.apache.hadoop.hive.metastore.ObjectStore
18/02/04 19:58:17 INFO ObjectStore: ObjectStore, initialize called
18/02/04 19:58:17 INFO Persistence: Property hive.metastore.integral.jdo.pushdown unknown - will be ignored
18/02/04 19:58:17 INFO Persistence: Property datanucleus.cache.level2 unknown - will be ignored
18/02/04 19:58:28 INFO ObjectStore: Setting MetaStore object pin classes with hive.metastore.cache.pinobjtypes="Table,StorageDescriptor,SerDeInfo,Partition,Database,Type,FieldSchema,Order"
18/02/04 19:58:30 INFO Datastore: The class "org.apache.hadoop.hive.metastore.model.MFieldSchema" is tagged as "embedded-only" so does not have its own datastore table.
18/02/04 19:58:30 INFO Datastore: The class "org.apache.hadoop.hive.metastore.model.MOrder" is tagged as "embedded-only" so does not have its own datastore table.
18/02/04 19:58:38 INFO Datastore: The class "org.apache.hadoop.hive.metastore.model.MFieldSchema" is tagged as "embedded-only" so does not have its own datastore table.
18/02/04 19:58:38 INFO Datastore: The class "org.apache.hadoop.hive.metastore.model.MOrder" is tagged as "embedded-only" so does not have its own datastore table.
18/02/04 19:58:39 INFO MetaStoreDirectSql: Using direct SQL, underlying DB is DERBY
18/02/04 19:58:39 INFO ObjectStore: Initialized ObjectStore
18/02/04 19:58:40 WARN ObjectStore: Version information not found in metastore. hive.metastore.schema.verification is not enabled so recording the schema version 1.2.0
18/02/04 19:58:40 WARN ObjectStore: Failed to get database default, returning NoSuchObjectException
18/02/04 19:58:41 INFO HiveMetaStore: Added admin role in metastore
18/02/04 19:58:41 INFO HiveMetaStore: Added public role in metastore
18/02/04 19:58:41 INFO HiveMetaStore: No user is added in admin role, since config is empty
18/02/04 19:58:41 INFO HiveMetaStore: 0: get_all_databases
18/02/04 19:58:41 INFO audit: ugi=chaithu ip=unknown-ip-addr cmd=get_all_databases
18/02/04 19:58:41 INFO HiveMetaStore: 0: get_functions: db=default pat=*
18/02/04 19:58:41 INFO audit: ugi=chaithu ip=unknown-ip-addr cmd=get_functions: db=default pat=*
18/02/04 19:58:41 INFO Datastore: The class "org.apache.hadoop.hive.metastore.model.MResourceUri" is tagged as "embedded-only" so does not have its own datastore table.
Exception in thread "main" java.lang.RuntimeException: java.lang.RuntimeException: The root scratch dir: /tmp/hive on HDFS should be writable. Current permissions are: rwx------
at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:522)
at org.apache.spark.sql.hive.client.ClientWrapper.<init>(ClientWrapper.scala:194)
at org.apache.spark.sql.hive.client.IsolatedClientLoader.createClient(IsolatedClientLoader.scala:238)
at org.apache.spark.sql.hive.HiveContext.executionHive$lzycompute(HiveContext.scala:218)
at org.apache.spark.sql.hive.HiveContext.executionHive(HiveContext.scala:208)
at org.apache.spark.sql.hive.HiveContext.functionRegistry$lzycompute(HiveContext.scala:462)
at org.apache.spark.sql.hive.HiveContext.functionRegistry(HiveContext.scala:461)
at org.apache.spark.sql.UDFRegistration.<init>(UDFRegistration.scala:40)
at org.apache.spark.sql.SQLContext.<init>(SQLContext.scala:330)
at org.apache.spark.sql.hive.HiveContext.<init>(HiveContext.scala:90)
at org.apache.spark.sql.hive.HiveContext.<init>(HiveContext.scala:101)
at com.CITIGenesis.readHiveTable$.main(readHiveTable.scala:13)
at com.CITIGenesis.readHiveTable.main(readHiveTable.scala)
Caused by: java.lang.RuntimeException: The root scratch dir: /tmp/hive on HDFS should be writable. Current permissions are: rwx------
at org.apache.hadoop.hive.ql.session.SessionState.createRootHDFSDir(SessionState.java:612)
at org.apache.hadoop.hive.ql.session.SessionState.createSessionDirs(SessionState.java:554)
at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:508)
... 12 more
18/02/04 19:58:43 INFO SparkContext: Invoking stop() from shutdown hook
18/02/04 19:58:43 INFO SparkUI: Stopped Spark web UI at http://192.168.0.10:4041
18/02/04 19:58:43 INFO MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
18/02/04 19:58:43 INFO MemoryStore: MemoryStore cleared
18/02/04 19:58:43 INFO BlockManager: BlockManager stopped
18/02/04 19:58:43 INFO BlockManagerMaster: BlockManagerMaster stopped
18/02/04 19:58:43 INFO OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!
18/02/04 19:58:43 INFO RemoteActorRefProvider$RemotingTerminator: Shutting down remote daemon.
18/02/04 19:58:43 INFO RemoteActorRefProvider$RemotingTerminator: Remote daemon shut down; proceeding with flushing remote transports.
18/02/04 19:58:43 INFO SparkContext: Successfully stopped SparkContext
18/02/04 19:58:43 INFO ShutdownHookManager: Shutdown hook called
18/02/04 19:58:43 INFO ShutdownHookManager: Deleting directory /tmp/spark-0ec5892a-1d53-4721-b770-d16e8757865d
18/02/04 19:58:43 INFO ShutdownHookManager: Deleting directory /tmp/spark-0ca97c02-57c7-400b-b552-44f6d7813da5
HDFS directories:
chaithu@localhost:~$ hadoop fs -ls /tmp
Found 3 items
d--------- - hdfs supergroup 0 2018-02-04 14:15 /tmp/.cloudera_health_monitoring_canary_files
drwxrwxrwx - hdfs supergroup 0 2018-01-31 11:42 /tmp/hive
drwxrwxrwt - mapred hadoop 0 2018-01-31 11:25 /tmp/logs
chaithu@localhost:~$ hadoop fs -ls /user/
Found 6 items
drwxrwxrwx - chaithu supergroup 0 2018-02-04 19:34 /user/chaithu
drwxrwxrwx - mapred hadoop 0 2018-01-31 11:25 /user/history
drwxrwxr-t - hive hive 0 2018-01-31 11:31 /user/hive
drwxrwxr-x - hue hue 0 2018-01-31 11:38 /user/hue
drwxrwxr-x - oozie oozie 0 2018-01-31 11:34 /user/oozie
drwxr-x--x - spark spark 0 2018-01-31 22:39 /user/spark
Answer 0 (score: 0)
for Hadoop version 2.2.0

Assuming that is actually your Spark version, you should use a SparkSession with enableHiveSupport(); the spark.sql method then behaves just like the spark-shell, as sketched below. HiveContext and SQLContext exist only for backwards compatibility, and new Spark code should not use them.
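A minimal sketch of what that looks like, assuming Spark 2.x is on the Maven classpath (the object name is kept from the question; a hive-site.xml on the classpath is expected to supply the metastore URI):

import org.apache.spark.sql.SparkSession

object readHiveTable {
  def main(args: Array[String]): Unit = {
    // enableHiveSupport() wires the session to Hive; with hive-site.xml on the
    // classpath, the metastore URI is picked up automatically
    val spark = SparkSession.builder()
      .appName("Read Hive Table")
      .master("local[*]")
      .enableHiveSupport()
      .getOrCreate()

    spark.sql("use default")
    spark.sql("show tables").show()

    spark.stop()
  }
}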
underlying DB is DERBY

To me, this line means that Spark never reached your configured metastore and fell back to its own empty, embedded Derby database, which is also why you see

Failed to get database default

That is one problem; the scratch-dir permissions error is another. In the latter case, check the /tmp folder of the local file system rather than HDFS: when running in local mode, that is the directory Hive is complaining about.
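If the local directory really is the blocker, here is a small Scala sketch that relaxes its permissions (the moral equivalent of chmod 777 /tmp/hive; the path is taken from the error message, and doing this from a shell works just as well):

import java.nio.file.{Files, Paths}
import java.nio.file.attribute.PosixFilePermissions

object FixScratchDir {
  def main(args: Array[String]): Unit = {
    val scratch = Paths.get("/tmp/hive")
    // rwxrwxrwx == 777; Hive insists the scratch dir be writable by the session user
    if (Files.exists(scratch))
      Files.setPosixFilePermissions(scratch, PosixFilePermissions.fromString("rwxrwxrwx"))
  }
}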
For the various ways to connect to the metastore, see the solutions here:
How to connect to a Hive metastore programmatically in SparkSQL?
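For completeness, here is a sketch of handing the metastore URI to the session directly from code, assuming the same thrift URI as in the question (shipping a hive-site.xml on the classpath is generally the cleaner option):

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("Read Hive Table")
  .master("local[*]")
  // pass the metastore URI explicitly instead of relying on hive-site.xml
  .config("hive.metastore.uris", "thrift://127.0.0.1:9083")
  .enableHiveSupport()
  .getOrCreate()

spark.sql("show tables").show()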