I am trying to access a Hive table from an Eclipse Maven project using Scala.
I tried using a HiveContext to fetch the Hive database details as shown below, but I am facing the error that follows. I can run this code in the spark-shell CLI, but I cannot run the same code from the Scala IDE in Eclipse, even with the Maven dependencies added.
Here is my code:
import org.apache.spark.SparkConf
import org.apache.spark.SparkContext
import org.apache.spark.sql.hive._

object readHiveTable {
  def main(args: Array[String]) {
    // Run locally; use a non-default UI port so it does not clash with spark-shell
    val conf = new SparkConf().setAppName("Read Hive Table").setMaster("local")
    conf.set("spark.ui.port", "4041")
    val sc = new SparkContext(conf)
    //val sqlContext = new org.apache.spark.sql.SQLContext(sc)
    val hc = new HiveContext(sc)
    // Point the HiveContext at the remote metastore
    hc.setConf("hive.metastore.uris", "thrift://127.0.0.1:9083")
    hc.sql("use default")
    val a = hc.sql("show tables")
    a.show
  }
}
Here is the error I get in the console window:
18/02/04 19:58:15 INFO SparkUI: Started SparkUI at http://192.168.0.10:4041
18/02/04 19:58:15 INFO Executor: Starting executor ID driver on host localhost
18/02/04 19:58:15 INFO Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 36099.
18/02/04 19:58:15 INFO NettyBlockTransferService: Server created on 36099
18/02/04 19:58:15 INFO BlockManagerMaster: Trying to register BlockManager
18/02/04 19:58:15 INFO BlockManagerMasterEndpoint: Registering block manager localhost:36099 with 744.4 MB RAM, BlockManagerId(driver, localhost, 36099)
18/02/04 19:58:15 INFO BlockManagerMaster: Registered BlockManager
18/02/04 19:58:17 INFO HiveContext: Initializing execution hive, version 1.2.1
18/02/04 19:58:17 INFO ClientWrapper: Inspected Hadoop version: 2.2.0
18/02/04 19:58:17 INFO ClientWrapper: Loaded org.apache.hadoop.hive.shims.Hadoop23Shims for Hadoop version 2.2.0
18/02/04 19:58:17 INFO deprecation: mapred.max.split.size is deprecated. Instead, use mapreduce.input.fileinputformat.split.maxsize
18/02/04 19:58:17 INFO deprecation: mapred.reduce.tasks.speculative.execution is deprecated. Instead, use mapreduce.reduce.speculative
18/02/04 19:58:17 INFO deprecation: mapred.committer.job.setup.cleanup.needed is deprecated. Instead, use mapreduce.job.committer.setup.cleanup.needed
18/02/04 19:58:17 INFO deprecation: mapred.min.split.size.per.rack is deprecated. Instead, use mapreduce.input.fileinputformat.split.minsize.per.rack
18/02/04 19:58:17 INFO deprecation: mapred.min.split.size is deprecated. Instead, use mapreduce.input.fileinputformat.split.minsize
18/02/04 19:58:17 INFO deprecation: mapred.min.split.size.per.node is deprecated. Instead, use mapreduce.input.fileinputformat.split.minsize.per.node
18/02/04 19:58:17 INFO deprecation: mapred.reduce.tasks is deprecated. Instead, use mapreduce.job.reduces
18/02/04 19:58:17 INFO deprecation: mapred.input.dir.recursive is deprecated. Instead, use mapreduce.input.fileinputformat.input.dir.recursive
18/02/04 19:58:17 INFO HiveMetaStore: 0: Opening raw store with implemenation class:org.apache.hadoop.hive.metastore.ObjectStore
18/02/04 19:58:17 INFO ObjectStore: ObjectStore, initialize called
18/02/04 19:58:17 INFO Persistence: Property hive.metastore.integral.jdo.pushdown unknown - will be ignored
18/02/04 19:58:17 INFO Persistence: Property datanucleus.cache.level2 unknown - will be ignored
18/02/04 19:58:28 INFO ObjectStore: Setting MetaStore object pin classes with hive.metastore.cache.pinobjtypes="Table,StorageDescriptor,SerDeInfo,Partition,Database,Type,FieldSchema,Order"
18/02/04 19:58:30 INFO Datastore: The class "org.apache.hadoop.hive.metastore.model.MFieldSchema" is tagged as "embedded-only" so does not have its own datastore table.
18/02/04 19:58:30 INFO Datastore: The class "org.apache.hadoop.hive.metastore.model.MOrder" is tagged as "embedded-only" so does not have its own datastore table.
18/02/04 19:58:38 INFO Datastore: The class "org.apache.hadoop.hive.metastore.model.MFieldSchema" is tagged as "embedded-only" so does not have its own datastore table.
18/02/04 19:58:38 INFO Datastore: The class "org.apache.hadoop.hive.metastore.model.MOrder" is tagged as "embedded-only" so does not have its own datastore table.
18/02/04 19:58:39 INFO MetaStoreDirectSql: Using direct SQL, underlying DB is DERBY
18/02/04 19:58:39 INFO ObjectStore: Initialized ObjectStore
18/02/04 19:58:40 WARN ObjectStore: Version information not found in metastore. hive.metastore.schema.verification is not enabled so recording the schema version 1.2.0
18/02/04 19:58:40 WARN ObjectStore: Failed to get database default, returning NoSuchObjectException
18/02/04 19:58:41 INFO HiveMetaStore: Added admin role in metastore
18/02/04 19:58:41 INFO HiveMetaStore: Added public role in metastore
18/02/04 19:58:41 INFO HiveMetaStore: No user is added in admin role, since config is empty
18/02/04 19:58:41 INFO HiveMetaStore: 0: get_all_databases
18/02/04 19:58:41 INFO audit: ugi=chaithu ip=unknown-ip-addr cmd=get_all_databases
18/02/04 19:58:41 INFO HiveMetaStore: 0: get_functions: db=default pat=*
18/02/04 19:58:41 INFO audit: ugi=chaithu ip=unknown-ip-addr cmd=get_functions: db=default pat=*
18/02/04 19:58:41 INFO Datastore: The class "org.apache.hadoop.hive.metastore.model.MResourceUri" is tagged as "embedded-only" so does not have its own datastore table.
Exception in thread "main" java.lang.RuntimeException: java.lang.RuntimeException: The root scratch dir: /tmp/hive on HDFS should be writable. Current permissions are: rwx------
at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:522)
at org.apache.spark.sql.hive.client.ClientWrapper.<init>(ClientWrapper.scala:194)
at org.apache.spark.sql.hive.client.IsolatedClientLoader.createClient(IsolatedClientLoader.scala:238)
at org.apache.spark.sql.hive.HiveContext.executionHive$lzycompute(HiveContext.scala:218)
at org.apache.spark.sql.hive.HiveContext.executionHive(HiveContext.scala:208)
at org.apache.spark.sql.hive.HiveContext.functionRegistry$lzycompute(HiveContext.scala:462)
at org.apache.spark.sql.hive.HiveContext.functionRegistry(HiveContext.scala:461)
at org.apache.spark.sql.UDFRegistration.<init>(UDFRegistration.scala:40)
at org.apache.spark.sql.SQLContext.<init>(SQLContext.scala:330)
at org.apache.spark.sql.hive.HiveContext.<init>(HiveContext.scala:90)
at org.apache.spark.sql.hive.HiveContext.<init>(HiveContext.scala:101)
at com.CITIGenesis.readHiveTable$.main(readHiveTable.scala:13)
at com.CITIGenesis.readHiveTable.main(readHiveTable.scala)
Caused by: java.lang.RuntimeException: The root scratch dir: /tmp/hive on HDFS should be writable. Current permissions are: rwx------
at org.apache.hadoop.hive.ql.session.SessionState.createRootHDFSDir(SessionState.java:612)
at org.apache.hadoop.hive.ql.session.SessionState.createSessionDirs(SessionState.java:554)
at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:508)
... 12 more
18/02/04 19:58:43 INFO SparkContext: Invoking stop() from shutdown hook
18/02/04 19:58:43 INFO SparkUI: Stopped Spark web UI at http://192.168.0.10:4041
18/02/04 19:58:43 INFO MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
18/02/04 19:58:43 INFO MemoryStore: MemoryStore cleared
18/02/04 19:58:43 INFO BlockManager: BlockManager stopped
18/02/04 19:58:43 INFO BlockManagerMaster: BlockManagerMaster stopped
18/02/04 19:58:43 INFO OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!
18/02/04 19:58:43 INFO RemoteActorRefProvider$RemotingTerminator: Shutting down remote daemon.
18/02/04 19:58:43 INFO RemoteActorRefProvider$RemotingTerminator: Remote daemon shut down; proceeding with flushing remote transports.
18/02/04 19:58:43 INFO SparkContext: Successfully stopped SparkContext
18/02/04 19:58:43 INFO ShutdownHookManager: Shutdown hook called
18/02/04 19:58:43 INFO ShutdownHookManager: Deleting directory /tmp/spark-0ec5892a-1d53-4721-b770-d16e8757865d
18/02/04 19:58:43 INFO ShutdownHookManager: Deleting directory /tmp/spark-0ca97c02-57c7-400b-b552-44f6d7813da5
HDFS directories:
chaithu@localhost:~$ hadoop fs -ls /tmp
Found 3 items
d--------- - hdfs supergroup 0 2018-02-04 14:15 /tmp/.cloudera_health_monitoring_canary_files
drwxrwxrwx - hdfs supergroup 0 2018-01-31 11:42 /tmp/hive
drwxrwxrwt - mapred hadoop 0 2018-01-31 11:25 /tmp/logs
chaithu@localhost:~$ hadoop fs -ls /user/
Found 6 items
drwxrwxrwx - chaithu supergroup 0 2018-02-04 19:34 /user/chaithu
drwxrwxrwx - mapred hadoop 0 2018-01-31 11:25 /user/history
drwxrwxr-t - hive hive 0 2018-01-31 11:31 /user/hive
drwxrwxr-x - hue hue 0 2018-01-31 11:38 /user/hue
drwxrwxr-x - oozie oozie 0 2018-01-31 11:34 /user/oozie
drwxr-x--x - spark spark 0 2018-01-31 22:39 /user/spark
Answer 0 (score: 0)
for Hadoop version 2.2.0

Assuming that is actually your Spark version, you should use a SparkSession with enableHiveSupport(); the spark.sql method then behaves just like the spark-shell, as sketched below. HiveContext and SQLContext exist only for backwards compatibility, and new Spark code should not use them.
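A minimal sketch of what that looks like, assuming Spark 2.x is on the Maven classpath (the object name is kept from the question; a hive-site.xml on the classpath is expected to supply the metastore URI):

import org.apache.spark.sql.SparkSession

object readHiveTable {
  def main(args: Array[String]): Unit = {
    // enableHiveSupport() wires the session to Hive; with hive-site.xml on the
    // classpath, the metastore URI is picked up automatically
    val spark = SparkSession.builder()
      .appName("Read Hive Table")
      .master("local[*]")
      .enableHiveSupport()
      .getOrCreate()

    spark.sql("use default")
    spark.sql("show tables").show()

    spark.stop()
  }
}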
underlying DB is DERBY

To me, this line means that Spark never reached your configured metastore and fell back to its own empty, embedded Derby database, which is also why you see

Failed to get database default

That is one problem; the scratch-dir permissions error is another. In the latter case, check the /tmp folder of the local file system rather than HDFS: when running in local mode, that is the directory Hive is complaining about.
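If the local directory really is the blocker, here is a small Scala sketch that relaxes its permissions (the moral equivalent of chmod 777 /tmp/hive; the path is taken from the error message, and doing this from a shell works just as well):

import java.nio.file.{Files, Paths}
import java.nio.file.attribute.PosixFilePermissions

object FixScratchDir {
  def main(args: Array[String]): Unit = {
    val scratch = Paths.get("/tmp/hive")
    // rwxrwxrwx == 777; Hive insists the scratch dir be writable by the session user
    if (Files.exists(scratch))
      Files.setPosixFilePermissions(scratch, PosixFilePermissions.fromString("rwxrwxrwx"))
  }
}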
For the various ways to connect to the metastore, see the solutions here:
How to connect to a Hive metastore programmatically in SparkSQL?
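For completeness, here is a sketch of handing the metastore URI to the session directly from code, assuming the same thrift URI as in the question (shipping a hive-site.xml on the classpath is generally the cleaner option):

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("Read Hive Table")
  .master("local[*]")
  // pass the metastore URI explicitly instead of relying on hive-site.xml
  .config("hive.metastore.uris", "thrift://127.0.0.1:9083")
  .enableHiveSupport()
  .getOrCreate()

spark.sql("show tables").show()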