Unable to configure Databricks with an external Hive Metastore

Time: 2020-03-24 05:26:39

Tags: azure-databricks

I am following this document https://docs.databricks.com/data/metastores/external-hive-metastore.html#spark-configuration-options to connect to my external Hive metastore. My metastore version is 3.1.0, and I followed the document.

I get this error when trying to connect to the external Hive metastore:

org/apache/hadoop/hive/conf/HiveConf when creating Hive client using classpath: 
Please make sure that jars for your version of hive and hadoop are included in the paths passed to spark.sql.hive.metastore.jars

spark.sql.hive.metastore.jars=/databricks/hive_metastore_jars/*
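For context, the documentation linked above has you set the metastore version and JDBC connection alongside the jar path. A minimal sketch of the full cluster Spark config, assuming a MySQL-backed 3.1.0 metastore (the host, database name, user, and password below are placeholders, not values from this question):

```
# Hive metastore client version and jar location
spark.sql.hive.metastore.version 3.1.0
spark.sql.hive.metastore.jars /databricks/hive_metastore_jars/*

# JDBC connection to the external metastore database (placeholder values)
spark.hadoop.javax.jdo.option.ConnectionURL jdbc:mysql://<metastore-host>:3306/<metastore-db>
spark.hadoop.javax.jdo.option.ConnectionDriverName org.mariadb.jdbc.Driver
spark.hadoop.javax.jdo.option.ConnectionUserName <user>
spark.hadoop.javax.jdo.option.ConnectionPassword <password>
```

If `spark.sql.hive.metastore.version` is not set, Spark falls back to its built-in Hive client, which can produce version-mismatch symptoms like the ones in the logs below.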

When I run `ls` on /databricks/hive_metastore_jars/ I can see all the copied files. Do I need to copy any Hive-specific files and upload them into this folder?

I did do what the documentation describes.

These are the contents of my hive_metastore_jars:

total 56K
drwxr-xr-x 3 root root 4.0K Mar 24 05:06 1585025573715-0
drwxr-xr-x 2 root root 4.0K Mar 24 05:06 d596a6ec-e105-4a6e-af95-df3feffc263d_resources
drwxr-xr-x 3 root root 4.0K Mar 24 05:06 repl
drwxr-xr-x 2 root root 4.0K Mar 24 05:06 spark-2959157d-2018-441a-a7d3-d7cecb8a645f
drwxr-xr-x 4 root root 4.0K Mar 24 05:06 root
drwxr-xr-x 2 root root 4.0K Mar 24 05:06 spark-30a72ee5-304c-432b-9c13-0439511fb0cd
drwxr-xr-x 2 root root 4.0K Mar 24 05:06 spark-a19d167b-d571-4e58-a961-d7f6ced3d52f
-rwxr-xr-x 1 root root 5.5K Mar 24 05:06 _CleanRShell.r3763856699176668909resource.r
-rwxr-xr-x 1 root root 9.7K Mar 24 05:06 _dbutils.r9057087446822479911resource.r
-rwxr-xr-x 1 root root  301 Mar 24 05:06 _rServeScript.r1949348184439973964resource.r
-rwxr-xr-x 1 root root 1.5K Mar 24 05:06 _startR.sh5660449951005543051resource.r

Am I missing something?

Strangely, if I look at the cluster startup logs, this is what I get:

20/03/24 07:29:05 INFO Persistence: Property spark.hadoop.javax.jdo.option.ConnectionDriverName unknown - will be ignored
20/03/24 07:29:05 INFO Persistence: Property spark.hadoop.javax.jdo.option.ConnectionURL unknown - will be ignored
20/03/24 07:29:05 INFO Persistence: Property spark.hadoop.javax.jdo.option.ConnectionUserName unknown - will be ignored
20/03/24 07:29:05 INFO Persistence: Property hive.metastore.integral.jdo.pushdown unknown - will be ignored
20/03/24 07:29:05 INFO Persistence: Property spark.hadoop.javax.jdo.option.ConnectionPassword unknown - will be ignored
20/03/24 07:29:05 INFO Persistence: Property datanucleus.cache.level2 unknown - will be ignored
20/03/24 07:29:05 INFO Persistence: Property datanucleus.schema.autoCreateAll unknown - will be ignored

20/03/24 07:29:09 WARN ObjectStore: Version information not found in metastore. hive.metastore.schema.verification is not enabled so recording the schema version 1.2.0
20/03/24 07:29:09 WARN ObjectStore: Failed to get database default, returning NoSuchObjectException

I have set the configuration above, and it also shows up in the logs:

20/03/24 07:28:59 INFO SparkContext: Spark configuration:
spark.hadoop.javax.jdo.option.ConnectionDriverName=org.mariadb.jdbc.Driver
spark.hadoop.javax.jdo.option.ConnectionPassword=*********(redacted)
spark.hadoop.javax.jdo.option.ConnectionURL=*********(redacted)
spark.hadoop.javax.jdo.option.ConnectionUserName=*********(redacted)

The version information is also present in my Hive metastore; I can connect to the MySQL database and see SCHEMA_VERSION: 3.1.0, VER_ID = 1.

2 answers:

Answer 0: (score: 0)

From the output, it looks like the jars were not copied to the /databricks/hive_metastore_jars/ location. As mentioned in the documentation link you shared:

  1. Set spark.sql.hive.metastore.jars to maven
  2. Restart the cluster with the above configuration, then check the Spark driver logs for a message like the following:
17/11/18 22:41:19 INFO IsolatedClientLoader: Downloaded metastore jars to <path>

Copy the jars from that location on the same cluster to DBFS, and then use an init script to copy the jars from DBFS to /databricks/hive_metastore_jars/.
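As a sketch of that last step, a cluster init script along these lines could perform the copy. The DBFS source path /dbfs/hive/3.1.0/lib is a placeholder; it should point to wherever the downloaded jars were uploaded:

```shell
#!/bin/bash
# Hypothetical init script: copy the metastore jars from DBFS to the local
# directory referenced by spark.sql.hive.metastore.jars.
# /dbfs/hive/3.1.0/lib is a placeholder path, not from the original question.
mkdir -p /databricks/hive_metastore_jars
cp -r /dbfs/hive/3.1.0/lib/* /databricks/hive_metastore_jars/
```

This runs on every node before Spark starts, so /databricks/hive_metastore_jars/ is populated by the time the metastore client is created.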

Answer 1: (score: 0)