How can I create/access Hive tables with an external metastore on an additional Azure Blob storage account?

Date: 2018-01-22 15:17:55

Tags: azure hive hdinsight azure-data-factory azure-blob-storage

I want to run some data transformations in Hive using Azure Data Factory (v1) with an Azure HDInsight On Demand cluster (3.6).

Since the HDInsight On Demand cluster is destroyed after some idle time, and I want/need to keep the metadata about the Hive tables (e.g. partitions), I have also configured an external Hive metastore using an Azure SQL Server database.

Now I would like to store all production data on a separate storage account rather than on the "default" account, in which Data Factory and HDInsight also create containers for logging and other runtime data.

So I have the following resources:

  • A Data Factory with HDInsight On Demand (as a linked service)
  • A SQL Server and database for the Hive metastore (configured in the HDInsight On Demand linked service)
  • A default storage account used by Data Factory and the HDInsight On Demand cluster (Blob storage, general purpose v1)
  • An additional storage account for data ingress and the Hive tables (Blob storage, general purpose v1)

All resources are in the same location, West Europe, except the Data Factory, which is in North Europe; that should not matter (the HDInsight cluster has to be in the same location as any storage account it wants to use). All Data-Factory-related deployments are done with the DataFactoryManagementClient API.

An example Hive script (deployed as a HiveActivity in Data Factory) looks like this:

CREATE TABLE IF NOT EXISTS example_table (
  deviceId string,
  createdAt timestamp,
  batteryVoltage double,
  hardwareVersion string,
  softwareVersion string
)
PARTITIONED BY (year string, month string) -- year and month from createdAt
CLUSTERED BY (deviceId) INTO 256 BUCKETS
STORED AS ORC
LOCATION 'wasb://container@additionalstorage.blob.core.windows.net/example_table'
TBLPROPERTIES ('transactional'='true');

INSERT INTO TABLE example_table PARTITION (year, month) VALUES ("device1", timestamp "2018-01-22 08:57:00", 2.7, "hw1.32.2", "sw0.12.3", "2018", "01");
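As an aside, the year and month partition values are simply derived from createdAt, as the comment in the DDL notes. A minimal Python sketch of that derivation (the function name is made up for illustration):

```python
from datetime import datetime

def partition_values(created_at: str):
    """Derive the (year, month) partition values from a createdAt timestamp string."""
    ts = datetime.strptime(created_at, "%Y-%m-%d %H:%M:%S")
    # Zero-pad to match the string-typed partition columns in the table DDL.
    return f"{ts.year:04d}", f"{ts.month:02d}"

print(partition_values("2018-01-22 08:57:00"))  # -> ('2018', '01')
```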

Following the documentation here and here, this should be fairly straightforward: just add the new storage account as an additional linked service (using the additionalLinkedServiceNames property).
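For context, the relevant part of the HDInsight On Demand linked service definition looks roughly like this (a sketch; all linked service names are placeholders for my actual resources):

```json
{
  "name": "HDInsightOnDemandLinkedService",
  "properties": {
    "type": "HDInsightOnDemand",
    "typeProperties": {
      "version": "3.6",
      "clusterSize": 4,
      "timeToLive": "00:30:00",
      "linkedServiceName": "DefaultStorageLinkedService",
      "additionalLinkedServiceNames": [ "AdditionalStorageLinkedService" ],
      "hcatalogLinkedServiceName": "HiveMetastoreSqlLinkedService"
    }
  }
}
```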

However, this leads to the following exception when the Hive script tries to access a table stored on that account:

IllegalStateException Error getting FileSystem for wasb : org.apache.hadoop.fs.azure.AzureException: org.apache.hadoop.fs.azure.KeyProviderException: ExitCodeException exitCode=2: Error reading S/MIME message
139827842123416:error:0D06B08E:asn1 encoding routines:ASN1_D2I_READ_BIO:not enough data:a_d2i_fp.c:247:
139827842123416:error:0D0D106E:asn1 encoding routines:B64_READ_ASN1:decode error:asn_mime.c:192:
139827842123416:error:0D0D40CB:asn1 encoding routines:SMIME_read_ASN1:asn1 parse error:asn_mime.c:517:

Some googling told me this happens when the key provider is not configured correctly (i.e. the exception is thrown because Hadoop tries to decrypt the key even though it is not encrypted). After manually setting fs.azure.account.keyprovider.<storage_name>.blob.core.windows.net to org.apache.hadoop.fs.azure.SimpleKeyProvider, it seemed to work for reading and "simple" writing of data to the table, but failed again as soon as the metastore got involved (creating tables, adding new partitions, ...):
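The override I applied was a core-site property along these lines (the storage account name is a placeholder for mine). My understanding is that SimpleKeyProvider returns the configured key verbatim, whereas the HDInsight default key provider expects an encrypted key and tries to decrypt it, which would explain the ASN.1/S-MIME errors above:

```xml
<property>
  <name>fs.azure.account.keyprovider.additionalstorage.blob.core.windows.net</name>
  <value>org.apache.hadoop.fs.azure.SimpleKeyProvider</value>
</property>
```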

ERROR exec.DDLTask: org.apache.hadoop.hive.ql.metadata.HiveException: MetaException(message:Got exception: org.apache.hadoop.fs.azure.AzureException com.microsoft.azure.storage.StorageException: Server failed to authenticate the request. Make sure the value of Authorization header is formed correctly including the signature.)
  at org.apache.hadoop.hive.ql.metadata.Hive.createTable(Hive.java:783)
  at org.apache.hadoop.hive.ql.exec.DDLTask.createTable(DDLTask.java:4434)
  at org.apache.hadoop.hive.ql.exec.DDLTask.execute(DDLTask.java:316)
  at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:160)
[...]
  at org.apache.hadoop.util.RunJar.main(RunJar.java:148)
Caused by: MetaException(message:Got exception: org.apache.hadoop.fs.azure.AzureException com.microsoft.azure.storage.StorageException: Server failed to authenticate the request. Make sure the value of Authorization header is formed correctly including the signature.)
  at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$create_table_with_environment_context_result$create_table_with_environment_context_resultStandardScheme.read(ThriftHiveMetastore.java:38593)
  at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$create_table_with_environment_context_result$create_table_with_environment_context_resultStandardScheme.read(ThriftHiveMetastore.java:38561)
  at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$create_table_with_environment_context_result.read(ThriftHiveMetastore.java:38487)
  at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:86)
  at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.recv_create_table_with_environment_context(ThriftHiveMetastore.java:1103)
  at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.create_table_with_environment_context(ThriftHiveMetastore.java:1089)
  at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.create_table_with_environment_context(HiveMetaStoreClient.java:2203)
  at org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient.create_table_with_environment_context(SessionHiveMetaStoreClient.java:99)
  at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.createTable(HiveMetaStoreClient.java:736)
  at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.createTable(HiveMetaStoreClient.java:724)
  [...]
  at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.invoke(RetryingMetaStoreClient.java:178)
  at com.sun.proxy.$Proxy5.createTable(Unknown Source)
  at org.apache.hadoop.hive.ql.metadata.Hive.createTable(Hive.java:777)
  ... 24 more

I tried googling again but found nothing usable. I suspect it might have something to do with the fact that the metastore service runs separately from Hive and for some reason cannot access the configured storage account keys... but honestly, I think this should just work without manually patching the Hadoop/Hive configuration.

So, my question is: what am I doing wrong, and how is this supposed to work?

1 Answer:

Answer 0 (score: 0)

You need to make sure that hadoop-azure.jar and azure-storage-5.4.0.jar are added to the Hadoop classpath export in hadoop-env.sh:

export HADOOP_CLASSPATH=/usr/lib/hadoop-client/hadoop-azure.jar:/usr/lib/hadoop-client/lib/azure-storage-5.4.0.jar:$HADOOP_CLASSPATH

You need to add the storage key via the following parameter in core-site: fs.azure.account.key.{storageaccount}.blob.core.windows.net
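In core-site.xml that would look roughly like this ({storageaccount} and the value are placeholders for your account name and its access key):

```xml
<property>
  <name>fs.azure.account.key.{storageaccount}.blob.core.windows.net</name>
  <value>{storage-account-access-key}</value>
</property>
```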

When creating databases and tables, you need to specify the location with the storage account and container:

CREATE TABLE {Tablename} ... LOCATION 'wasbs://{container}@{storageaccount}.blob.core.windows.net/{filepath}'

If you still have issues after trying the checks above, look at whether the storage account is V1 or V2. We had an issue where a V2 storage account would not work with our HDP version.
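One way to check the account kind, assuming the Azure CLI is available (the account name is a placeholder):

```
az storage account show --name additionalstorage --query kind --output tsv
```

A kind of "Storage" is general purpose v1, "StorageV2" is general purpose v2, and "BlobStorage" is a legacy blob-only account.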