I want to run some data transformations in Hive using Azure Data Factory (v1) with an Azure HDInsight On Demand cluster (3.6).
Since the HDInsight On Demand cluster is destroyed after some idle time, and I want/need to keep the metadata about the Hive tables (e.g. partitions), I have also configured an external Hive metastore using an Azure SQL Server database.
Now I want to store all production data on a separate storage account, rather than the "default" account in which Data Factory and HDInsight also create containers for logging and other runtime data.
So I have the following resources:
All resources are in the same location (West Europe), except the Data Factory, which is in North Europe; this should be fine (only the HDInsight cluster has to be in the same location as any storage account it wants to use). All Data Factory related deployments are done using the DataFactoryManagementClient API.
A sample Hive script (deployed as a HiveActivity in Data Factory) looks like this:
CREATE TABLE IF NOT EXISTS example_table (
deviceId string,
createdAt timestamp,
batteryVoltage double,
hardwareVersion string,
softwareVersion string
)
PARTITIONED BY (year string, month string) -- year and month from createdAt
CLUSTERED BY (deviceId) INTO 256 BUCKETS
STORED AS ORC
LOCATION 'wasb://container@additionalstorage.blob.core.windows.net/example_table'
TBLPROPERTIES ('transactional'='true');
INSERT INTO TABLE example_table PARTITION (year, month) VALUES ("device1", timestamp "2018-01-22 08:57:00", 2.7, "hw1.32.2", "sw0.12.3", "2018", "01"); -- partition column values come last
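As a side note, for a dynamic-partition INSERT into a transactional table like the one above, Hive normally needs a few session settings enabled first; a minimal sketch using standard Hive configuration properties (not part of the original script):

```sql
-- Enable dynamic partitioning (required when partition values are not
-- all supplied as static literals in the PARTITION clause).
SET hive.exec.dynamic.partition = true;
SET hive.exec.dynamic.partition.mode = nonstrict;

-- Transactional (ACID) tables additionally require the DB transaction manager.
SET hive.support.concurrency = true;
SET hive.txn.manager = org.apache.hadoop.hive.ql.lockmgr.DbTxnManager;
```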
Following the documentation here and here, this should be fairly straightforward: just add the new storage account as an additional linked service (using the additionalLinkedServiceNames property).
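For reference, the additionalLinkedServiceNames property sits on the on-demand HDInsight linked service definition; a sketch of the relevant ADF v1 JSON fragment (linked service names such as ProductionStorageLinkedService are placeholders, not from the original setup):

```json
{
  "name": "HDInsightOnDemandLinkedService",
  "properties": {
    "type": "HDInsightOnDemand",
    "typeProperties": {
      "version": "3.6",
      "clusterSize": 4,
      "timeToLive": "00:15:00",
      "osType": "Linux",
      "linkedServiceName": "DefaultStorageLinkedService",
      "additionalLinkedServiceNames": [ "ProductionStorageLinkedService" ],
      "hcatalogLinkedServiceName": "HiveMetastoreSqlLinkedService"
    }
  }
}
```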
However, when the Hive script tries to access a table stored on this account, it fails with the following exception:
IllegalStateException Error getting FileSystem for wasb : org.apache.hadoop.fs.azure.AzureException: org.apache.hadoop.fs.azure.KeyProviderException: ExitCodeException exitCode=2: Error reading S/MIME message
139827842123416:error:0D06B08E:asn1 encoding routines:ASN1_D2I_READ_BIO:not enough data:a_d2i_fp.c:247:
139827842123416:error:0D0D106E:asn1 encoding routines:B64_READ_ASN1:decode error:asn_mime.c:192:
139827842123416:error:0D0D40CB:asn1 encoding routines:SMIME_read_ASN1:asn1 parse error:asn_mime.c:517:
Some googling told me that this happens when the key provider is not configured correctly (i.e. the exception is thrown because it tries to decrypt the key even though it is not encrypted). After manually setting fs.azure.account.keyprovider.<storage_name>.blob.core.windows.net to org.apache.hadoop.fs.azure.SimpleKeyProvider, reading and "simple" writing of data to the table seemed to work, but it failed again as soon as the metastore was involved (creating tables, adding new partitions, ...):
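The manual workaround mentioned above is a Hadoop configuration override; as a sketch, the corresponding core-site.xml entry would look like this (additionalstorage matching the placeholder account name used in the Hive script):

```xml
<property>
  <!-- Tell the WASB driver the key for this account is stored in plain text. -->
  <name>fs.azure.account.keyprovider.additionalstorage.blob.core.windows.net</name>
  <value>org.apache.hadoop.fs.azure.SimpleKeyProvider</value>
</property>
```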
ERROR exec.DDLTask: org.apache.hadoop.hive.ql.metadata.HiveException: MetaException(message:Got exception: org.apache.hadoop.fs.azure.AzureException com.microsoft.azure.storage.StorageException: Server failed to authenticate the request. Make sure the value of Authorization header is formed correctly including the signature.)
at org.apache.hadoop.hive.ql.metadata.Hive.createTable(Hive.java:783)
at org.apache.hadoop.hive.ql.exec.DDLTask.createTable(DDLTask.java:4434)
at org.apache.hadoop.hive.ql.exec.DDLTask.execute(DDLTask.java:316)
at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:160)
[...]
at org.apache.hadoop.util.RunJar.main(RunJar.java:148)
Caused by: MetaException(message:Got exception: org.apache.hadoop.fs.azure.AzureException com.microsoft.azure.storage.StorageException: Server failed to authenticate the request. Make sure the value of Authorization header is formed correctly including the signature.)
at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$create_table_with_environment_context_result$create_table_with_environment_context_resultStandardScheme.read(ThriftHiveMetastore.java:38593)
at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$create_table_with_environment_context_result$create_table_with_environment_context_resultStandardScheme.read(ThriftHiveMetastore.java:38561)
at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$create_table_with_environment_context_result.read(ThriftHiveMetastore.java:38487)
at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:86)
at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.recv_create_table_with_environment_context(ThriftHiveMetastore.java:1103)
at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.create_table_with_environment_context(ThriftHiveMetastore.java:1089)
at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.create_table_with_environment_context(HiveMetaStoreClient.java:2203)
at org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient.create_table_with_environment_context(SessionHiveMetaStoreClient.java:99)
at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.createTable(HiveMetaStoreClient.java:736)
at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.createTable(HiveMetaStoreClient.java:724)
[...]
at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.invoke(RetryingMetaStoreClient.java:178)
at com.sun.proxy.$Proxy5.createTable(Unknown Source)
at org.apache.hadoop.hive.ql.metadata.Hive.createTable(Hive.java:777)
... 24 more
Again I tried googling, but found nothing usable. I suspect it may have something to do with the fact that the metastore service runs separately from Hive and for some reason cannot access the configured storage account keys... but honestly, I think all of this should just work without manually patching the Hadoop/Hive configuration.
So, my question is: what am I doing wrong, and how is this supposed to work?
Answer (score: 0)
You need to make sure hadoop-azure.jar and azure-storage-5.4.0.jar are added to the Hadoop classpath export in hadoop-env.sh:
export HADOOP_CLASSPATH=/usr/lib/hadoop-client/hadoop-azure.jar:/usr/lib/hadoop-client/lib/azure-storage-5.4.0.jar:$HADOOP_CLASSPATH
You need to add the storage key via the following parameter in core-site.xml: fs.azure.account.key.{storageaccount}.blob.core.windows.net
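As a sketch, the corresponding core-site.xml entry would be (account name and key are placeholders):

```xml
<property>
  <!-- Storage account access key so Hadoop/Hive can authenticate to WASB. -->
  <name>fs.azure.account.key.mystorageaccount.blob.core.windows.net</name>
  <value>BASE64-ENCODED-STORAGE-ACCOUNT-KEY</value>
</property>
```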
When creating databases and tables, you need to specify the location with the storage account and container:
CREATE TABLE {Tablename} ... LOCATION 'wasbs://{container}@{storageaccount}.blob.core.windows.net/{filepath}'
If you still have issues after checking the above, verify whether the storage account is V1 or V2. We ran into a problem where a V2 storage account would not work with our HDP version.