Standalone Hive metastore with Iceberg and S3

Time: 2020-10-05 18:48:39

Tags: hadoop amazon-s3 hive metastore iceberg

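I want to use Presto to query Iceberg tables stored in S3 as Parquet files, so I need a Hive Metastore. I am running a standalone Hive metastore service backed by MySQL. I have configured Iceberg to use the Hive catalog: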

import org.apache.hadoop.conf.Configuration;
import org.apache.iceberg.catalog.Namespace;
import org.apache.iceberg.hive.HiveCatalog;

public class MetastoreTest {

    public static void main(String[] args) {
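        Configuration conf = new Configuration();
        // Thrift endpoint of the standalone Hive metastore service
        conf.set("hive.metastore.uris", "thrift://x.x.x.x:9083");
        // Warehouse root on S3; note the "s3" scheme, which is the one named in the error
        conf.set("hive.metastore.warehouse.dir", "s3://bucket/warehouse");
        HiveCatalog catalog = new HiveCatalog(conf);
        // Creating the namespace is the call that fails
        catalog.createNamespace(Namespace.of("my_metastore"));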
    }

}

I get the following error: Caused by: MetaException(message:Got exception: org.apache.hadoop.fs.UnsupportedFileSystemException No FileSystem for scheme "s3")

I have already added /hadoop-3.3.0/share/hadoop/tools/lib to HADOOP_CLASSPATH and copied the AWS-related jars into apache-hive-metastore-3.0.0-bin/lib. What else is missing?

1 Answer:

Answer 0 (score: 0)

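Finally figured it out. First, as mentioned above, I had to add hadoop/share/hadoop/tools/lib to HADOOP_CLASSPATH. With hadoop-3.3.0, however, neither modifying HADOOP_CLASSPATH nor copying specific files from tools to common helped. I then switched to hadoop-2.7.7, and it worked. I also had to copy the Jackson-related jars from tools to common. My hadoop/etc/hadoop/core-site.xml looks like this: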

<configuration>

    <property>
        <name>fs.default.name</name>
        <value>s3a://{bucket_name}</value>
    </property>


    <property>
        <name>fs.s3a.impl</name>
        <value>org.apache.hadoop.fs.s3a.S3AFileSystem</value>
    </property>

    <property>
        <name>fs.s3a.endpoint</name>
        <value>{s3_endpoint}</value>
        <description>AWS S3 endpoint to connect to. An up-to-date list is
            provided in the AWS Documentation: regions and endpoints. Without this
            property, the standard region (s3.amazonaws.com) is assumed.
        </description>
    </property>


    <property>
        <name>fs.s3a.access.key</name>
        <value>{access_key}</value>
    </property>

    <property>
        <name>fs.s3a.secret.key</name>
        <value>{secret_key}</value>
    </property>


</configuration>
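A quick way to catch typos or leftover placeholders in this file is to parse it and check the keys. The following is only a minimal sketch using the JDK's XML parser, not part of Hadoop. The class name CoreSiteCheck, the EXPECTED key list (taken from the file above) and the default path are assumptions made for illustration.

```java
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical helper: checks a core-site.xml for the fs.s3a.* keys shown above.
public class CoreSiteCheck {

    // Keys used in the answer's core-site.xml (an assumption about what to check;
    // fs.s3a.endpoint falls back to a default when it is absent).
    static final String[] EXPECTED = {
        "fs.s3a.impl", "fs.s3a.endpoint", "fs.s3a.access.key", "fs.s3a.secret.key"
    };

    // Parses a Hadoop-style <configuration> document into a name -> value map.
    public static Map<String, String> parse(InputStream in) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().parse(in);
        NodeList props = doc.getElementsByTagName("property");
        Map<String, String> result = new HashMap<>();
        for (int i = 0; i < props.getLength(); i++) {
            Element p = (Element) props.item(i);
            NodeList names = p.getElementsByTagName("name");
            NodeList values = p.getElementsByTagName("value");
            if (names.getLength() > 0 && values.getLength() > 0) {
                result.put(names.item(0).getTextContent().trim(),
                           values.item(0).getTextContent().trim());
            }
        }
        return result;
    }

    // Returns expected keys that are absent, empty, or still a {placeholder}.
    public static List<String> missing(Map<String, String> conf) {
        List<String> out = new ArrayList<>();
        for (String key : EXPECTED) {
            String v = conf.get(key);
            if (v == null || v.isEmpty() || (v.startsWith("{") && v.endsWith("}"))) {
                out.add(key);
            }
        }
        return out;
    }

    public static void main(String[] args) throws Exception {
        // Default path is relative to the Hadoop install directory (assumption).
        String path = args.length > 0 ? args[0] : "hadoop/etc/hadoop/core-site.xml";
        try (InputStream in = Files.newInputStream(Paths.get(path))) {
            System.out.println("Missing or placeholder keys: " + missing(parse(in)));
        }
    }
}
```

Run it with the path to your core-site.xml as the only argument. It only reads the file and does not contact S3, so it cannot tell whether the credentials themselves are valid.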

At this point, you should be able to list your S3 bucket:

hadoop fs -ls s3a://{bucket}/
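If the listing still fails, it can help to see which jars from tools never made it into common. The following is a rough, read-only sketch of that comparison. The class name MissingJars, the file-name prefixes and the default directories (including the common/lib target) are assumptions for illustration; adjust them to your install.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical helper: lists AWS- and Jackson-related jars that exist in one
// directory but not in another. It only reads; it never copies anything.
public class MissingJars {

    // File-name prefixes to look for (an assumption based on the jars
    // the answer mentions; extend as needed).
    static final String[] PREFIXES = {"hadoop-aws", "aws-java-sdk", "jackson-"};

    // True when the file is a jar whose name starts with one of the prefixes.
    public static boolean isCandidate(String fileName) {
        if (!fileName.endsWith(".jar")) {
            return false;
        }
        for (String prefix : PREFIXES) {
            if (fileName.startsWith(prefix)) {
                return true;
            }
        }
        return false;
    }

    // Returns, in sorted order, candidate jars present in source but absent from target.
    public static List<String> missingFrom(Path source, Path target) throws IOException {
        Set<String> names = new TreeSet<>();
        try (DirectoryStream<Path> jars = Files.newDirectoryStream(source, "*.jar")) {
            for (Path jar : jars) {
                String name = jar.getFileName().toString();
                if (isCandidate(name) && !Files.exists(target.resolve(name))) {
                    names.add(name);
                }
            }
        }
        return new ArrayList<>(names);
    }

    public static void main(String[] args) throws IOException {
        // Default directories are assumptions, relative to the Hadoop install directory.
        Path tools = Paths.get(args.length > 0 ? args[0] : "hadoop/share/hadoop/tools/lib");
        Path common = Paths.get(args.length > 1 ? args[1] : "hadoop/share/hadoop/common/lib");
        for (String name : missingFrom(tools, common)) {
            System.out.println(name);
        }
    }
}
```

Pass the two directories as arguments. Each printed name is a jar you may still need to copy.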