I want to send a large number of files from HDFS to Google Storage (GS), so I would like to use the distcp command for this.
hadoop distcp -libjars <full path to connector jar> -m <number of mappers> hdfs://<host>:<port(default 8020)>/<hdfs path> gs://<bucket name>/
In addition, I need to specify a *.p12 key file in core-site.xml to access GS, which means I would have to distribute this file to every node in my cluster.
<property>
<name>google.cloud.auth.service.account.keyfile</name>
<value>/opt/hadoop/conf/gcskey.p12</value>
</property>
I don't want to do this by hand. What is the best practice for distributing the key file?
Answer (score: 1)
There is a generic option for this:
-files <comma separated list of files> specify comma separated files to be copied to the map reduce cluster
The command would be:
hadoop distcp -libjars <full path to connector jar> -files /etc/hadoop/conf/gcskey.p12 -m <number of mappers> hdfs://<host>:<port(default 8020)>/<hdfs path> gs://<bucket name>/
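For illustration only, here is the same command with hypothetical values filled in (the namenode host, jar path, mapper count, and bucket name below are made-up placeholders, not values from the question):

hadoop distcp \
  -libjars /opt/hadoop/lib/gcs-connector.jar \
  -files /etc/hadoop/conf/gcskey.p12 \
  -m 20 \
  hdfs://namenode1:8020/user/data \
  gs://my-gs-bucket/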
NOTE 1: In this case you need to set the key path (google.cloud.auth.service.account.keyfile) in core-site.xml to just the file name (a relative path), as in the example below.
NOTE 2: You also need to have the .p12 key file in the current working directory, because Hadoop checks the path from core-site.xml at startup.
<property>
<name>fs.gs.impl</name>
<value>com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystem</value>
</property>
<property>
<name>fs.AbstractFileSystem.gs.impl</name>
<value>com.google.cloud.hadoop.fs.gcs.GoogleHadoopFS</value>
<description>
The AbstractFileSystem for gs: (GCS) uris. Only necessary for use with Hadoop 2.
</description>
</property>
<property>
<name>fs.gs.project.id</name>
<value>google project id</value>
<description>Google Project Id</description>
</property>
<property>
<name>google.cloud.auth.service.account.enable</name>
<value>true</value>
</property>
<property>
<name>google.cloud.auth.service.account.email</name>
<value>google service account email</value>
<description>Project service account email</description>
</property>
<property>
<name>google.cloud.auth.service.account.keyfile</name>
<value>gcskey.p12</value>
<description>Local path to .p12 key at each node</description>
</property>
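As a final sanity check, here is a minimal sketch, assuming the connector jar is already on the Hadoop classpath and reusing the same hypothetical bucket name as above: run from the directory containing the key (per NOTE 2) and list the bucket to confirm that the connector and credentials work.

# Hypothetical paths and bucket name; adjust to your environment.
cd /etc/hadoop/conf                 # directory containing gcskey.p12 (see NOTE 2)
hadoop fs -ls gs://my-gs-bucket/    # lists the bucket if the connector and key are configured correctly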