How do I use S3 files on an on-prem Hadoop cluster?

Asked: 2019-11-27 13:33:26

Tags: hadoop amazon-s3 cloudera

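I have a Cloudera VM and was able to install the AWS CLI and configure my keys. However, I cannot read or access S3 files with hadoop fs -ls s3://gft-ri or with any other hadoop command. I can see the directories and files with the AWS CLI.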

Command output:

(base) [cloudera@quickstart conf]$ aws s3 ls s3://gft-risk-aml-market-dev/
                           PRE test/
2019-11-27 04:11:26        458 required

(base) [cloudera@quickstart conf]$ hdfs dfs -ls s3://gft-risk-aml-market-dev/
19/11/27 05:30:45 WARN fs.FileSystem: S3FileSystem is deprecated and will be removed in future releases. Use NativeS3FileSystem or S3AFileSystem instead.
ls: `s3://gft-risk-aml-market-dev/': No such file or directory

I have added the following properties to core-site.xml:

  <property>
    <name>fs.s3.impl</name>
    <value>org.apache.hadoop.fs.s3.S3FileSystem</value>
  </property>

  <property>
    <name>fs.s3.awsAccessKeyId</name>
    <value>ANHS</value>
  </property>

  <property>
    <name>fs.s3.awsSecretAccessKey</name>
    <value>EOo</value>
  </property>

  <property>
    <name>fs.s3.path.style.access</name>
    <value>true</value>
  </property>

  <property>
    <name>fs.s3.endpoint</name>
    <value>s3.us-east-1.amazonaws.com</value>
  </property>

  <property>
    <name>fs.s3.connection.ssl.enabled</name>
    <value>false</value>
  </property>

2 answers:

Answer 0 (score: 1)

I finally got it working. The following core-site.xml works on Cloudera Quickstart V13 and lower.

  <property>
    <name>fs.s3a.impl</name>
    <value>org.apache.hadoop.fs.s3a.S3AFileSystem</value>
  </property>

  <property>
    <name>fs.s3a.awsAccessKeyId</name>
    <value>AKIAxxxx</value>
  </property>

  <property>
    <name>fs.s3a.awsSecretAccessKey</name>
    <value>Xxxxxx</value>
  </property>

  <property>
    <name>fs.s3a.path.style.access</name>
    <value>true</value>
  </property>

  <property>
    <name>fs.AbstractFileSystem.s3a.impl</name>
    <value>org.apache.hadoop.fs.s3a.S3A</value>
    <description>The implementation class of the S3A AbstractFileSystem.</description>
  </property>

  <property>
    <name>fs.s3a.endpoint</name>
    <value>s3.us-east-1.amazonaws.com</value>
  </property>

  <property>
    <name>fs.s3a.connection.ssl.enabled</name>
    <value>false</value>
  </property>

  <property>
    <name>fs.s3a.readahead.range</name>
    <value>64K</value>
    <description>Bytes to read ahead during a seek() before closing and
    re-opening the S3 HTTP connection. This option will be overridden if
    any call to setReadahead() is made to an open stream.</description>
  </property>

  <property>
    <name>fs.s3a.list.version</name>
    <value>2</value>
    <description>Select which version of the S3 SDK's List Objects API to use.
    Currently support 2 (default) and 1 (older API).</description>
  </property>
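
With S3A properties like these in place, the bucket is addressed with the s3a:// scheme instead of s3://. The old s3:// connector (S3FileSystem) is a block-based store that cannot read objects written by other S3 tools, which is consistent with the "No such file or directory" error in the question. A minimal sketch, assuming the bucket name and the `required` object from the question; the HDFS target directory /user/cloudera is an assumption, not something stated in the post:

```shell
#!/usr/bin/env bash
# Bucket name taken from the question; the s3a:// scheme selects S3AFileSystem.
BUCKET="gft-risk-aml-market-dev"
S3A_URI="s3a://${BUCKET}/"

if command -v hadoop >/dev/null 2>&1; then
  # List the bucket through the S3A connector configured in core-site.xml.
  hadoop fs -ls "${S3A_URI}"
  # Copy the 'required' object (seen in the aws s3 ls output) into HDFS.
  # /user/cloudera is an assumed target directory.
  hadoop fs -cp "${S3A_URI}required" /user/cloudera/
else
  # Lets the sketch run harmlessly on a machine without Hadoop installed.
  echo "hadoop not on PATH; would run: hadoop fs -ls ${S3A_URI}"
fi
```

If the credentials are not picked up on a newer Hadoop release, note that the Apache Hadoop S3A documentation names the credential properties fs.s3a.access.key and fs.s3a.secret.key, so those may be the names to try there.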

Answer 1 (score: -1)

I would mount the S3 bucket from the Linux console and then move the files from there into HDFS. On the Cloudera Quickstart VM you may first need to install the mount tool with sudo, e.g. sudo yum install s3fs-fuse
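
The mount-and-copy approach could look roughly like the sketch below. The mount point, the HDFS target directory, and the location of the s3fs credentials file are all hypothetical choices; the credentials file is assumed to already exist in the ACCESS_KEY_ID:SECRET_ACCESS_KEY format with mode 600, as s3fs expects.

```shell
#!/usr/bin/env bash
BUCKET="gft-risk-aml-market-dev"        # bucket from the question
MOUNT_POINT="${HOME}/s3-mount"          # hypothetical local mount point
HDFS_TARGET="/user/cloudera/s3-import"  # hypothetical HDFS directory
PASSWD_FILE="${HOME}/.passwd-s3fs"      # assumed s3fs credentials file

if command -v s3fs >/dev/null 2>&1 && command -v hdfs >/dev/null 2>&1; then
  # Mount the bucket as a local directory through FUSE.
  mkdir -p "${MOUNT_POINT}"
  s3fs "${BUCKET}" "${MOUNT_POINT}" -o passwd_file="${PASSWD_FILE}"

  # Copy a file from the mounted bucket into HDFS.
  hdfs dfs -mkdir -p "${HDFS_TARGET}"
  hdfs dfs -put "${MOUNT_POINT}/required" "${HDFS_TARGET}/"

  # Unmount the bucket once the copy is done.
  fusermount -u "${MOUNT_POINT}"
else
  # Lets the sketch run harmlessly where the tools are not installed.
  echo "s3fs or hdfs not on PATH; nothing mounted"
fi
```

Every file passes through the local machine this way, so it suits small one-off copies better than large datasets.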