Question

Hi, I'm trying to run Apache Nutch 1.2 on Amazon's EMR. To do this, I specified an input directory on S3. I get the following error:

Fetcher: java.lang.IllegalArgumentException:
    This file system object (hdfs://ip-11-202-55-144.ec2.internal:9000)
    does not support access to the request path 
    's3n://crawlResults2/segments/20110823155002/crawl_fetch'
    You possibly called FileSystem.get(conf) when you should have called
    FileSystem.get(uri, conf) to obtain a file system supporting your path.

I understand the difference between FileSystem.get(uri, conf) and FileSystem.get(conf). If I were writing this myself, I would call FileSystem.get(uri, conf), but I'm trying to use the existing Nutch code.

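I asked about this, and someone told me that I need to modify hadoop-site.xml to include the following properties: fs.default.name, fs.s3.awsAccessKeyId, fs.s3.awsSecretAccessKey. I updated these properties in core-site.xml (hadoop-site.xml doesn't exist), but it made no difference. Does anyone have any other ideas? Thanks for your help.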

Answer 1

Try specifying the following in hadoop-site.xml:

<property>
  <!-- fs.default.name takes a filesystem URI (here the bucket from the question), not a class name -->
  <name>fs.default.name</name>
  <value>s3n://crawlResults2</value>
</property>

This tells Nutch that S3 should be used by default.

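The fs.s3.awsAccessKeyId and fs.s3.awsSecretAccessKey properties need to be specified only if your S3 objects require authentication (an S3 object can be readable by all users, or only by authenticated ones).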
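As a sketch, assuming the bucket is private, the credential properties could be added to core-site.xml as below; the key values are placeholders. One detail worth checking: the path in the error uses the s3n:// scheme, and stock Hadoop looks up S3 credentials under the URI's scheme name (fs.<scheme>.awsAccessKeyId), so s3n:// paths read fs.s3n.awsAccessKeyId and fs.s3n.awsSecretAccessKey, while the fs.s3.* names apply to s3:// paths. EMR's own S3 filesystem may behave differently.

```xml
<!-- core-site.xml: credentials for s3n:// paths (placeholder values, use your own keys) -->
<property>
  <name>fs.s3n.awsAccessKeyId</name>
  <value>YOUR_ACCESS_KEY_ID</value>
</property>
<property>
  <name>fs.s3n.awsSecretAccessKey</name>
  <value>YOUR_SECRET_ACCESS_KEY</value>
</property>
```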
