Nutch on Hadoop | Input path does not exist

Time: 2015-08-16 21:41:53

Tags: hadoop mapreduce nutch hadoop2

I get an "Input path does not exist" error when I run this command:

nutch inject crawldb urls

In nutch/logs, hadoop.log contains the error below. Somehow the job is searching the local file system:

2015-08-16 16:08:12,834 INFO  crawl.Injector - Injector: starting at 2015-08-16 16:08:12
2015-08-16 16:08:12,834 INFO  crawl.Injector - Injector: crawlDb: crawldb
2015-08-16 16:08:12,835 INFO  crawl.Injector - Injector: urlDir: urls
2015-08-16 16:08:12,835 INFO  crawl.Injector - Injector: Converting injected urls to crawl db entries.
2015-08-16 16:08:13,296 WARN  util.NativeCodeLoader - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2015-08-16 16:08:13,417 WARN  snappy.LoadSnappy - Snappy native library not loaded
2015-08-16 16:08:13,430 ERROR security.UserGroupInformation - PriviledgedActionException as:hdravi cause:org.apache.hadoop.mapred.InvalidInputException: Input path does not exist: file:/home/hdravi/urls
2015-08-16 16:08:13,432 ERROR crawl.Injector - Injector: org.apache.hadoop.mapred.InvalidInputException: Input path does not exist: file:/home/hdravi/urls
    at org.apache.hadoop.mapred.FileInputFormat.listStatus(FileInputFormat.java:197)
    at org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:208)
    at org.apache.hadoop.mapred.JobClient.writeOldSplits(JobClient.java:1081)
    at org.apache.hadoop.mapred.JobClient.writeSplits(JobClient.java:1073)
    at org.apache.hadoop.mapred.JobClient.access$700(JobClient.java:179)
    at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:983)
    at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:936)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1190)
    at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:936)
    at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:910)
    at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1353)
    at org.apache.nutch.crawl.Injector.inject(Injector.java:323)
    at org.apache.nutch.crawl.Injector.run(Injector.java:379)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
    at org.apache.nutch.crawl.Injector.main(Injector.java:369)
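
The file: prefix in the log above suggests that the relative path urls was resolved against the local file system instead of HDFS. The following is a simplified illustration of that kind of resolution, not Hadoop's actual code. The class and method names are hypothetical, and the working directories are assumptions (/home/hdravi for the local run, /user/hdravi as the HDFS home directory).

```java
import java.net.URI;
import java.net.URISyntaxException;

class QualifyPathSketch {
    // Turns a possibly relative path into a fully qualified URI string,
    // using the default file system URI and a working directory.
    static String qualify(String defaultFs, String workingDir, String path) {
        URI given = URI.create(path);
        if (given.getScheme() != null) {
            // Already fully qualified, e.g. hdfs://host:port/dir
            return given.toString();
        }
        // Relative paths are resolved against the working directory.
        String absolute = path.startsWith("/") ? path : workingDir + "/" + path;
        URI fs = URI.create(defaultFs);
        try {
            return new URI(fs.getScheme(), fs.getAuthority(), absolute, null, null).toString();
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        // Default file system is local: matches the path in the error message.
        System.out.println(qualify("file:///", "/home/hdravi", "urls"));
        // Default file system is HDFS: the path where seed.txt would be expected.
        System.out.println(qualify("hdfs://localhost:54310", "/user/hdravi", "urls"));
    }
}
```

If the first form is what the job sees, the configuration that sets the default file system is probably not being picked up by the process that submits the job.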

This is the content of Hadoop's core-site.xml:

<configuration>
<property>
  <name>hadoop.tmp.dir</name>
  <value>/app/hadoop/tmp</value>
  <description>A base for other temporary directories.</description>
</property>

<property>
  <name>fs.default.name</name>
  <value>hdfs://localhost:54310</value>
  <description>The name of the default file system.  A URI whose
  scheme and authority determine the FileSystem implementation.  The
  uri's scheme determines the config property (fs.SCHEME.impl) naming
  the FileSystem implementation class.  The uri's authority is used to
  determine the host, port, etc. for a filesystem.</description>
</property>
</configuration>
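
The configuration shown just above sets fs.default.name to hdfs://localhost:54310. To double-check which value a particular configuration file declares (for example, the copy that Nutch actually reads), the file can be inspected with the JDK's XML parser. This is only a rough sketch and not how Hadoop loads its configuration; the class and method names are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

class SiteXmlSketch {
    // Returns the <value> of the <property> whose <name> matches, or null if absent.
    static String readProperty(String xml, String name) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            NodeList props = doc.getElementsByTagName("property");
            for (int i = 0; i < props.getLength(); i++) {
                Element prop = (Element) props.item(i);
                String propName = prop.getElementsByTagName("name").item(0).getTextContent().trim();
                if (propName.equals(name)) {
                    return prop.getElementsByTagName("value").item(0).getTextContent().trim();
                }
            }
            return null;
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse configuration XML", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<configuration><property>"
                + "<name>fs.default.name</name>"
                + "<value>hdfs://localhost:54310</value>"
                + "</property></configuration>";
        System.out.println(readProperty(xml, "fs.default.name"));
    }
}
```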

This is the content of Hadoop's hdfs-site.xml:

<configuration>
<property>
  <name>dfs.replication</name>
  <value>1</value>
  <description>Default block replication. The actual number of replications
  can be specified when the file is created. The default is used if
  replication is not specified in create time.</description>
</property>
</configuration>

When I run hadoop fs -ls -R /, this is the output I get:

drwxrwxrwx   - hdravi supergroup          0 2015-08-16 16:06 /user
drwxrwxrwx   - hdravi supergroup          0 2015-08-16 16:06 /user/hdravi
drwxr-xr-x   - hdravi supergroup          0 2015-08-16 16:06 /user/hdravi/urls
-rw-r--r--   1 hdravi supergroup        240 2015-08-16 16:06 /user/hdravi/urls/seed.txt

Am I missing any configuration in Hadoop or Nutch?

Update

I get an error when using the full HDFS path as well.

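1 answer:

Answer 0 (score: 0)

I am not sure about Nutch, but as far as Hadoop is concerned, try loading the configuration files through a Configuration object before launching the MapReduce job.

This solution worked for me: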

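import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
// Load the site files explicitly so the default file system comes from them
Configuration conf = new Configuration();
conf.addResource(new Path("path to hadoop/conf/core-site.xml"));
conf.addResource(new Path("path to hadoop/conf/hdfs-site.xml"));
FileSystem fs = FileSystem.get(conf);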

You can also try using the full path of the input directory:

hdfs://localhost:54310/user/hdravi