I am running a MapReduce program on Hadoop.
The InputFormat passes each file's path to the mapper.
I can check the file from the command line like this:
$ hadoop fs -ls hdfs://slave1.kdars.com:8020/user/hadoop/num_5/13.pdf
Found 1 items
-rwxrwxrwx 3 hdfs hdfs 184269 2015-03-31 22:50 hdfs://slave1.kdars.com:8020/user/hadoop/num_5/13.pdf
However, when I try to open that file from the mapper side, it does not work:
15/04/01 06:13:04 INFO mapreduce.Job: Task Id : attempt_1427882384950_0025_m_000002_2, Status : FAILED
Error: java.io.FileNotFoundException: hdfs:/slave1.kdars.com:8020/user/hadoop/num_5/13.pdf (No such file or directory)
at java.io.FileInputStream.open(Native Method)
at java.io.FileInputStream.<init>(FileInputStream.java:146)
at java.io.FileInputStream.<init>(FileInputStream.java:101)
at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:1111)
I have checked that the InputFormat works fine and that the mapper receives the correct file path. The mapper code looks like this:
@Override
public void map(Text title, Text file, Context context) throws IOException, InterruptedException {
    long time = System.currentTimeMillis();
    SimpleDateFormat dayTime = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss");
    String str = dayTime.format(new Date(time));
    File temp = new File(file.toString());
    if (temp.exists()) {
        DBManager.getInstance().insertSQL("insert into `plagiarismdb`.`workflow` (`type`) values ('" + temp + " exists')");
    } else {
        DBManager.getInstance().insertSQL("insert into `plagiarismdb`.`workflow` (`type`) values ('" + temp + " does not exist')");
    }
}
Please help me.
Answer 0 (score: 1)
First, import these:
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
Then use them in the mapper method:
FileSystem fs = FileSystem.get(new Configuration());
Path path = new Path(value.toString()); // 'value' carries the file path handed to the mapper
System.out.println(path);
if (fs.exists(path)) {
    context.write(value, one);  // 'one'/'zero': writable constants assumed to be defined in the mapper
} else {
    context.write(value, zero);
}
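For the original use case (PDFBox), the same idea applies: PDDocument.load(File) also goes through the local file system, so the PDF has to be streamed out of HDFS instead. Here is a minimal sketch, assuming your PDFBox version provides the PDDocument.load(InputStream) overload; the class name PdfMapper and the output value are illustrative, not from the question:

import java.io.IOException;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.pdfbox.pdmodel.PDDocument;

public class PdfMapper extends Mapper<Text, Text, Text, Text> {
    @Override
    public void map(Text title, Text file, Context context) throws IOException, InterruptedException {
        Path path = new Path(file.toString());
        // Resolve the FileSystem from the path itself, so a fully qualified
        // hdfs:// URI works even when it differs from the default FS.
        FileSystem fs = path.getFileSystem(context.getConfiguration());
        try (FSDataInputStream in = fs.open(path)) {
            PDDocument document = PDDocument.load(in); // reads from HDFS, never the local disk
            try {
                context.write(title, new Text("pages=" + document.getNumberOfPages()));
            } finally {
                document.close();
            }
        }
    }
}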
Answer 1 (score: 0)
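A standalone (non-MapReduce) example of the same pattern: check that a CSV file exists on HDFS, then read it through the FileSystem API rather than java.io: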
package com.tcb;

import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

import com.google.gson.JsonArray;
import com.google.gson.JsonObject;

public class YellowTaxi {

    public static void main(String[] args) throws IOException {
        // Fully qualified HDFS location of the input file.
        String hdfsPath = "hdfs://localhost:8020/user/YellowTaxi/yellowTaxi.csv";
        URI uri = URI.create(hdfsPath);

        // Get the FileSystem that owns this URI (HDFS), not the local one.
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(uri, conf);
        Path path = new Path(uri);

        System.out.println(fs.exists(path));
        if (fs.exists(path)) {
            System.out.println("I am ok ");
            // Read the content of the csv file from HDFS. A java.io.FileReader
            // only sees the local disk, so the stream must come from fs.open().
            String cvsSplitBy = ",";
            JsonObject taxiTrips = new JsonObject();
            JsonArray array = new JsonArray();
            try (BufferedReader br = new BufferedReader(new InputStreamReader(fs.open(path)))) {
                String line;
                while ((line = br.readLine()) != null) {
                    // Use comma as separator; build one JSON object per line.
                    String[] word = line.split(cvsSplitBy);
                    JsonObject item = new JsonObject();
                    for (int i = 0; i < word.length - 1; i++) {
                        item.addProperty(word[i], word[i + 1]);
                    }
                    array.add(item);
                }
            }
            taxiTrips.add("Trips", array);
        }
    }
}
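In both answers the underlying point is the same: java.io classes such as File, FileReader, and FileInputStream only ever see the local file system of the node the code runs on, so an hdfs:// path must be opened through org.apache.hadoop.fs.FileSystem (fs.exists, fs.open, and so on).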