job.getFileCache gives an empty file from HDFS in Hadoop

Date: 2018-05-13 12:06:37

Tags: java hadoop mapreduce hdfs pagerank

Why does Hadoop get an empty txt file when reading from HDFS? I am using an iterative approach in Hadoop, so of course I have to write the output txt file to HDFS and then read it back from HDFS in the next iteration. In this part of my map job I get the txt file with the correct name, but it is completely empty.

    public void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        String[] str = value.toString().split("\\s+");
        int noToken = str.length - 1;
        String token = "";
        String curNode = str[0];
        float p = 0;
        String[] keyRank = null;

        try {
            // Read the rank file shipped through the distributed cache
            URI[] localpath = context.getCacheFiles();
            FileReader fr = new FileReader(localpath[0].toString());
            BufferedReader br = new BufferedReader(fr);

            String line;
            while ((line = br.readLine()) != null) {
                keyRank = line.split("\\s+");
                try {
                    // Accumulate the total rank and count the nodes
                    tsum = tsum + Float.parseFloat(keyRank[1]);
                    tNode++;
                } catch (NumberFormatException e) {
                    System.out.println(" rank MapOnly float exception");
                }
            }
            br.close();
        } catch (IOException e) {
            e.printStackTrace();
        }
        // ... rest of the map method omitted in the question
    }

1 Answer:

Answer 0 (score: 0)

Instead of using this

    FileReader fr = new FileReader (localpath[0].toString());
    BufferedReader br = new BufferedReader (fr);

use this code, which opens the file through Hadoop's FileSystem API. java.io.FileReader only reads from the local filesystem and cannot interpret an HDFS URI, which is why the file comes back empty:

        FileSystem fs = FileSystem.get(context.getConfiguration());
        Path path = new Path(localpath[0].toString());
        InputStreamReader fr = new InputStreamReader (fs.open(path));
        BufferedReader br = new BufferedReader (fr);

You also need to import the Hadoop FileSystem and Path classes and Java's InputStreamReader, as shown below:

       import org.apache.hadoop.fs.FileSystem;
       import org.apache.hadoop.fs.Path;
       import java.io.InputStreamReader;
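Putting the answer together with the question's code, the cache-reading block inside map would look roughly like this (a sketch; tsum and tNode are assumed to be fields of the mapper class, as in the question):

    // Resolve the cached file's URI and open it through HDFS,
    // not through the local filesystem
    URI[] localpath = context.getCacheFiles();
    FileSystem fs = FileSystem.get(context.getConfiguration());
    Path path = new Path(localpath[0].toString());

    try (BufferedReader br = new BufferedReader(new InputStreamReader(fs.open(path)))) {
        String line;
        while ((line = br.readLine()) != null) {
            String[] keyRank = line.split("\\s+");
            try {
                tsum += Float.parseFloat(keyRank[1]); // accumulate total rank
                tNode++;                              // count nodes read
            } catch (NumberFormatException e) {
                System.out.println(" rank MapOnly float exception");
            }
        }
    }

The try-with-resources block also ensures the HDFS stream is closed after each map call, which the original code did not do.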