job.getFileCache gives an empty file from HDFS in Hadoop

Date: 2018-05-13 12:06:37

Tags: java hadoop mapreduce hdfs pagerank

Why does Hadoop get an empty txt file when reading from HDFS? I am using an iterative approach in Hadoop, so of course I have to write the output txt file to HDFS and then read it back from HDFS in the next iteration. In this part of my map job I get the txt file with the correct name, but it is completely empty.

    public void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        String[] str = value.toString().split("\\s+");
        int noToken = str.length - 1;
        String token = "";
        String curNode = str[0];
        float p = 0;
        String[] keyRank = null;

        try {
            // Read the rank file shipped through the distributed cache
            URI[] localpath = context.getCacheFiles();
            FileReader fr = new FileReader(localpath[0].toString());
            BufferedReader br = new BufferedReader(fr);

            String line;
            while ((line = br.readLine()) != null) {
                keyRank = line.split("\\s+");
                try {
                    // Accumulate the total rank and count the nodes
                    tsum = tsum + Float.parseFloat(keyRank[1]);
                    tNode++;
                } catch (NumberFormatException e) {
                    System.out.println(" rank MapOnly float exception");
                }
            }
            br.close();
        } catch (IOException e) {
            e.printStackTrace();
        }
        // ... rest of the map method omitted in the question
    }

1 Answer:

Answer 0 (score: 0)

Instead of using this

    FileReader fr = new FileReader (localpath[0].toString());
    BufferedReader br = new BufferedReader (fr);

use this code, which opens the file through Hadoop's FileSystem API. java.io.FileReader only reads from the local filesystem and cannot interpret an HDFS URI, which is why the file comes back empty:

        FileSystem fs = FileSystem.get(context.getConfiguration());
        Path path = new Path(localpath[0].toString());
        InputStreamReader fr = new InputStreamReader (fs.open(path));
        BufferedReader br = new BufferedReader (fr);

You also need to import the Hadoop FileSystem and Path classes and Java's InputStreamReader, as shown below:

       import org.apache.hadoop.fs.FileSystem;
       import org.apache.hadoop.fs.Path;
       import java.io.InputStreamReader;
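Putting the answer together with the question's code, the cache-reading block inside map would look roughly like this (a sketch; tsum and tNode are assumed to be fields of the mapper class, as in the question):

    // Resolve the cached file's URI and open it through HDFS,
    // not through the local filesystem
    URI[] localpath = context.getCacheFiles();
    FileSystem fs = FileSystem.get(context.getConfiguration());
    Path path = new Path(localpath[0].toString());

    try (BufferedReader br = new BufferedReader(new InputStreamReader(fs.open(path)))) {
        String line;
        while ((line = br.readLine()) != null) {
            String[] keyRank = line.split("\\s+");
            try {
                tsum += Float.parseFloat(keyRank[1]); // accumulate total rank
                tNode++;                              // count nodes read
            } catch (NumberFormatException e) {
                System.out.println(" rank MapOnly float exception");
            }
        }
    }

The try-with-resources block also ensures the HDFS stream is closed after each map call, which the original code did not do.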