Running Linux Hadoop fs commands from Java code

Time: 2015-11-16 18:34:26

Tags: java linux hadoop concatenation hdfs

I am trying to run a command from Java code that merges two files into one. The command is:

hadoop fs -cat /user/cloudera/Index_1/part-r-00000 /user/cloudera/Index_2/part-r-00000 | hadoop fs -put - /user/cloudera/mergedfile

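The command works perfectly in the Cloudera terminal, but when I run it from Java code, it prints the merged content to the console and does not create the merged file at the specified HDFS path. If mergedfile already exists, the output contains the file's earlier data but not the newly merged data; if it does not exist, no new file is created. Run in the terminal, the same command creates the file if it does not exist and fails with a "file exists" error if it does.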

My Java code is as follows:

import java.io.BufferedReader;
import java.io.InputStreamReader;

Process p;

try {
    p = Runtime.getRuntime().exec("hadoop fs -cat /user/cloudera/Index_1/part-r-00000 /user/cloudera/Index_2/part-r-00000 | hadoop fs -put - /user/cloudera/mergedfile");
    BufferedReader br = new BufferedReader(new InputStreamReader(p.getInputStream()));
    String s;

    // Print each line the process writes to its standard output
    while ((s = br.readLine()) != null) {
        System.out.println(s);
    }
} catch (Exception e) {
    System.out.println(e.getMessage());
}

My goal is for the Java code to replace the file if it already exists, or create a new one if it does not.
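A likely explanation for the behavior described above: Runtime.exec(String) does not start a shell. It splits the string on whitespace and runs the first token (hadoop) directly, so |, hadoop, fs, -put, - and /user/cloudera/mergedfile all become extra arguments to hadoop fs -cat. The two inputs (and mergedfile, when it already exists) are therefore printed, and nothing is written to HDFS. If shelling out is acceptable, one workaround is to hand the whole line to a shell. The following is a minimal sketch under these assumptions: /bin/sh exists, the hadoop CLI is on the PATH of the Java process, and ShellPipeline, run, Result and MERGE_COMMAND are hypothetical names. It also adds -f to hadoop fs -put, which overwrites an existing destination, to match the replace-or-create goal.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

final class ShellPipeline {

    // Hypothetical constant: the question's pipeline, with -f added so that
    // "hadoop fs -put" overwrites /user/cloudera/mergedfile if it exists.
    static final String MERGE_COMMAND =
        "hadoop fs -cat /user/cloudera/Index_1/part-r-00000 /user/cloudera/Index_2/part-r-00000"
            + " | hadoop fs -put -f - /user/cloudera/mergedfile";

    /** Exit code of the finished command plus every line it printed. */
    static final class Result {
        final int exitCode;
        final List<String> output;

        Result(int exitCode, List<String> output) {
            this.exitCode = exitCode;
            this.output = output;
        }
    }

    /**
     * Runs commandLine through "/bin/sh -c", so shell syntax such as "|" is
     * interpreted by the shell instead of being passed to the first program
     * as a literal argument.
     */
    static Result run(String commandLine) throws IOException, InterruptedException {
        ProcessBuilder pb = new ProcessBuilder("/bin/sh", "-c", commandLine);
        // Merge stderr into stdout so error messages are captured as well
        pb.redirectErrorStream(true);
        Process p = pb.start();

        List<String> output = new ArrayList<>();
        try (BufferedReader br = new BufferedReader(
                new InputStreamReader(p.getInputStream(), StandardCharsets.UTF_8))) {
            String line;
            while ((line = br.readLine()) != null) {
                output.add(line);
            }
        }
        // All output was read before waiting, so a full pipe buffer cannot block the child
        return new Result(p.waitFor(), output);
    }
}
```

For the question's command this would be called as ShellPipeline.run(ShellPipeline.MERGE_COMMAND). Because stderr is merged into the captured output, a non-zero exitCode comes with the error text that explains it, which the original code never read.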

1 answer:

Answer 0 (score: 1)

To run HDFS commands from Java, you should use the HDFS Java API. Here is a code sample from javased.com showing how to use it to merge these files:

/**
 * @param inputFiles path of the directory whose files are to be merged
 * @param outputFile a destination file path
 * @param deleteSource delete source files after merging
 * @param deleteDestinationFileIfExist delete outputFile first if it already exists
 * @return the path of the merged file
 * @throws IOException if outputFile exists and deleteDestinationFileIfExist is false, or on an HDFS error
 */
// FileMerger, sLogger and FILE_CONTENT_DELIMITER are defined elsewhere in the original sample class
private static Path mergeTextFiles(String inputFiles, String outputFile, boolean deleteSource, boolean deleteDestinationFileIfExist) throws IOException {
  JobConf conf = new JobConf(FileMerger.class);
  FileSystem fs = FileSystem.get(conf);
  Path inputPath = new Path(inputFiles);
  Path outputPath = new Path(outputFile);
  if (deleteDestinationFileIfExist) {
    if (fs.exists(outputPath)) {
      fs.delete(outputPath, false);
      sLogger.info("Warning: removed destination file since it already exists...");
    }
  } else if (fs.exists(outputPath)) {
    throw new IOException("Destination file already exists: " + outputPath);
  }
  // copyMerge concatenates every file in the inputPath directory, sorted by path,
  // into outputPath, writing FILE_CONTENT_DELIMITER after each file
  FileUtil.copyMerge(fs, inputPath, fs, outputPath, deleteSource, conf, FILE_CONTENT_DELIMITER);
  sLogger.info("Successfully merged " + inputPath + " to " + outputFile);
  return outputPath;
}

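In this case, you first need to use the FileUtil class to copy the files to be merged into a single directory. Because both inputs are named part-r-00000, give the copies distinct names so they do not collide. Then pass that directory's path as the inputFiles argument: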

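JobConf conf = new JobConf(FileMerger.class);
FileSystem fs = FileSystem.get(conf);
Path tmpDir = new Path("/user/cloudera/tmp_dir");
fs.mkdirs(tmpDir);
// Both inputs are named part-r-00000, so copy them under distinct names;
// otherwise the two copies would collide inside tmpDir
FileUtil.copy(fs, new Path("/user/cloudera/Index_1/part-r-00000"), fs, new Path(tmpDir, "Index_1-part-r-00000"), false, conf);
FileUtil.copy(fs, new Path("/user/cloudera/Index_2/part-r-00000"), fs, new Path(tmpDir, "Index_2-part-r-00000"), false, conf);
mergeTextFiles(tmpDir.toString(), "/user/cloudera/mergedfile", false, true);
fs.delete(tmpDir, true); // clean up the temporary copies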
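Note that FileUtil.copyMerge was removed in Hadoop 3.0, so the sample above does not compile against Hadoop 3.x. The following is a minimal sketch that avoids both copyMerge and the temporary directory, assuming hadoop-common is on the classpath; HdfsConcat and concat are hypothetical names. It opens the destination with FileSystem.create(path, true), which replaces the file if it exists and creates it otherwise, and then streams each input into it in order with IOUtils.copyBytes.

```java
import java.io.IOException;
import java.io.InputStream;

import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;

final class HdfsConcat {

    /**
     * Writes the contents of inputs, in the given order, to outputFile.
     * create(outputFile, true) overwrites outputFile if it already exists
     * and creates it otherwise.
     */
    static void concat(FileSystem fs, Path outputFile, Path... inputs) throws IOException {
        try (FSDataOutputStream out = fs.create(outputFile, true)) {
            for (Path input : inputs) {
                try (InputStream in = fs.open(input)) {
                    // close=false: the try-with-resources blocks close both streams
                    IOUtils.copyBytes(in, out, 4096, false);
                }
            }
        }
    }
}
```

For the question's files this would be concat(FileSystem.get(new Configuration()), new Path("/user/cloudera/mergedfile"), new Path("/user/cloudera/Index_1/part-r-00000"), new Path("/user/cloudera/Index_2/part-r-00000")). Unlike copyMerge with FILE_CONTENT_DELIMITER, nothing is inserted between files. Because the destination is truncated before the inputs are read, a missing input leaves a partial mergedfile, so checking fs.exists on each input first may be worthwhile.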