I'm trying to run a command from Java code that merges two files into one. The command is:
hadoop fs -cat /user/cloudera/Index_1/part-r-00000 /user/cloudera/Index_2/part-r-00000 | hadoop fs -put - /user/cloudera/mergedfile
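The command runs perfectly in the Cloudera terminal. When I run it from Java code, however, it prints the merged content to the console but does not create the merged file at the specified path on HDFS. If mergedfile already exists, the output shows the file's earlier data rather than the newly merged data; if the file does not exist, no new file is created. Run from the terminal, the same command creates the file if it does not exist and otherwise fails with a "file exists" error.
My Java code is as follows: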
import java.io.BufferedReader;
import java.io.InputStreamReader;

Process p;
try {
    p = Runtime.getRuntime().exec("hadoop fs -cat /user/cloudera/Index_1/part-r-00000 /user/cloudera/Index_2/part-r-00000 | hadoop fs -put - /user/cloudera/mergedfile");
    BufferedReader br = new BufferedReader(new InputStreamReader(p.getInputStream()));
    String s;
    // Echo the command's standard output to the console, line by line
    while ((s = br.readLine()) != null) {
        System.out.println(s);
    }
} catch (Exception e) {
    System.out.println(e.getMessage());
}
My goal is for the Java code to replace the file if it already exists, or to create a new file if it does not.
Answer 0: (score: 1)
To run HDFS commands from Java, you should use the HDFS Java API. Here is a code sample from javased.com showing how to use it to merge these files:
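import java.io.IOException;
import com.google.common.base.Preconditions;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.FileUtil;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapred.JobConf;

// FileMerger, sLogger and FILE_CONTENT_DELIMITER belong to the enclosing class of the original sample.
/**
 * @param inputFiles path of the directory containing the files to be merged
 * @param outputFile a destination file path
 * @param deleteSource delete source files after merging
 * @param deleteDestinationFileIfExist delete the destination file first if it already exists
 * @return the path of the merged file
 * @throws IOException if an HDFS operation fails
 */
private static Path mergeTextFiles(String inputFiles, String outputFile, boolean deleteSource, boolean deleteDestinationFileIfExist) throws IOException {
    JobConf conf = new JobConf(FileMerger.class);
    FileSystem fs = FileSystem.get(conf);
    Path inputPath = new Path(inputFiles);
    Path outputPath = new Path(outputFile);
    if (deleteDestinationFileIfExist) {
        if (fs.exists(outputPath)) {
            // Non-recursive delete: the destination is expected to be a single file
            fs.delete(outputPath, false);
            sLogger.info("Warning: remove destination file since it already exists...");
        }
    } else {
        Preconditions.checkArgument(!fs.exists(outputPath), new IOException("Destination file already exists..."));
    }
    // Note: FileUtil.copyMerge is available in Hadoop 2.x; it was removed in Hadoop 3.
    FileUtil.copyMerge(fs, inputPath, fs, outputPath, deleteSource, conf, FILE_CONTENT_DELIMITER);
    sLogger.info("Successfully merge " + inputPath.toString() + " to " + outputFile);
    return outputPath;
}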
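To make the merge step itself concrete, here is a minimal sketch that concatenates several input streams into one output stream using only java.io. It is an illustration of the idea, not the actual code of FileUtil.copyMerge, and the names StreamMerger and concatenate are made up for this example. Against HDFS you would presumably obtain the streams from FileSystem.open(Path) and FileSystem.create(Path, true), where the boolean asks to overwrite an existing file.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.util.List;

// Hypothetical helper, not part of Hadoop: merges streams by plain concatenation.
final class StreamMerger {
    /** Copies each input, in list order, to out and returns the number of bytes written. */
    static long concatenate(List<? extends InputStream> inputs, OutputStream out) throws IOException {
        byte[] buffer = new byte[8192];
        long total = 0;
        for (InputStream in : inputs) {
            // try-with-resources closes each input once it has been fully copied
            try (InputStream current = in) {
                int n;
                while ((n = current.read(buffer)) != -1) {
                    out.write(buffer, 0, n);
                    total += n;
                }
            }
        }
        out.flush(); // the caller stays responsible for closing out
        return total;
    }
}
```

Because the inputs are passed explicitly, this approach does not need the files to sit in one directory, and the order of the list decides the order of the merged content.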
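To use mergeTextFiles, you first need to copy the files you want to merge into a single directory with the FileUtil class. You then pass that directory's path as the inputFiles argument: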
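JobConf conf = new JobConf(FileMerger.class);
FileSystem fs = FileSystem.get(conf);
String tmpDir = "/user/cloudera/tmp_dir";
Path[] paths = {new Path("/user/cloudera/Index_1/part-r-00000"), new Path("/user/cloudera/Index_2/part-r-00000")};
// Both sources are named part-r-00000, so give each copy a distinct name inside tmpDir
for (int i = 0; i < paths.length; i++) {
    // Arguments: srcFS, src, dstFS, dst, deleteSource, overwrite, conf
    FileUtil.copy(fs, paths[i], fs, new Path(tmpDir, "part-" + i), false, true, conf);
}
mergeTextFiles(tmpDir, "/user/cloudera/mergedfile", false, true);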
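As for why the Runtime.exec call in the question never created the file: Runtime.exec does not start a shell, so the | and everything after it are handed to the first hadoop fs -cat process as ordinary arguments instead of setting up a pipe. If you would rather keep the shell command, a minimal sketch is to pass the whole line to sh -c. This assumes a POSIX sh is on the PATH, and ShellPipeline with its two methods is a hypothetical helper written for this example.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper: lets a shell interpret pipes that Runtime.exec would pass through literally.
final class ShellPipeline {
    /** Wraps a command line so that sh parses it, including | and redirections. */
    static List<String> shellCommand(String commandLine) {
        return Arrays.asList("sh", "-c", commandLine);
    }

    /** Runs the command line through sh and returns its combined stdout and stderr. */
    static String run(String commandLine) throws IOException, InterruptedException {
        ProcessBuilder pb = new ProcessBuilder(shellCommand(commandLine));
        pb.redirectErrorStream(true); // merge stderr into stdout so a single reader drains both
        Process p = pb.start();
        StringBuilder output = new StringBuilder();
        try (BufferedReader br = new BufferedReader(
                new InputStreamReader(p.getInputStream(), StandardCharsets.UTF_8))) {
            String line;
            while ((line = br.readLine()) != null) {
                output.append(line).append('\n');
            }
        }
        int exitCode = p.waitFor(); // sh reports the exit status of the last command in the pipeline
        if (exitCode != 0) {
            throw new IOException("Command exited with code " + exitCode + ": " + output);
        }
        return output.toString();
    }
}
```

With this helper the call would look like ShellPipeline.run("hadoop fs -cat /user/cloudera/Index_1/part-r-00000 /user/cloudera/Index_2/part-r-00000 | hadoop fs -put -f - /user/cloudera/mergedfile"). The -f flag of hadoop fs -put is meant to overwrite an existing destination; check that your Hadoop version honors it when reading from stdin. The Java API route above avoids spawning two extra JVMs and is usually the more robust choice.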