Copying a MapFile to another location using MapReduce

Time: 2014-04-18 08:10:01

Tags: java hadoop mapreduce

I am writing a program that copies a MapFile to another location in HDFS.

Here is my code. The main class:

String uri = "hdfs://<ip>:8020/poc/input2";
String uri2 = "hdfs://<ip>:8020/poc/output1";
String uribyMapR = "hdfs://<ip>:8020/poc/outputbysetOutput";
boolean b = false;
Configuration conf = new Configuration();
conf.addResource(new Path("/hadoop/core-site.xml"));
conf.addResource(new Path("/hadoop/hdfs-site.xml"));
FileSystem filesystem = FileSystem.get(conf);
Path inputpath = new Path(uri);
Path outputpath = new Path(uri2);
Path outputbyMapR = new Path(uribyMapR);
conf.set("uri", uri);
conf.set("uri2", uri2);
if (filesystem.exists(outputpath))          
    filesystem.delete(outputpath, true);
if (filesystem.exists(outputbyMapR))            
    filesystem.delete(outputbyMapR, true);  
Job job = new Job(conf, "MapFile");
job.setJarByClass(Main.class);
job.setMapperClass(MapTry.class);
job.setMapOutputKeyClass(LongWritable.class);
job.setMapOutputValueClass(Text.class);
job.setInputFormatClass(SequenceFileInputFormat.class);
SequenceFileInputFormat.addInputPath(job, inputpath);
MapFileOutputFormat.setOutputPath(job, outputbyMapR);
try {
    b = job.waitForCompletion(true);
} catch (IOException e) {
    e.printStackTrace();
} catch (InterruptedException e) {
    e.printStackTrace();
} catch (ClassNotFoundException e) {
    e.printStackTrace();
}
if (!b) {
    throw new IOException("The job is failed");
}

The mapper class:

public class MapTry extends Mapper<LongWritable, Text, LongWritable, Text>{
MapFile.Reader reader = null;
MapFile.Writer writer= null;
@Override
protected void setup(Context context)throws IOException, InterruptedException {
    Configuration conf = context.getConfiguration();
    String uri = context.getConfiguration().get("uri");
    String uri2 = context.getConfiguration().get("uri2");
    FileSystem fs = FileSystem.get(conf);
    reader = new Reader(fs, uri, conf);
    LongWritable key = new LongWritable(1);
    Text value = new Text();
    writer = new MapFile.Writer(conf, fs, uri2, key.getClass(), value.getClass());
}

@Override
protected void map(LongWritable key, Text value, Context context)
        throws IOException, InterruptedException {
    //reader.next(key, value);
    //reader.get(key, value);
    System.out.println(key.toString() + " " + value.toString());
    //writer.append(key, value);
    context.write(key, value);
}

@Override
protected void cleanup(Context context) throws IOException, InterruptedException{
    reader.close();
    writer.close();
}
}

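A MapFile consists of two files, data and index. The issue is that the job succeeds while it processes the data file, but when it reaches the index file it throws the following exception: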

java.lang.Exception: java.lang.ClassCastException: org.apache.hadoop.io.Text cannot be cast to org.apache.hadoop.io.LongWritable
    at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:404)
Caused by: java.lang.ClassCastException: org.apache.hadoop.io.Text cannot be cast to org.apache.hadoop.io.LongWritable
    at mapreduce.MapTry.map(MapTry.java:1)
    at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:140)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:673)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:331)
    at org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:266)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:439)
    at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
    at java.util.concurrent.FutureTask.run(FutureTask.java:138)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:895)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:918)
    at java.lang.Thread.run(Thread.java:662)
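
A note on reading this trace: the cast fails inside `MapTry.map` at a line number that does not point at a real statement, which is what one would expect if the failure happens in the compiler-generated bridge method rather than in the method body. The sketch below is a minimal plain-Java illustration of that mechanism, not Hadoop code. `MiniMapper`, `LongKeyMapper` and `feed` are hypothetical names invented for the example, and `Long`/`String` stand in for `LongWritable`/`Text`.

```java
import java.util.ArrayList;
import java.util.List;

public class BridgeCastDemo {
    // Hypothetical stand-in for a generic framework base class such as Mapper<K, V, ...>.
    static class MiniMapper<K, V> {
        protected void map(K key, V value, List<String> out) {
            out.add(key + " " + value);
        }

        // The framework only sees erased types, so it hands over plain Objects.
        @SuppressWarnings("unchecked")
        final void run(Object key, Object value, List<String> out) {
            map((K) key, (V) value, out); // unchecked casts: nothing is verified here at runtime
        }
    }

    // Declares Long keys, the way MapTry declares LongWritable keys.
    static class LongKeyMapper extends MiniMapper<Long, String> {
        @Override
        protected void map(Long key, String value, List<String> out) {
            out.add(key + "=" + value);
        }
    }

    // Feeds one record to the mapper and reports what happened.
    public static String feed(Object key, Object value) {
        List<String> out = new ArrayList<>();
        try {
            new LongKeyMapper().run(key, value, out);
            return "ok: " + out.get(0);
        } catch (ClassCastException e) {
            // Thrown by the bridge method map(Object, Object, List) before the body runs.
            return "ClassCastException";
        }
    }

    public static void main(String[] args) {
        System.out.println(feed(1L, "row"));     // key type matches the declaration
        System.out.println(feed("text", "row")); // key type does not match
    }
}
```

The takeaway for the job above is that the declared type parameters of the mapper must match what the input format actually delivers for every file it reads; the exception only shows up once a record of a different type arrives.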

So, how can I write a MapFile using a MapReduce program?

1 answer:

Answer 0: (score: 0)

I found that it works if I read only the data file of the MapFile as sequence file input and write the output as a MapFile.

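job.setInputFormatClass(SequenceFileInputFormat.class);
SequenceFileInputFormat.addInputPath(job, inputpath);
job.setOutputFormatClass(MapFileOutputFormat.class); // setOutputPath alone does not select the output format
MapFileOutputFormat.setOutputPath(job, outputpath);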
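
As background for why reading only the data file can be enough: the index is derived from the data, so a writer can regenerate it while the records are appended in key order. The following is a toy in-memory model of that idea, assuming a sparse index that records every `interval`-th key. It is not Hadoop's on-disk format, and `ToyMapFile` and all of its methods are hypothetical names made up for this sketch.

```java
import java.util.ArrayList;
import java.util.List;

public class ToyMapFile {
    // "data" part: every record, in ascending key order.
    private final List<Long> dataKeys = new ArrayList<>();
    private final List<String> dataValues = new ArrayList<>();
    // "index" part: every interval-th key with its position in the data part.
    private final List<Long> indexKeys = new ArrayList<>();
    private final List<Integer> indexPositions = new ArrayList<>();
    private final int interval;

    public ToyMapFile(int interval) {
        this.interval = interval;
    }

    // Appends a record; keys must not go backwards, otherwise lookups would break.
    public void append(long key, String value) {
        if (!dataKeys.isEmpty() && key < dataKeys.get(dataKeys.size() - 1)) {
            throw new IllegalArgumentException("key out of order: " + key);
        }
        if (dataKeys.size() % interval == 0) {
            indexKeys.add(key);
            indexPositions.add(dataKeys.size());
        }
        dataKeys.add(key);
        dataValues.add(value);
    }

    // Finds the last index entry <= key, then scans the data part from that position.
    public String get(long key) {
        int start = -1;
        for (int i = 0; i < indexKeys.size() && indexKeys.get(i) <= key; i++) {
            start = indexPositions.get(i);
        }
        if (start < 0) {
            return null;
        }
        for (int pos = start; pos < dataKeys.size() && dataKeys.get(pos) <= key; pos++) {
            if (dataKeys.get(pos) == key) {
                return dataValues.get(pos);
            }
        }
        return null;
    }

    public int indexSize() {
        return indexKeys.size();
    }

    // Copies using the data part only; append() rebuilds the index as a side effect.
    public ToyMapFile copyFromData() {
        ToyMapFile copy = new ToyMapFile(interval);
        for (int i = 0; i < dataKeys.size(); i++) {
            copy.append(dataKeys.get(i), dataValues.get(i));
        }
        return copy;
    }

    public static void main(String[] args) {
        ToyMapFile original = new ToyMapFile(2);
        for (long k = 10; k <= 50; k += 10) {
            original.append(k, "v" + k);
        }
        ToyMapFile copy = original.copyFromData();
        System.out.println(copy.get(30));
        System.out.println(copy.indexSize());
    }
}
```

In this model the copy never looks at the original index, yet it ends up with an equivalent one. The same reasoning suggests why a job that feeds the index file to the mapper as if it were ordinary records is likely to run into trouble.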