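I'm running a MapReduce job against the Wikipedia history dump, using XmlInputFormat to parse the XML. Task "xxx_m_000053_0" always stalls at 70% and is then killed because of a timeout.
In the console:
xxx_m_000053_0 failed to report status for 300 seconds. Killing!
I increased the timeout to 2 hours. It didn't help.
In the xxx_m_000053_0 log file:
Processing split: hdfs://localhost:8020/user/martin/history/history.xml:3556769792+67108864
I was expecting an error in history.xml in the offset range [3556769792, 3623878656]. I split the file at this offset and ran that piece in Hadoop. It worked... (???)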
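For reference, the split line in the log has the form start+length, and the range quoted above follows from it. The small sketch below only redoes that arithmetic; the class and method names are made up for illustration, and the split index calculation assumes every earlier split has the same 67108864-byte length.

```java
// Hypothetical helper, not part of the job: redoes the offset arithmetic from the log line.
class SplitMath {

    // Exclusive end offset of a split that starts at `start` and spans `length` bytes.
    static long splitEnd(long start, long length) {
        return start + length;
    }

    // Zero-based index of the split, assuming all earlier splits have the same length.
    static long splitIndex(long start, long length) {
        return start / length;
    }

    public static void main(String[] args) {
        long start = 3556769792L;   // from "history.xml:3556769792+67108864"
        long length = 67108864L;    // 64 MiB
        System.out.println("end   = " + splitEnd(start, length));
        System.out.println("index = " + splitIndex(start, length));
    }
}
```

Under that assumption the index comes out as 53, which matches the m_000053 part of the task name, so the failing task and the logged split do refer to the same byte range.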
Also in the xxx_m_000053_0 log file:
java.io.IOException: Filesystem closed
at org.apache.hadoop.hdfs.DFSClient.checkOpen(DFSClient.java:323)
at org.apache.hadoop.hdfs.DFSClient.access$1200(DFSClient.java:78)
at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.close(DFSClient.java:2326)
at java.io.FilterInputStream.close(FilterInputStream.java:155)
**at com.doduck.wikilink.history.XmlInputFormat$XmlRecordReader.close(XmlInputFormat.java:109)**
at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.close(MapTask.java:496)
at org.apache.hadoop.mapred.MapTask.closeQuietly(MapTask.java:1776)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:778)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:364)
at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1190)
at org.apache.hadoop.mapred.Child.main(Child.java:249)
2013-09-17 13:13:32,248 INFO org.apache.hadoop.mapred.MapTask: Starting flush of map output
2013-09-17 13:13:32,248 INFO org.apache.hadoop.mapred.MapTask: Ignoring exception during close for org.apache.hadoop.mapred.MapTask$NewOutputCollector@54e9a7c2
org.apache.hadoop.util.DiskChecker$DiskErrorException: Could not find any valid local directory for output/file.out
at org.apache.hadoop.fs.LocalDirAllocator$AllocatorPerContext.getLocalPathForWrite(LocalDirAllocator.java:381)
at org.apache.hadoop.fs.LocalDirAllocator.getLocalPathForWrite(LocalDirAllocator.java:146)
at org.apache.hadoop.fs.LocalDirAllocator.getLocalPathForWrite(LocalDirAllocator.java:127)
at org.apache.hadoop.mapred.MapOutputFile.getOutputFileForWrite(MapOutputFile.java:69)
at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.mergeParts(MapTask.java:1645)
at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.flush(MapTask.java:1328)
at org.apache.hadoop.mapred.MapTask$NewOutputCollector.close(MapTask.java:698)
at org.apache.hadoop.mapred.MapTask.closeQuietly(MapTask.java:1793)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:779)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:364)
at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1190)
at org.apache.hadoop.mapred.Child.main(Child.java:249)
So I think this might be a configuration problem? Why was my filesystem closed?
What is wrong with XmlInputFormat?
My empty mapper:
@Override
public void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException
{
    // nothing to do...
}
My main:
public static void main(String[] args) throws IOException, ClassNotFoundException, InterruptedException {
    Configuration conf = new Configuration();
    // Each record handed to the mapper is the text between these two tags.
    conf.set("xmlinput.start", "<page>");
    conf.set("xmlinput.end", "</page>");
    Job job = new Job(conf, "wikipedia link history");
    job.setJarByClass(Main.class);
    job.setMapperClass(Map.class);
    job.setReducerClass(Reduce.class);
    job.setInputFormatClass(XmlInputFormat.class);
    job.setMapOutputKeyClass(Text.class);
    job.setMapOutputValueClass(IntWritable.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(Text.class);
    job.setOutputFormatClass(TextOutputFormat.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
    boolean result = job.waitForCompletion(true);
    System.exit(result ? 0 : 1);
}
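To make clear what I expect the xmlinput.start / xmlinput.end settings to do, here is a stand-alone sketch of start/end-tag scanning over a plain InputStream. It is not the XmlInputFormat class from my job: TagScanner and its methods are hypothetical names, it knows nothing about HDFS or split boundaries, and the matcher assumes the first byte of a tag does not occur again inside that tag (true for "<page>" and "</page>").

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration class; not the XmlInputFormat used in the job.
class TagScanner {

    // Consumes bytes until `match` has been read in full; returns false at end of stream.
    // If `sink` is non-null, every consumed byte is also copied into it.
    // Assumes the first byte of `match` occurs nowhere else in `match`.
    static boolean readUntilMatch(InputStream in, byte[] match, ByteArrayOutputStream sink)
            throws IOException {
        int matched = 0;
        while (true) {
            int b = in.read();
            if (b == -1) {
                return false;
            }
            if (sink != null) {
                sink.write(b);
            }
            if (b == (match[matched] & 0xFF)) {
                matched++;
                if (matched == match.length) {
                    return true;
                }
            } else {
                // Restart the match; the current byte may itself begin a new match.
                matched = (b == (match[0] & 0xFF)) ? 1 : 0;
            }
        }
    }

    // Returns every complete startTag...endTag record found in the stream.
    static List<String> extractRecords(InputStream in, String startTag, String endTag)
            throws IOException {
        byte[] start = startTag.getBytes(StandardCharsets.UTF_8);
        byte[] end = endTag.getBytes(StandardCharsets.UTF_8);
        List<String> records = new ArrayList<>();
        while (readUntilMatch(in, start, null)) {
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            buf.write(start, 0, start.length);
            if (!readUntilMatch(in, end, buf)) {
                break; // stream ended inside a record: drop the unterminated tail
            }
            records.add(new String(buf.toByteArray(), StandardCharsets.UTF_8));
        }
        return records;
    }

    // Convenience wrapper for in-memory text.
    static List<String> extractFromString(String text, String startTag, String endTag) {
        try {
            byte[] bytes = text.getBytes(StandardCharsets.UTF_8);
            return extractRecords(new ByteArrayInputStream(bytes), startTag, endTag);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String xml = "<x><page>a</page>junk<page>b</page><page>c";
        for (String record : extractFromString(xml, "<page>", "</page>")) {
            System.out.println(record);
        }
    }
}
```

In this sketch a record that has been started is read until its end tag is found or the stream ends, however far away that is. Whether the real record reader behaves the same way around split boundaries is exactly what I am unsure about.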
In hdfs-site.xml:
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
In mapred-site.xml:
<property>
<name>mapred.job.tracker</name>
<value>localhost:9001</value>
</property>
<property>
<name>mapred.tasktracker.map.tasks.maximum</name>
<value>2</value>
</property>
<property>
<name>mapred.tasktracker.reduce.tasks.maximum</name>
<value>2</value>
</property>
<property>
<name>mapred.child.java.opts</name>
<value>-Xmx9216m</value>
</property>
<property>
<name>mapred.task.timeout</name>
<value>300000</value>
</property>
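A note on units: mapred.task.timeout is given in milliseconds, so the 300000 above is the 300 seconds that appears in the console message. The sketch below only does that conversion, plus the value a 2-hour timeout would need; TimeoutMath is a made-up name used for illustration.

```java
import java.util.concurrent.TimeUnit;

// Hypothetical helper: unit conversions for the mapred.task.timeout value.
class TimeoutMath {

    // Milliseconds to put in mapred.task.timeout for a timeout of `hours` hours.
    static long hoursToMillis(long hours) {
        return TimeUnit.HOURS.toMillis(hours);
    }

    // Seconds that a millisecond timeout value corresponds to.
    static long millisToSeconds(long millis) {
        return TimeUnit.MILLISECONDS.toSeconds(millis);
    }

    public static void main(String[] args) {
        System.out.println("300000 ms = " + millisToSeconds(300000L) + " s");
        System.out.println("2 h       = " + hoursToMillis(2L) + " ms");
    }
}
```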
My core-site.xml:
<configuration>
<property>
<name>hadoop.tmp.dir</name>
<value>/Volumes/WD/hadoop/tmp</value>
<description>A base for other temporary directories.</description>
</property>
<property>
<name>fs.default.name</name>
<value>hdfs://localhost:8020</value>
</property>