I am using Hadoop 1.2.1 and running a map-only job that basically maps log entries into a MySQL table. One of the extracted fields is the IP address, which is sometimes longer than the corresponding column in the table, and this causes an IOException. Although there is a try-catch clause in the map function, I cannot catch and handle it. The code is below:
public class LogEntriesMapper extends
        Mapper<Object, Text, LogEntry, NullDBWritable> {

    private static Pattern p1 = Pattern.compile([…]);
    private final static NullDBWritable nullDB = new NullDBWritable();
    private LogEntry logEntry = new LogEntry();

    @Override
    protected void setup(Context context) throws IOException,
            InterruptedException {
        super.setup(context);
        p1 = Pattern.compile([…]);
    }

    @Override
    protected void map(Object key, Text value, Context context)
            throws IOException, InterruptedException {
        String entry = value.toString();
        Matcher matcher = p1.matcher(entry);
        if (matcher.find()) {
            String date = ...
            String ip = ...
            [extracting fields]
            logEntry.setDate(date);
            logEntry.setIp(ip);
            logEntry.setClient(client);
            logEntry.setSession(session);
            logEntry.setReal_time(real_time);
            try {
                context.write(logEntry, nullDB);
            } catch (IOException e) {
                System.out.println("Failed to save entry: " + logEntry);
                System.out.println(e.getMessage());
            }
        }
    }
}
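The syslog stack traces show the exception being raised in DBOutputFormat$DBRecordWriter.close(), i.e. after map() has already returned, which is presumably why the try-catch around context.write() never fires. One defensive workaround would be to validate the field length in map() before writing. The sketch below is hypothetical, not part of the original job: the class name, the helper, and the 15-character limit (assuming the column is VARCHAR(15)) are all my assumptions.

```java
// Hypothetical length guard for the extracted ip field, so the mapper can
// skip (or truncate) oversized entries instead of letting the buffered
// batch fail later in DBRecordWriter.close().
public class IpFieldGuard {

    // Assumption: the mysql column is VARCHAR(15); adjust to the real schema.
    static final int IP_COLUMN_LENGTH = 15;

    // True when the value would fit the column.
    static boolean fitsColumn(String ip) {
        return ip != null && ip.length() <= IP_COLUMN_LENGTH;
    }

    public static void main(String[] args) {
        System.out.println(fitsColumn("192.168.0.1"));             // true
        System.out.println(fitsColumn("2001:db8::8a2e:370:7334")); // false
    }
}
```

In map() this would be checked before logEntry.setIp(ip), logging and skipping the record when it fails; the write-time exception itself cannot be handled inside map() because the batch only executes during task cleanup.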
And the syslog:
2014-01-07 15:38:08,908 WARN org.apache.hadoop.util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2014-01-07 15:38:09,233 WARN org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Source name ugi already exists!
2014-01-07 15:38:09,330 INFO org.apache.hadoop.mapred.Task: Using ResourceCalculatorPlugin : null
2014-01-07 15:38:09,354 INFO org.apache.hadoop.mapred.MapTask: Processing split: hdfs://master:9000/logs/20130718.txt:0+49101
2014-01-07 15:38:09,672 INFO com.hadoop.compression.lzo.GPLNativeCodeLoader: Loaded native gpl library
2014-01-07 15:38:09,675 INFO com.hadoop.compression.lzo.LzoCodec: Successfully loaded & initialized native-lzo library [hadoop-lzo rev fbd3aa777e0ad06bce75c6aff8c91c7c68eb596b]
2014-01-07 15:38:09,788 WARN org.apache.hadoop.mapreduce.lib.db.DBOutputFormat: com.mysql.jdbc.exceptions.jdbc4.MySQLNonTransientConnectionException: No operations allowed after connection closed.
    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
    at com.mysql.jdbc.Util.handleNewInstance(Util.java:411)
    at com.mysql.jdbc.Util.getInstance(Util.java:386)
    at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:1015)
    at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:989)
    at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:975)
    at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:920)
    at com.mysql.jdbc.ConnectionImpl.throwConnectionClosedException(ConnectionImpl.java:1304)
    at com.mysql.jdbc.ConnectionImpl.checkClosed(ConnectionImpl.java:1296)
    at com.mysql.jdbc.ConnectionImpl.rollback(ConnectionImpl.java:5028)
    at org.apache.hadoop.mapreduce.lib.db.DBOutputFormat$DBRecordWriter.close(DBOutputFormat.java:98)
    at org.apache.hadoop.mapred.MapTask$NewDirectOutputCollector.close(MapTask.java:650)
    at org.apache.hadoop.mapred.MapTask.closeQuietly(MapTask.java:1793)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:779)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:364)
    at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1190)
    at org.apache.hadoop.mapred.Child.main(Child.java:249)
2014-01-07 15:38:09,789 INFO org.apache.hadoop.mapred.MapTask: Ignoring exception during close for org.apache.hadoop.mapred.MapTask$NewDirectOutputCollector@6dac133e
java.io.IOException: No operations allowed after statement closed.
    at org.apache.hadoop.mapreduce.lib.db.DBOutputFormat$DBRecordWriter.close(DBOutputFormat.java:103)
    at org.apache.hadoop.mapred.MapTask$NewDirectOutputCollector.close(MapTask.java:650)
    at org.apache.hadoop.mapred.MapTask.closeQuietly(MapTask.java:1793)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:779)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:364)
    at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1190)
    at org.apache.hadoop.mapred.Child.main(Child.java:249)
2014-01-07 15:38:09,823 INFO org.apache.hadoop.mapred.TaskLogsTruncater: Initializing logs' truncater with mapRetainSize=-1 and reduceRetainSize=-1
2014-01-07 15:38:09,855 ERROR org.apache.hadoop.security.UserGroupInformation: PriviledgedActionException as:mateuszmurawski cause:java.io.IOException: Data truncation: Data too long for column 'ip' at row 1
2014-01-07 15:38:09,855 WARN org.apache.hadoop.mapred.Child: Error running child
java.io.IOException: Data truncation: Data too long for column 'ip' at row 1
    at org.apache.hadoop.mapreduce.lib.db.DBOutputFormat$DBRecordWriter.close(DBOutputFormat.java:103)
    at org.apache.hadoop.mapred.MapTask$NewDirectOutputCollector.close(MapTask.java:650)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:767)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:364)
    at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1190)
    at org.apache.hadoop.mapred.Child.main(Child.java:249)
2014-01-07 15:38:09,861 INFO org.apache.hadoop.mapred.Task:Runnning cleanup for the task