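I am trying to understand the Hadoop framework and its MapReduce functionality.
Environment: Windows 7 with Cygwin, running Hadoop 0.19.1. Eclipse Europa is used for MapReduce job development.
Problem:
Below is sample code that uses the default identity mapper and reducer classes. The goal is to copy the files in the input folder to the output folder without processing the data in any way.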
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.Reducer;
import org.apache.hadoop.mapred.TextInputFormat;
import org.apache.hadoop.mapred.TextOutputFormat;
public class TestDriver {
    public static void main(String[] args) {
        JobClient client = new JobClient();
        JobConf conf = new JobConf(TestDriver.class);
        // TODO: specify output types
        conf.setOutputKeyClass(Text.class);
        conf.setOutputValueClass(IntWritable.class);
        // TODO: specify input and output DIRECTORIES (not files)
        //conf.setInputPath(new Path("src"));
        //conf.setOutputPath(new Path("out"));
        conf.setInputFormat(TextInputFormat.class);
        conf.setOutputFormat(TextOutputFormat.class);
        FileInputFormat.setInputPaths(conf, new Path("In"));
        FileOutputFormat.setOutputPath(conf, new Path("Out"));
        // TODO: specify a mapper
        conf.setMapperClass(org.apache.hadoop.mapred.lib.IdentityMapper.class);
        // TODO: specify a reducer
        conf.setReducerClass(org.apache.hadoop.mapred.lib.IdentityReducer.class);
        client.setConf(conf);
        try {
            JobClient.runJob(conf);
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}
The Hadoop cluster is started from a Cygwin terminal. Output of the jps command:
$ jps
4944 NameNode
6588 SecondaryNameNode
8504 TaskTracker
8640 JobTracker
8340 DataNode
8568 Jps
The hadoop-site.xml file contains the following:
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
<property>
<name>fs.default.name</name>
<value>hdfs://localhost:9100</value>
</property>
<property>
<name>mapred.job.tracker</name>
<value>localhost:9101</value>
</property>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
</configuration>
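To double-check which addresses this file actually defines (for example, before comparing them with the ports entered in Eclipse), the properties can be read back with the JDK's own XML parser. This is only a minimal sketch: the class name SiteConfigReader and the method readProperties are hypothetical helpers, not part of Hadoop, and the XML is embedded as a string rather than loaded from the real file.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical helper, not part of Hadoop: lists name/value pairs
// from a Hadoop-style <configuration> document.
public class SiteConfigReader {

    // Parses every <property><name/><value/></property> entry in the XML text.
    public static Map<String, String> readProperties(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            Map<String, String> props = new LinkedHashMap<>();
            NodeList nodes = doc.getElementsByTagName("property");
            for (int i = 0; i < nodes.getLength(); i++) {
                Element p = (Element) nodes.item(i);
                String name = p.getElementsByTagName("name").item(0).getTextContent().trim();
                String value = p.getElementsByTagName("value").item(0).getTextContent().trim();
                props.put(name, value);
            }
            return props;
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse configuration XML", e);
        }
    }

    public static void main(String[] args) {
        // Same three properties as in the hadoop-site.xml shown above.
        String xml = "<?xml version=\"1.0\"?>"
                + "<configuration>"
                + "<property><name>fs.default.name</name><value>hdfs://localhost:9100</value></property>"
                + "<property><name>mapred.job.tracker</name><value>localhost:9101</value></property>"
                + "<property><name>dfs.replication</name><value>1</value></property>"
                + "</configuration>";
        for (Map.Entry<String, String> e : readProperties(xml).entrySet()) {
            System.out.println(e.getKey() + " = " + e.getValue());
        }
    }
}
```

With the file above, fs.default.name should report port 9100 and mapred.job.tracker port 9101, which are the two values the Eclipse location has to agree with.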
My Hadoop installation under Cygwin does not contain a yarn-site.xml file. Is that file required, and could its absence cause any problems?
In Eclipse, I created a Hadoop Map/Reduce location with Map/Reduce master port 9101 and DFS master port 9100. When I run the program, the console shows the following:
16/07/14 12:06:35 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
16/07/14 12:06:35 INFO mapred.FileInputFormat: Total input paths to process : 4
16/07/14 12:06:37 INFO mapred.JobClient: Running job: job_201607141149_0002
16/07/14 12:06:38 INFO mapred.JobClient: map 0% reduce 0%
Nothing is displayed in the TaskTracker window in Cygwin. Below is the dump from the JobTracker window:
16/07/14 12:05:24 ERROR mapred.EagerTaskInitializationListener: Job initialization failed:
java.util.regex.PatternSyntaxException: \k is not followed by '<' for named capturing group near index 48
localhost_[0-9]+_job_201607141149_0001_hcltech\kaushik.srinivas_\Qtest_TestDriver.java-3370112959319510186.jar\E+
^
at java.util.regex.Pattern.error(Pattern.java:1924)
at java.util.regex.Pattern.escape(Pattern.java:2367)
at java.util.regex.Pattern.atom(Pattern.java:2164)
at java.util.regex.Pattern.sequence(Pattern.java:2097)
at java.util.regex.Pattern.expr(Pattern.java:1964)
at java.util.regex.Pattern.compile(Pattern.java:1665)
at java.util.regex.Pattern.<init>(Pattern.java:1337)
at java.util.regex.Pattern.compile(Pattern.java:1022)
at org.apache.hadoop.mapred.JobHistory$JobInfo.getJobHistoryFileName(JobHistory.java:638)
at org.apache.hadoop.mapred.JobHistory$JobInfo.logSubmitted(JobHistory.java:803)
at org.apache.hadoop.mapred.JobInProgress.initTasks(JobInProgress.java:360)
at org.apache.hadoop.mapred.EagerTaskInitializationListener$JobInitThread.run(EagerTaskInitializationListener.java:55)
at java.lang.Thread.run(Thread.java:745)
Exception in thread "initJobs" java.util.regex.PatternSyntaxException: \k is not followed by '<' for named capturing group near index 48
localhost_[0-9]+_job_201607141149_0001_hcltech\kaushik.srinivas_\Qtest_TestDriver.java-3370112959319510186.jar\E+
^
at java.util.regex.Pattern.error(Pattern.java:1924)
at java.util.regex.Pattern.escape(Pattern.java:2367)
at java.util.regex.Pattern.atom(Pattern.java:2164)
at java.util.regex.Pattern.sequence(Pattern.java:2097)
at java.util.regex.Pattern.expr(Pattern.java:1964)
at java.util.regex.Pattern.compile(Pattern.java:1665)
at java.util.regex.Pattern.<init>(Pattern.java:1337)
at java.util.regex.Pattern.compile(Pattern.java:1022)
at org.apache.hadoop.mapred.JobHistory$JobInfo.getJobHistoryFileName(JobHistory.java:638)
at org.apache.hadoop.mapred.JobHistory$JobInfo.finalizeRecovery(JobHistory.java:746)
at org.apache.hadoop.mapred.JobTracker.finalizeJob(JobTracker.java:1549)
at org.apache.hadoop.mapred.JobInProgress.garbageCollect(JobInProgress.java:2320)
at org.apache.hadoop.mapred.JobInProgress.terminateJob(JobInProgress.java:2004)
at org.apache.hadoop.mapred.JobInProgress.terminate(JobInProgress.java:2019)
at org.apache.hadoop.mapred.JobInProgress.fail(JobInProgress.java:2095)
at org.apache.hadoop.mapred.EagerTaskInitializationListener$JobInitThread.run(EagerTaskInitializationListener.java:62)
at java.lang.Thread.run(Thread.java:745)
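Reading the trace, the job-name part of the pattern is wrapped in \Q...\E, but the user-name part (hcltech\kaushik.srinivas) is not, so the backslash from the domain-qualified Windows login appears to reach the regex compiler as the escape \k. The sketch below reproduces that kind of failure in plain Java, outside Hadoop. The class name, the helper compiles, and the user name domain\kuser are hypothetical stand-ins; whether JobHistory builds its pattern exactly this way is my assumption from the trace, not something I have confirmed against the Hadoop source.

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

// Hypothetical demo class: shows how a backslash in a user name can break
// a regular expression that is built by plain string concatenation.
public class BackslashInPatternDemo {

    // Returns true if the given string compiles as a regular expression.
    public static boolean compiles(String regex) {
        try {
            Pattern.compile(regex);
            return true;
        } catch (PatternSyntaxException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Stand-in for a domain-qualified Windows user name (DOMAIN\user).
        String user = "domain\\kuser";

        // Raw concatenation: the regex engine treats "\k" as an escape
        // sequence and rejects the pattern.
        String raw = "localhost_[0-9]+_job_" + user + "_.*";
        System.out.println("raw compiles:    " + compiles(raw));

        // Pattern.quote wraps the name in \Q...\E so it is matched literally.
        String quoted = "localhost_[0-9]+_job_" + Pattern.quote(user) + "_.*";
        System.out.println("quoted compiles: " + compiles(quoted));
    }
}
```

If this reading is right, the failure comes from the user name rather than from the job code itself, so starting the cluster and submitting the job under a user name that contains no backslash might avoid it. I have not verified that.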
Can anyone help me understand what the problem is here?