Unable to run Hadoop MapReduce code on Windows with Cygwin

Time: 2016-07-14 08:26:29

Tags: eclipse hadoop cygwin hdfs bigdata

I am trying to understand the Hadoop framework and its MapReduce functionality.

Environment: Windows 7 with Cygwin, running Hadoop 0.19.1. Eclipse Europa is used for MapReduce job development.

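Problem:

The sample code uses the default identity mapper and reducer classes. The goal is to copy the files in the input folder to the output folder without processing the data in any way.

The TestDriver.java file contains: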

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.TextInputFormat;
import org.apache.hadoop.mapred.TextOutputFormat;
import org.apache.hadoop.mapred.lib.IdentityMapper;
import org.apache.hadoop.mapred.lib.IdentityReducer;


public class TestDriver {

    public static void main(String[] args) {

        JobConf conf = new JobConf(TestDriver.class);

        // TextInputFormat emits (LongWritable byte offset, Text line) pairs, and the
        // identity mapper and reducer pass them through unchanged, so the job's
        // output types have to match those.
        conf.setOutputKeyClass(LongWritable.class);
        conf.setOutputValueClass(Text.class);

        conf.setInputFormat(TextInputFormat.class);
        conf.setOutputFormat(TextOutputFormat.class);

        // Input and output are DIRECTORIES (not files).
        FileInputFormat.setInputPaths(conf, new Path("In"));
        FileOutputFormat.setOutputPath(conf, new Path("Out"));

        // Identity mapper and reducer: records are copied through unprocessed.
        conf.setMapperClass(IdentityMapper.class);
        conf.setReducerClass(IdentityReducer.class);

        try {
            // runJob is static, so no JobClient instance is needed.
            JobClient.runJob(conf);
        } catch (Exception e) {
            e.printStackTrace();
        }
    }

}

The Hadoop cluster was started from a Cygwin terminal. Output of the jps command:

$ jps
4944 NameNode
6588 SecondaryNameNode
8504 TaskTracker
8640 JobTracker
8340 DataNode
8568 Jps

The hadoop-site.xml file contains the following:

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<!-- Put site-specific property overrides in this file. -->

<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:9100</value>
  </property>
  <property>
    <name>mapred.job.tracker</name>
    <value>localhost:9101</value>
  </property>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>

There is no yarn-site.xml file in my Hadoop installation path under Cygwin. Is it required, and could its absence cause any problems?

In Eclipse, I created a Hadoop Map/Reduce location with Map/Reduce Master port 9101 and DFS Master port 9100. When I run the program, the console shows the following:

16/07/14 12:06:35 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
16/07/14 12:06:35 INFO mapred.FileInputFormat: Total input paths to process : 4
16/07/14 12:06:37 INFO mapred.JobClient: Running job: job_201607141149_0002
16/07/14 12:06:38 INFO mapred.JobClient:  map 0% reduce 0%

I do not see anything displayed in the TaskTracker window in Cygwin. Below is the dump from the JobTracker window:

16/07/14 12:05:24 ERROR mapred.EagerTaskInitializationListener: Job initialization failed:
java.util.regex.PatternSyntaxException: \k is not followed by '<' for named capturing group near index 48
localhost_[0-9]+_job_201607141149_0001_hcltech\kaushik.srinivas_\Qtest_TestDriver.java-3370112959319510186.jar\E+
                                                ^
        at java.util.regex.Pattern.error(Pattern.java:1924)
        at java.util.regex.Pattern.escape(Pattern.java:2367)
        at java.util.regex.Pattern.atom(Pattern.java:2164)
        at java.util.regex.Pattern.sequence(Pattern.java:2097)
        at java.util.regex.Pattern.expr(Pattern.java:1964)
        at java.util.regex.Pattern.compile(Pattern.java:1665)
        at java.util.regex.Pattern.<init>(Pattern.java:1337)
        at java.util.regex.Pattern.compile(Pattern.java:1022)
        at org.apache.hadoop.mapred.JobHistory$JobInfo.getJobHistoryFileName(JobHistory.java:638)
        at org.apache.hadoop.mapred.JobHistory$JobInfo.logSubmitted(JobHistory.java:803)
        at org.apache.hadoop.mapred.JobInProgress.initTasks(JobInProgress.java:360)
        at org.apache.hadoop.mapred.EagerTaskInitializationListener$JobInitThread.run(EagerTaskInitializationListener.java:55)
        at java.lang.Thread.run(Thread.java:745)

Exception in thread "initJobs" java.util.regex.PatternSyntaxException: \k is not followed by '<' for named capturing group near index 48
localhost_[0-9]+_job_201607141149_0001_hcltech\kaushik.srinivas_\Qtest_TestDriver.java-3370112959319510186.jar\E+
                                                ^
        at java.util.regex.Pattern.error(Pattern.java:1924)
        at java.util.regex.Pattern.escape(Pattern.java:2367)
        at java.util.regex.Pattern.atom(Pattern.java:2164)
        at java.util.regex.Pattern.sequence(Pattern.java:2097)
        at java.util.regex.Pattern.expr(Pattern.java:1964)
        at java.util.regex.Pattern.compile(Pattern.java:1665)
        at java.util.regex.Pattern.<init>(Pattern.java:1337)
        at java.util.regex.Pattern.compile(Pattern.java:1022)
        at org.apache.hadoop.mapred.JobHistory$JobInfo.getJobHistoryFileName(JobHistory.java:638)
        at org.apache.hadoop.mapred.JobHistory$JobInfo.finalizeRecovery(JobHistory.java:746)
        at org.apache.hadoop.mapred.JobTracker.finalizeJob(JobTracker.java:1549)
        at org.apache.hadoop.mapred.JobInProgress.garbageCollect(JobInProgress.java:2320)
        at org.apache.hadoop.mapred.JobInProgress.terminateJob(JobInProgress.java:2004)
        at org.apache.hadoop.mapred.JobInProgress.terminate(JobInProgress.java:2019)
        at org.apache.hadoop.mapred.JobInProgress.fail(JobInProgress.java:2095)
        at org.apache.hadoop.mapred.EagerTaskInitializationListener$JobInitThread.run(EagerTaskInitializationListener.java:62)
        at java.lang.Thread.run(Thread.java:745)
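
The failing pattern in the trace appears to embed the Windows account name hcltech\kaushik.srinivas unquoted, while the job file name is wrapped in \Q...\E, so the \k in the account name is parsed as a regex escape. The following is a minimal sketch that reproduces that behaviour outside Hadoop. It only imitates the shape of the pattern shown above and is not Hadoop's JobHistory code; the class and method names (JobHistoryPatternDemo, buildPattern, compiles) are hypothetical, and the assumption is that the user name is concatenated into the pattern as-is.

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

// Hypothetical demo class: imitates the shape of the pattern in the trace,
// it is not taken from Hadoop's source.
class JobHistoryPatternDemo {

    // Builds a regex shaped like the one in the trace. When quoteUser is false the
    // user name is concatenated in raw, so a backslash in it starts a regex escape.
    static String buildPattern(String user, String jobName, boolean quoteUser) {
        String userPart = quoteUser ? Pattern.quote(user) : user;
        return "localhost_[0-9]+_job_201607141149_0001_"
                + userPart + "_" + Pattern.quote(jobName) + "+";
    }

    // Returns true if the regex compiles, false if it is syntactically invalid.
    static boolean compiles(String regex) {
        try {
            Pattern.compile(regex);
            return true;
        } catch (PatternSyntaxException e) {
            System.out.println("Rejected: " + e.getDescription());
            return false;
        }
    }

    public static void main(String[] args) {
        // Java string literal for the account name hcltech\kaushik.srinivas.
        String user = "hcltech\\kaushik.srinivas";
        String jobName = "test_TestDriver.java-3370112959319510186.jar";

        System.out.println("raw user compiles:    "
                + compiles(buildPattern(user, jobName, false)));
        System.out.println("quoted user compiles: "
                + compiles(buildPattern(user, jobName, true)));
    }
}
```

Running it should report that the raw form is rejected and the quoted form compiles, which is consistent with the exception above being triggered by the backslash in a domain-style user name rather than by the job code itself.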

Can anyone help me understand the problem here?

0 answers:

No answers yet.