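I am fairly new to Map/Reduce and Hadoop. I am trying to write a word count MapReduce program and run it locally from Eclipse. I have specified the path to the input file as well as the output directory. When I run the program, it throws an IOException: "The system cannot find the file specified."
My code looks like this: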
    import java.io.IOException;
    import java.util.Iterator;
    import java.util.StringTokenizer;

    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapred.FileInputFormat;
    import org.apache.hadoop.mapred.FileOutputFormat;
    import org.apache.hadoop.mapred.JobClient;
    import org.apache.hadoop.mapred.JobConf;
    import org.apache.hadoop.mapred.MapReduceBase;
    import org.apache.hadoop.mapred.Mapper;
    import org.apache.hadoop.mapred.OutputCollector;
    import org.apache.hadoop.mapred.Reducer;
    import org.apache.hadoop.mapred.Reporter;
    import org.apache.hadoop.mapreduce.Job;

    public class WordCount {

        public static class Wordcountmapper extends MapReduceBase
                implements Mapper<LongWritable, Text, Text, IntWritable> {

            private final static IntWritable one = new IntWritable(1);
            private Text word = new Text();

            @Override
            public void map(LongWritable key, Text value,
                    OutputCollector<Text, IntWritable> output,
                    Reporter reporter) throws IOException {
                String line = value.toString();
                System.out.println("Line " + line);
                StringTokenizer token = new StringTokenizer(line, ",");
                while (token.hasMoreTokens()) {
                    word.set(token.nextToken());
                    output.collect(word, one);
                    System.out.println("hii " + word + " " + one);
                }
            }
        }

        public static class wordcountreducer extends MapReduceBase
                implements Reducer<Text, IntWritable, Text, IntWritable> {

            public void reduce(Text key, Iterator<IntWritable> value,
                    OutputCollector<Text, IntWritable> output, Reporter reporter)
                    throws IOException {
                int sum = 0;
                System.out.println("Inside Reducer" + value.hasNext());
                System.out.println("Key = " + key);
                while (value.hasNext()) {
                    sum += value.next().get();
                }
                output.collect(key, new IntWritable(sum));
                System.out.println(key + " " + sum);
            }
        }

        public static void main(String[] args) throws IOException {
            JobConf conf = new JobConf(WordCount.class);
            conf.setOutputKeyClass(Text.class);
            conf.setOutputValueClass(IntWritable.class);
            conf.setMapperClass(Wordcountmapper.class);
            conf.setReducerClass(wordcountreducer.class);
            FileInputFormat.addInputPath(conf, new Path("C:\\WordCount\\wordcount.txt"));
            String outputfile = "C:\\WordCount\\Output\\a.txt";
            FileOutputFormat.setOutputPath(conf, new Path(outputfile));
            JobClient.runJob(conf);
        }
    }
Am I missing something here? I am using Eclipse Juno with the Hadoop plugin to run the MapReduce program.
The error thrown is as follows:
15/01/22 17:54:43 INFO Configuration.deprecation: mapred.output.key.class is deprecated. Instead, use mapreduce.job.output.key.class
15/01/22 17:54:43 INFO Configuration.deprecation: mapred.output.value.class is deprecated. Instead, use mapreduce.job.output.value.class
15/01/22 17:54:43 INFO Configuration.deprecation: mapred.working.dir is deprecated. Instead, use mapreduce.job.working.dir
15/01/22 17:54:43 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
15/01/22 17:54:43 INFO Configuration.deprecation: mapred.input.dir is deprecated. Instead, use mapreduce.input.fileinputformat.inputdir
15/01/22 17:54:43 INFO Configuration.deprecation: mapred.output.dir is deprecated. Instead, use mapreduce.output.fileoutputformat.outputdir
15/01/22 17:54:43 INFO Configuration.deprecation: mapred.job.tracker is deprecated. Instead, use mapreduce.jobtracker.address
15/01/22 17:54:43 INFO Configuration.deprecation: mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps
15/01/22 17:54:43 INFO Configuration.deprecation: session.id is deprecated. Instead, use dfs.metrics.session-id
15/01/22 17:54:43 INFO Configuration.deprecation: mapred.job.queue.name is deprecated. Instead, use mapreduce.job.queuename
15/01/22 17:54:43 ERROR security.UserGroupInformation: PriviledgedActionException as:Soorya S (auth:SIMPLE) cause:java.io.IOException: Cannot run program "chmod": CreateProcess error=2, The system cannot find the file specified.
Exception in thread "main" java.io.IOException: Cannot run program "chmod": CreateProcess error=2, The system cannot find the file specified.
at java.lang.ProcessBuilder.start(ProcessBuilder.java:471)
at org.apache.hadoop.util.Shell.runCommand(Shell.java:384)
at org.apache.hadoop.util.Shell.run(Shell.java:359)
at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:569)
at org.apache.hadoop.util.Shell.execCommand(Shell.java:658)
at org.apache.hadoop.util.Shell.execCommand(Shell.java:641)
at org.apache.hadoop.fs.RawLocalFileSystem.setPermission(RawLocalFileSystem.java:639)
at org.apache.hadoop.fs.RawLocalFileSystem.mkdirs(RawLocalFileSystem.java:435)
at org.apache.hadoop.fs.FilterFileSystem.mkdirs(FilterFileSystem.java:277)
at org.apache.hadoop.mapreduce.JobSubmissionFiles.getStagingDir(JobSubmissionFiles.java:122)
at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:969)
at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:963)
at java.security.AccessController.doPrivileged(AccessController.java:284)
at javax.security.auth.Subject.doAs(Subject.java:573)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1502)
at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:963)
at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:937)
at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1375)
at com.ibm.hadoop.WordCount.main(WordCount.java:63)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:60)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:37)
at java.lang.reflect.Method.invoke(Method.java:611)
at org.apache.hadoop.util.RunJar.main(RunJar.java:212)
Caused by: java.io.IOException: CreateProcess error=2, The system cannot find the file specified.
at java.lang.ProcessImpl.create(Native Method)
at java.lang.ProcessImpl.<init>(ProcessImpl.java:92)
at java.lang.ProcessImpl.start(ProcessImpl.java:41)
at java.lang.ProcessBuilder.start(ProcessBuilder.java:464)
... 23 more
getting access token
[getToken] got user access token
getting primary group
[getPrimaryGroup] Got TokenPrimaryGroup info
[getPrimaryGroup] primaryGroup: S-1-5-21-184784153-2138975554-913327727-513
getting supplementary groups
[getGroups] Got TokenGroups info
[getGroups] group 0: S-1-5-21-184784153-2138975554-913327727-513
[getGroups] group 1: S-1-1-0
[getGroups] group 2: S-1-5-114
[getGroups] group 3: S-1-5-32-544
[getGroups] group 4: S-1-5-32-545
[getGroups] group 5: S-1-5-4
[getGroups] group 6: S-1-2-1
[getGroups] group 7: S-1-5-11
[getGroups] group 8: S-1-5-15
[getGroups] group 9: S-1-5-113
[getGroups] group 10: S-1-5-5-0-576278
[getGroups] group 11: S-1-2-0
[getGroups] group 12: S-1-5-64-10
[getGroups] group 13:S-1-16-8192
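Answer 0 (score: 0)
I had the same problem (I assume you are on Windows). It happens because the system PATH variable needs to include the directory that contains the 'chmod' executable. You can follow the steps described in "running hadoop on window-7 64 bit".
Answer 1 (score: 0)
Because Hadoop shells out to the Unix tool chmod, I ran into the following exception: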
Exception in thread "main" java.io.IOException: Cannot run program "chmod": CreateProcess error=2,
The system cannot find the file specified :
at java.lang.ProcessBuilder.start(Unknown Source)
at org.apache.hadoop.util.Shell.runCommand(Shell.java:201)
at org.apache.hadoop.util.Shell.run(Shell.java:183)
at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:376)
at org.apache.hadoop.util.Shell.execCommand(Shell.java:462)
at org.apache.hadoop.util.Shell.execCommand(Shell.java:445)
at org.apache.hadoop.fs.RawLocalFileSystem.execCommand(RawLocalFileSystem.java:543)
at org.apache.hadoop.fs.RawLocalFileSystem.setPermission(RawLocalFileSystem.java:535)
at org.apache.hadoop.fs.FilterFileSystem.setPermission(FilterFileSystem.java:336)
at org.apache.hadoop.fs.ChecksumFileSystem.create(ChecksumFileSystem.java:400)
at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:610)
at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:591)
at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:498)
at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:490)
at org.apache.hadoop.hbase.io.hfile.HFile$Writer.<init>(HFile.java:306)
at org.expasy.jpl.io.util.JPLHMapSerializer.init(JPLHMapSerializer.java:125)
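The trace shows the failure coming from ProcessBuilder.start, which Hadoop's Shell class uses to launch chmod. As a minimal sketch of that mechanism (not Hadoop's own code), the snippet below tries to start a program that does not exist and prints the resulting message. The class name MissingProgramDemo and the tool name no-such-tool-xyz are made up for illustration.

```java
import java.io.IOException;

public class MissingProgramDemo {

    // Tries to launch a program by name. Returns "launched" on success,
    // or the IOException message when the OS cannot start the program.
    static String tryLaunch(String program) {
        try {
            Process p = new ProcessBuilder(program).start();
            p.destroy();
            return "launched";
        } catch (IOException e) {
            return e.getMessage();
        }
    }

    public static void main(String[] args) {
        // "no-such-tool-xyz" is a hypothetical name chosen so that the lookup fails,
        // the same way the lookup for "chmod" fails on a plain Windows install.
        System.out.println(tryLaunch("no-such-tool-xyz"));
    }
}
```

The message begins with Cannot run program "no-such-tool-xyz", followed by an OS-specific reason. On Windows that reason is the same "CreateProcess error=2" text seen above, which is why the fix is to make a chmod executable visible to the JVM rather than to change the input or output paths.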
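You can try the following.

Fixing the dependency error
The solution is to install Cygwin on Windows, or to install just a subset of Cygwin, since only chmod and its DLLs are needed. The steps below cover the second option.

Step 1: Get the "chmod" resources
Here are the archives for the different Windows architectures:
Windows 32-bit: contains chmod.exe, cygwin1.dll, cygiconv-2.dll, cygintl-8.dll and cyggcc_s-1.dll
Windows 64-bit: not yet available

Step 2: Set the path in Windows
Don't forget to add chmod to the PATH variable in Windows, otherwise chmod will not be found!
1. Right-click the "My Computer" icon on the desktop and click "Properties". Alternatively, press the Windows key + Pause/Break.
2. In the window that opens, click the "Advanced" tab.
3. Click "Environment Variables".
4. Under "System variables", edit or create the PATH variable and add the path of the cygwin-chmod directory.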
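To confirm that the PATH change actually reached the JVM, a small check like the one below can help. It is only a sketch: the class name FindOnPath and the helper findOnPath are hypothetical, and it simply scans each PATH entry for a file named chmod.exe or chmod.

```java
import java.io.File;

public class FindOnPath {

    // Returns the first regular file in a PATH-style string that matches
    // one of the given names, or null when none of the entries contains it.
    static File findOnPath(String pathValue, String... names) {
        if (pathValue == null) {
            return null;
        }
        for (String dir : pathValue.split(File.pathSeparator)) {
            if (dir.isEmpty()) {
                continue;
            }
            for (String name : names) {
                File candidate = new File(dir, name);
                if (candidate.isFile()) {
                    return candidate;
                }
            }
        }
        return null;
    }

    public static void main(String[] args) {
        // The JVM sees the PATH of the process that started it (for example Eclipse).
        File chmod = findOnPath(System.getenv("PATH"), "chmod.exe", "chmod");
        System.out.println(chmod == null
                ? "chmod not found on PATH"
                : "chmod found at " + chmod.getAbsolutePath());
    }
}
```

Run it from the same Eclipse instance that runs the job. Environment variables are inherited when a process starts, so Eclipse has to be restarted after PATH is edited before the new value becomes visible.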
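Answer 2 (score: 0)
Install Cygwin and add it to your PATH environment variable.
Answer 3 (score: 0)
To fix the error Exception in thread "main" java.io.IOException: Cannot run program "chmod" in a Java Hadoop program, try running Eclipse as administrator. That worked for me.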