I am trying to use a listOfWords file to count only those words in any input file. Even though I have verified that the file is in the correct location in HDFS, I am getting a FileNotFoundException.
Inside the driver:
Configuration conf = new Configuration();
DistributedCache.addCacheFile(new URI("/user/training/listOfWords"), conf);
Job job = new Job(conf,"CountEachWord Job");
Inside the Mapper:
private Path[] ref_file;
ArrayList<String> globalList = new ArrayList<String>();

public void setup(Context context) throws IOException {
    this.ref_file = DistributedCache.getLocalCacheFiles(context.getConfiguration());
    FileSystem fs = FileSystem.get(context.getConfiguration());
    FSDataInputStream in_file = fs.open(ref_file[0]);
    System.out.println("File opened");
    BufferedReader br = new BufferedReader(new InputStreamReader(in_file)); // each line of reference file
    System.out.println("BufferReader invoked");
    String eachLine = null;
    while ((eachLine = br.readLine()) != null) {
        System.out.println("eachLine is: " + eachLine);
        globalList.add(eachLine);
    }
}
Error message:
hadoop jar CountOnlyMatchWords.jar CountEachWordDriver Rhymes CountMatchWordsOut1
Warning: $HADOOP_HOME is deprecated.
14/10/07 22:28:59 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
14/10/07 22:28:59 INFO input.FileInputFormat: Total input paths to process : 1
14/10/07 22:28:59 INFO util.NativeCodeLoader: Loaded the native-hadoop library
14/10/07 22:28:59 WARN snappy.LoadSnappy: Snappy native library not loaded
14/10/07 22:29:00 INFO mapred.JobClient: Running job: job_201409300531_0041
14/10/07 22:29:01 INFO mapred.JobClient: map 0% reduce 0%
14/10/07 22:29:14 INFO mapred.JobClient: Task Id : attempt_201409300531_0041_m_000000_0, Status : FAILED
java.io.FileNotFoundException: File does not exist: /home/training/hadoop-temp/mapred/local/taskTracker/distcache/5910352135771601888_2043607380_1633197895/localhost/user/training/listOfWords
I have verified that the file in question exists in HDFS. I also tried using the LocalJobRunner; it still did not work.
Answer 0 (score: 1)
In the main method, I use this:

Job job = Job.getInstance();
job.setJarByClass(DistributedCacheExample.class);
job.setJobName("Distributed cache example");
job.addCacheFile(new Path("/user/cloudera/datasets/abc.dat").toUri());

Then in the Mapper I used this boilerplate:

protected void setup(Context context) throws IOException, InterruptedException {
    URI[] files = context.getCacheFiles();
    for (URI file : files) {
        if (file.getPath().contains("abc.dat")) {
            Path path = new Path(file);
            BufferedReader reader = new BufferedReader(new FileReader(path.getName()));
            String line = reader.readLine();
            while (line != null) {
                ......
            }
        }
    }
}

I am using these dependencies:

<dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-common</artifactId>
    <version>2.7.3</version>
</dependency>
<dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-mapreduce-client-core</artifactId>
    <version>2.7.3</version>
</dependency>

For me, the tricky part was using path.getName in the FileReader; without it, the file is not found. Cache files are localized and symlinked into the task's working directory under their base name, so a plain FileReader on just the file name can open them.
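The loop body is elided in the answer; purely as an illustration, here is one way it might look if the goal is to collect each line into a list. The wordList name and the reading pattern are my assumption, not part of the original answer:

List<String> wordList = new ArrayList<String>(); // illustrative holder for the cached words
String line = reader.readLine();
while (line != null) {
    wordList.add(line);       // keep the word
    line = reader.readLine(); // advance, otherwise the loop never terminates
}
reader.close();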
Answer 1 (score: 0)
You can try this to retrieve the files:

URI[] files = DistributedCache.getCacheFiles(context.getConfiguration());

You can then iterate over the files.
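For example, the iteration could look like this minimal sketch, which reuses the globalList from the question's Mapper; matching the cache file by name is an assumption:

URI[] files = DistributedCache.getCacheFiles(context.getConfiguration());
FileSystem fs = FileSystem.get(context.getConfiguration());
for (URI uri : files) {
    if (uri.getPath().endsWith("listOfWords")) { // match the cached file by name (assumption)
        // getCacheFiles returns the original HDFS URIs, so open them through the HDFS FileSystem
        BufferedReader br = new BufferedReader(new InputStreamReader(fs.open(new Path(uri.getPath()))));
        String word;
        while ((word = br.readLine()) != null) {
            globalList.add(word); // globalList as declared in the question's Mapper
        }
        br.close();
    }
}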
Answer 2 (score: 0)
Try this.
In the driver:
Configuration conf = new Configuration();
FileSystem fs = FileSystem.get(conf);
Path cachefile = new Path("path/to/file");
FileStatus[] list = fs.globStatus(cachefile);
for (FileStatus status : list) {
    DistributedCache.addCacheFile(status.getPath().toUri(), conf);
}
In the Mapper setup():

public void setup(Context context) throws IOException {
    Configuration conf = context.getConfiguration();
    FileSystem fs = FileSystem.get(conf);
    URI[] cacheFiles = DistributedCache.getCacheFiles(conf);
    Path getPath = new Path(cacheFiles[0].getPath());
    BufferedReader bf = new BufferedReader(new InputStreamReader(fs.open(getPath)));
    String setupData = null;
    while ((setupData = bf.readLine()) != null) {
        System.out.println("Setup Line in reducer " + setupData);
    }
}
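This works because getCacheFiles returns the original HDFS URIs, which FileSystem.get(conf) can open. The question's code instead passes paths from getLocalCacheFiles, which live on the task node's local disk, to the HDFS file system, and that mismatch is the likely cause of the FileNotFoundException (note the local path in the stack trace). A minimal sketch of the local-path variant, assuming you keep getLocalCacheFiles as in the question:

// getLocalCacheFiles returns paths on the task node's local disk,
// so they must be opened with the *local* file system, not HDFS
Path[] localFiles = DistributedCache.getLocalCacheFiles(context.getConfiguration());
FileSystem localFs = FileSystem.getLocal(context.getConfiguration());
BufferedReader br = new BufferedReader(new InputStreamReader(localFs.open(localFiles[0])));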
Answer 3 (score: 0)
try {
    URI[] cacheFiles = DistributedCache.getCacheFiles(job); // fetch the centroid file from the distributed cache
    FileSystem fs = FileSystem.get(job);
    if (cacheFiles != null && cacheFiles.length > 0) {
        // goes in only if the cache holds at least one file
        Path getPath = new Path(cacheFiles[0].getPath());
        String line;
        centers.clear(); // clear the centers list on each pass
        BufferedReader cacheBufferReader = new BufferedReader(new InputStreamReader(fs.open(getPath)));
        try {
            while ((line = cacheBufferReader.readLine()) != null) {
                centers.add(line);
            }
        } catch (IOException e) {
            System.err.println("Exception: " + e);
        }
    }
} catch (IOException e) {
    System.err.println("Exception: " + e);
}
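The snippet assumes a class-level centers list and a Configuration that it calls job; those declarations are not shown, so the following is illustrative only:

// inside the Mapper/Reducer class that owns this setup logic:
private List<String> centers = new ArrayList<String>(); // centroid lines read from the cache

// inside setup(Context context):
Configuration job = context.getConfiguration(); // the Configuration the snippet calls "job"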