I am running a MapReduce program on Hadoop.
The InputFormat passes each file's path to the mapper.
I can check the file from the command line like this:
$ hadoop fs -ls hdfs://slave1.kdars.com:8020/user/hadoop/num_5/13.pdf
Found 1 items
-rwxrwxrwx 3 hdfs hdfs 184269 2015-03-31 22:50 hdfs://slave1.kdars.com:8020/user/hadoop/num_5/13.pdf
However, when I try to open that file from the mapper side, it does not work:
15/04/01 06:13:04 INFO mapreduce.Job: Task Id : attempt_1427882384950_0025_m_000002_2, Status : FAILED
Error: java.io.FileNotFoundException: hdfs:/slave1.kdars.com:8020/user/hadoop/num_5/13.pdf (No such file or directory)
at java.io.FileInputStream.open(Native Method)
at java.io.FileInputStream.<init>(FileInputStream.java:146)
at java.io.FileInputStream.<init>(FileInputStream.java:101)
at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:1111)
I have checked that the InputFormat works fine and that the mapper receives the correct file path. The mapper code looks like this:
@Override
public void map(Text title, Text file, Context context) throws IOException, InterruptedException {
    long time = System.currentTimeMillis();
    SimpleDateFormat dayTime = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss");
    String str = dayTime.format(new Date(time));
    File temp = new File(file.toString());
    if (temp.exists()) {
        DBManager.getInstance().insertSQL("insert into `plagiarismdb`.`workflow` (`type`) values ('" + temp + " exists')");
    } else {
        DBManager.getInstance().insertSQL("insert into `plagiarismdb`.`workflow` (`type`) values ('" + temp + " does not exist')");
    }
}
Please help me.
Answer 0 (score: 1)
First, import these:
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
Then use them in the mapper method:
FileSystem fs = FileSystem.get(new Configuration());
Path path = new Path(value.toString()); // 'value' carries the file path handed to the mapper
System.out.println(path);
if (fs.exists(path)) {
    context.write(value, one);  // 'one'/'zero': writable constants assumed to be defined in the mapper
} else {
    context.write(value, zero);
}
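For the original use case (PDFBox), the same idea applies: PDDocument.load(File) also goes through the local file system, so the PDF has to be streamed out of HDFS instead. Here is a minimal sketch, assuming your PDFBox version provides the PDDocument.load(InputStream) overload; the class name PdfMapper and the output value are illustrative, not from the question:

import java.io.IOException;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.pdfbox.pdmodel.PDDocument;

public class PdfMapper extends Mapper<Text, Text, Text, Text> {
    @Override
    public void map(Text title, Text file, Context context) throws IOException, InterruptedException {
        Path path = new Path(file.toString());
        // Resolve the FileSystem from the path itself, so a fully qualified
        // hdfs:// URI works even when it differs from the default FS.
        FileSystem fs = path.getFileSystem(context.getConfiguration());
        try (FSDataInputStream in = fs.open(path)) {
            PDDocument document = PDDocument.load(in); // reads from HDFS, never the local disk
            try {
                context.write(title, new Text("pages=" + document.getNumberOfPages()));
            } finally {
                document.close();
            }
        }
    }
}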
Answer 1 (score: 0)
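A standalone (non-MapReduce) example of the same pattern: check that a CSV file exists on HDFS, then read it through the FileSystem API rather than java.io: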
package com.tcb;

import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

import com.google.gson.JsonArray;
import com.google.gson.JsonObject;

public class YellowTaxi {

    public static void main(String[] args) throws IOException {
        // Fully qualified HDFS location of the input file.
        String hdfsPath = "hdfs://localhost:8020/user/YellowTaxi/yellowTaxi.csv";
        URI uri = URI.create(hdfsPath);

        // Get the FileSystem that owns this URI (HDFS), not the local one.
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(uri, conf);
        Path path = new Path(uri);

        System.out.println(fs.exists(path));
        if (fs.exists(path)) {
            System.out.println("I am ok ");
            // Read the content of the csv file from HDFS. A java.io.FileReader
            // only sees the local disk, so the stream must come from fs.open().
            String cvsSplitBy = ",";
            JsonObject taxiTrips = new JsonObject();
            JsonArray array = new JsonArray();
            try (BufferedReader br = new BufferedReader(new InputStreamReader(fs.open(path)))) {
                String line;
                while ((line = br.readLine()) != null) {
                    // Use comma as separator; build one JSON object per line.
                    String[] word = line.split(cvsSplitBy);
                    JsonObject item = new JsonObject();
                    for (int i = 0; i < word.length - 1; i++) {
                        item.addProperty(word[i], word[i + 1]);
                    }
                    array.add(item);
                }
            }
            taxiTrips.add("Trips", array);
        }
    }
}
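In both answers the underlying point is the same: java.io classes such as File, FileReader, and FileInputStream only ever see the local file system of the node the code runs on, so an hdfs:// path must be opened through org.apache.hadoop.fs.FileSystem (fs.exists, fs.open, and so on).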