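I want to access the contents of a distributed cache file in my Mapper. Below is the code I wrote; it emits the names of the files in the distributed cache. How can I read the contents of those files?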
public class DistCacheExampleMapper extends MapReduceBase
        implements Mapper<LongWritable, Text, Text, Text> {

    Text a = new Text();
    Path[] dates = new Path[0];

    public void configure(JobConf conf) {
        try {
            dates = DistributedCache.getLocalCacheFiles(conf);
            String astr = dates.toString();
            a = new Text(astr);
        } catch (IOException ioe) {
            System.err.println("Caught exception while getting cached files: "
                    + StringUtils.stringifyException(ioe));
        }
    }

    @Override
    public void map(LongWritable key, Text value, OutputCollector<Text, Text> output,
            Reporter reporter) throws IOException {
        String line = value.toString();
        for (Path cacheFile : dates) {
            output.collect(new Text(line), new Text(cacheFile.getName()));
        }
    }
}
Answer 0 (score: 0)
Try this in the configure() method:
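// Needs java.io.BufferedReader, java.io.FileReader, java.io.IOException,
// java.util.ArrayList and java.util.List in addition to the Hadoop imports.
List<String[]> lines;            // parsed rows of the cached file
Path[] files = new Path[0];

public void configure(JobConf conf) {
    lines = new ArrayList<>();
    try {
        files = DistributedCache.getLocalCacheFiles(conf);
        if (files == null || files.length == 0) {
            return; // nothing was added to the distributed cache
        }
        // try-with-resources closes the reader even if readLine() throws
        try (BufferedReader reader = new BufferedReader(new FileReader(files[0].toString()))) {
            String line;
            while ((line = reader.readLine()) != null) {
                // each entry of lines is a String array, one element per column
                lines.add(line.split(","));
            }
        }
    } catch (IOException ioe) {
        System.err.println("Caught exception while getting cached files: "
                + StringUtils.stringifyException(ioe));
    }
}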
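This way, the variable lines holds the contents of the file from the distributed cache (in this case, the first file). Each entry of lines is a String array containing the fields of one line, split on ','. So the first column of the first row is lines.get(0)[0], the third column of the second row is lines.get(1)[2], and so on.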
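The row and column indexing described above can be tried without a cluster. The following is a minimal sketch, not the answer's exact code: CacheFileLoader and loadRows are hypothetical names, the file is assumed to be UTF-8 (the answer's FileReader uses the platform default charset instead), and the sample rows are made up for illustration. The path passed in stands for what files[0].toString() would supply inside configure().

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of Hadoop: parses a local comma-separated file into rows.
final class CacheFileLoader {

    // Reads the file at localPath and returns one String[] of columns per line.
    static List<String[]> loadRows(String localPath) throws IOException {
        List<String[]> rows = new ArrayList<>();
        // Assumption: the file is UTF-8 encoded.
        try (BufferedReader reader =
                Files.newBufferedReader(Paths.get(localPath), StandardCharsets.UTF_8)) {
            String line;
            while ((line = reader.readLine()) != null) {
                // limit -1 keeps trailing empty columns; plain split(",") would drop them
                rows.add(line.split(",", -1));
            }
        }
        return rows;
    }

    public static void main(String[] args) throws IOException {
        // Made-up sample data standing in for a file from the distributed cache.
        Path tmp = Files.createTempFile("cache-sample", ".csv");
        Files.write(tmp, Arrays.asList("a,b,c", "d,e,f"), StandardCharsets.UTF_8);

        List<String[]> rows = loadRows(tmp.toString());
        System.out.println(rows.get(0)[0]); // first column of the first row
        System.out.println(rows.get(1)[2]); // third column of the second row

        Files.delete(tmp);
    }
}
```

One design difference from the answer: split(",", -1) keeps empty trailing columns, so a line such as "a,b," yields three elements, whereas split(",") would yield two. Either works as long as the indexing code in map() expects the same column count.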