Unable to load file from distributed cache from URI, getting NullPointerException

Date: 2018-05-17 07:14:18

Tags: java hadoop distributed-caching

I am trying to write a MapReduce job that performs sentiment analysis, using AFINN.txt as the dictionary. I put the file into HDFS and try to load it when running the MapReduce job, but it fails every time. I use the code below to compare the words against AFINN:

    public class Sentiment_Analysis extends Configured implements Tool {

        public static class Map extends Mapper<LongWritable, Text, Text, Text> {

            private URI[] files;
            private HashMap<String, String> AFINN_map = new HashMap<String, String>();

            @Override
            public void setup(Context context) throws IOException {
                files = DistributedCache.getCacheFiles(context.getConfiguration());
                System.out.println("files:" + files);
                Path path = new Path(files[0]); // here I am getting the exception
                FileSystem fs = FileSystem.get(context.getConfiguration());
                FSDataInputStream in = fs.open(path);
                BufferedReader br = new BufferedReader(new InputStreamReader(in));
                String line = "";
                while ((line = br.readLine()) != null) {
                    String splits[] = line.split("\t");
                    AFINN_map.put(splits[0], splits[1]);
                }
                br.close();
                in.close();
            }

            public void map(LongWritable key, Text value, Context context)
                    throws IOException, InterruptedException {
                String twt;
                String line = value.toString();
                String[] tuple = line.split("\\n");
                JSONParser jsonParser = new JSONParser();
                try {
                    for (int i = 0; i < tuple.length; i++) {
                        JSONObject obj = (JSONObject) jsonParser.parse(line);
                        String tweet_id = (String) obj.get("id_str");
                        String tweet_text = (String) obj.get("text");
                        twt = (String) obj.get("text");
                        String[] splits = twt.toString().split(" ");
                        int sentiment_sum = 0;
                        for (String word : splits) {
                            if (AFINN_map.containsKey(word)) {
                                Integer x = new Integer(AFINN_map.get(word));
                                sentiment_sum += x;
                            }
                        }
                        context.write(
                                new Text(tweet_id),
                                new Text(tweet_text + "\t----->\t"
                                        + new Text(Integer.toString(sentiment_sum))));
                    }
                } catch (Exception e) {
                    e.printStackTrace();
                }
            }
        }

        public static class Reduce extends Reducer<Text, Text, Text, Text> {

            public void reduce(Text key, Text value, Context context)
                    throws IOException, InterruptedException {
                context.write(key, value);
            }
        }

        public static void main(String[] args) throws Exception {
            ToolRunner.run(new Sentiment_Analysis(), args);
        }

        @Override
        public int run(String[] args) throws Exception {
            Configuration conf = new Configuration();
            if (args.length != 2) {
                System.err.println("Usage: Parse <in> <out>");
                System.exit(2);
            }
            Job job = new Job(conf, "SentimentAnalysis");
            DistributedCache.addCacheFile(new URI("hdfs://localhost:50070//sentimentInput//AFINN.txt"), conf);
            job.setJarByClass(Sentiment_Analysis.class);
            job.setMapperClass(Map.class);
            job.setReducerClass(Reduce.class);
            job.setMapOutputKeyClass(Text.class);
            job.setMapOutputValueClass(Text.class);
            job.setOutputKeyClass(NullWritable.class);
            job.setOutputValueClass(Text.class);
            job.setInputFormatClass(TextInputFormat.class);
            job.setOutputFormatClass(TextOutputFormat.class);
            FileInputFormat.addInputPath(job, new Path(args[0]));
            FileOutputFormat.setOutputPath(job, new Path(args[1]));
            System.exit(job.waitForCompletion(true) ? 0 : 1);
            return 0;
        }
    }

My localhost URL is

 http://localhost:50070/

and I have already put the file into HDFS using the commands below:
  bin/hdfs dfs -ls /sentimentInput
  18/05/17 12:25:46 WARN util.NativeCodeLoader: Unable to load native-hadoop 
  library for your platform... using builtin-java classes where applicable
  Found 2 items
  -rw-r--r--   1 jeet supergroup      28094 2018-05-17 11:43 
 /sentimentInput/AFINN.txt
 -rw-r--r--   1 jeet supergroup   13965969 2018-05-17 11:33 
/sentimentInput/FlumeData.1440939532959

which shows the file exists, but when I launch the job it shows the error below:

bin/yarn jar ../sentiment.jar com.jeet.sentiment.Sentiment_Analysis /sentimentInput /sentimentOutput5


Exception in thread "main" java.lang.IllegalArgumentException: Pathname /localhost:50070/sentimentInput/AFINN.txt from hdfs:/localhost:50070/sentimentInput/AFINN.txt is not a valid DFS filename.
at org.apache.hadoop.hdfs.DistributedFileSystem.getPathName(DistributedFileSystem.java:195)
at org.apache.hadoop.hdfs.DistributedFileSystem.access$000(DistributedFileSystem.java:104)
at org.apache.hadoop.hdfs.DistributedFileSystem$17.doCall(DistributedFileSystem.java:1089)
at org.apache.hadoop.hdfs.DistributedFileSystem$17.doCall(DistributedFileSystem.java:1085)
at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1085)
at org.apache.hadoop.mapreduce.filecache.ClientDistributedCacheManager.getFileStatus(ClientDistributedCacheManager.java:288)

Can anyone tell me how to give the correct file path so that I can test my code?

1 Answer:

Answer 0 (score: 1)

Your URI is missing a /:

    hdfs://localhost.....
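
As a quick sketch of why the slash matters: with only one slash after `hdfs:`, `java.net.URI` parses no authority (host:port) at all and the host ends up inside the path, which is exactly the malformed `hdfs:/localhost:50070/...` name the stack trace complains about. This check uses only the JDK; the port 9000 below is an assumption for a typical `fs.defaultFS` (50070 is the NameNode web UI port, not the filesystem port).

```java
import java.net.URI;

public class CacheUriCheck {
    public static void main(String[] args) throws Exception {
        // Well-formed hdfs URI: "//" introduces the authority (host:port).
        // Port 9000 is an assumed fs.defaultFS value, not taken from the question.
        URI good = new URI("hdfs://localhost:9000/sentimentInput/AFINN.txt");
        System.out.println(good.getAuthority()); // localhost:9000
        System.out.println(good.getPath());      // /sentimentInput/AFINN.txt

        // With a single slash there is no authority; the "host" is swallowed
        // into the path, which Hadoop then rejects as "not a valid DFS filename".
        URI bad = new URI("hdfs:/localhost:50070/sentimentInput/AFINN.txt");
        System.out.println(bad.getAuthority()); // null
        System.out.println(bad.getPath());      // /localhost:50070/sentimentInput/AFINN.txt
    }
}
```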

Edit:

Try the updated cache-file methods instead:

    job.addCacheFile(uri);

    context.getCacheFiles()
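
A sketch of how those two calls fit into the question's driver and mapper, using Hadoop 2.x's `Job.addCacheFile` / `context.getCacheFiles()` in place of the deprecated `DistributedCache`. This needs the Hadoop client libraries on the classpath to compile; the port 9000 is an assumption for `fs.defaultFS`, not taken from the question.

```java
import java.io.IOException;
import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;

public class CacheFileSketch {

    public static class Map extends Mapper<LongWritable, Text, Text, Text> {
        @Override
        protected void setup(Context context) throws IOException {
            // Replaces DistributedCache.getCacheFiles(context.getConfiguration()).
            URI[] cacheFiles = context.getCacheFiles();
            if (cacheFiles != null && cacheFiles.length > 0) {
                Path path = new Path(cacheFiles[0]);
                // ... open via FileSystem.get(context.getConfiguration()),
                //     then read AFINN.txt line by line as in the question.
            }
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "SentimentAnalysis");
        // Register the file on the Job, replacing DistributedCache.addCacheFile(uri, conf).
        // Port 9000 is an assumed fs.defaultFS value; an absolute path on the
        // default filesystem, new URI("/sentimentInput/AFINN.txt"), also works.
        job.addCacheFile(new URI("hdfs://localhost:9000/sentimentInput/AFINN.txt"));
        job.setJarByClass(CacheFileSketch.class);
        // ... remaining mapper/reducer and input/output setup as in the question.
    }
}
```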