Question

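I am building an inverted index for a list of words in Java. For each word, it builds a list of the indexes of the documents in which the word occurs, each paired with the word's frequency in that document. The desired output should look like this: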

[word1:[FileNo:frequency],[FileNo:frequency],[FileNo:frequency],word2:[FileNo:frequency],[FileNo:frequency]...etc]
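That output corresponds to a nested map: the outer key is the word, and the inner map goes from file number to frequency, which is the shape of the Map<String, Map<Integer,Integer>> declared in the code. The following is a minimal sketch of how such a map could be printed in roughly that format. The helper name formatIndex and the use of TreeMap to keep the output sorted are illustrative assumptions, not part of the original code.

```java
import java.util.Map;
import java.util.TreeMap;

class IndexFormatDemo {
    // Hypothetical helper: renders word -> (fileNo -> frequency) as
    // [word1:[fileNo:frequency],[fileNo:frequency],word2:[fileNo:frequency]]
    static String formatIndex(Map<String, Map<Integer, Integer>> index) {
        StringBuilder sb = new StringBuilder("[");
        boolean firstWord = true;
        for (Map.Entry<String, Map<Integer, Integer>> wordEntry : index.entrySet()) {
            if (!firstWord) sb.append(',');
            firstWord = false;
            sb.append(wordEntry.getKey()).append(':');
            boolean firstPosting = true;
            for (Map.Entry<Integer, Integer> posting : wordEntry.getValue().entrySet()) {
                if (!firstPosting) sb.append(',');
                firstPosting = false;
                sb.append('[').append(posting.getKey()).append(':')
                  .append(posting.getValue()).append(']');
            }
        }
        return sb.append(']').toString();
    }

    public static void main(String[] args) {
        // TreeMaps keep both the words and the file numbers sorted
        Map<String, Map<Integer, Integer>> index = new TreeMap<>();
        Map<Integer, Integer> word1 = new TreeMap<>();
        word1.put(0, 3);
        word1.put(2, 1);
        index.put("word1", word1);
        Map<Integer, Integer> word2 = new TreeMap<>();
        word2.put(1, 2);
        index.put("word2", word2);
        System.out.println(formatIndex(index)); // [word1:[0:3],[2:1],word2:[1:2]]
    }
}
```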

Here is the code:

package assigenment2;
import java.io.*;

import java.util.*;

public class invertedIndex {
    public static Map<String, Map<Integer,Integer>> wordTodocumentMap;
    public static BufferedReader buffer;
    public static BufferedReader br;
    public static BufferedReader reader;
    public static List<String> files = new ArrayList<String>();
    public static List<String>[] tokens; 

public static void main(String[] args) throws IOException {
    //read the token file and store the token in list
    String tokensPath="/Users/Manal/Documents/workspace/Information Retrieval/tokens.txt";
    int k=0;
    String[] tokens = new String[8500];
    String sCurrentLine;

    try
    {
        FileReader fr=new FileReader(tokensPath);
        BufferedReader br= new BufferedReader(fr);

        while ((sCurrentLine = br.readLine()) != null)
        {
            tokens[k]=sCurrentLine;
            k++;
        }

        System.out.println("the number of token are:"+k+" words");
        br.close();

    }
    catch(Exception ex)
    {System.out.println(ex);}

Up to this point it works correctly. I believe the problem lies in manipulating the nested map in the following part:

    TreeMap<Integer,Integer> documentToCount = new TreeMap<Integer,Integer>();

    //read files    
    System.out.print("Enter the path of files you want to process:\n");
    Scanner InputPath = new Scanner(System.in);
    String cranfield = InputPath.nextLine();
    File cranfieldFiles = new File(cranfield);  

        for (File file: cranfieldFiles.listFiles())
        {
            int fileno = files.indexOf(file.getPath());

            if (fileno == -1) //the current file isn't in the files list \
                {
                files.add(file.getPath());// add file to the files list
                fileno = files.size() - 1;//the index of file will start from 0 to size-1
                 }
             int frequency = 0;
             BufferedReader reader = new BufferedReader(new FileReader(file));
            for (String line = reader.readLine(); line != null; line = reader.readLine()) 
            {
                for (String _word : line.split(" ")) 
                {
                    String word = _word.toLowerCase();
                    if (Arrays.asList(tokens).contains(word))
                            if (wordTodocumentMap.get(word) == null)//check whether word is new word
                                    {
                                    documentToCount = new TreeMap<Integer,Integer>();
                                    wordTodocumentMap.put(word, documentToCount);
                                    }
                                documentToCount.put(fileno, frequency+1);//add the location and frequency
                                }   
                }
        }
        reader.close();
    }
}

The error I get is:

Exception in thread "main" java.lang.NullPointerException
    at assigenment2.invertedIndex.main(invertedIndex.java:65)

Answer 1

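You never instantiate wordTodocumentMap, so it stays null the whole time. As a result, the line if (wordTodocumentMap.get(word) == null)//check whether word is new word throws a NullPointerException as soon as you call .get() on it, that is, before there is anything to compare with null. One possible solution is to instantiate the map in its declaration: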

public static Map<String, Map<Integer,Integer>> wordTodocumentMap = new HashMap<>();
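With the map instantiated, get() on a word that has not been seen yet simply returns null, so the "is this a new word" check can do its job. Note that in the question's loop the inner map also needs to be looked up for every word; otherwise documentToCount keeps pointing at whichever map was created last. The following is a minimal sketch of the per-word update in the same explicit get/null-check style. The method name addOccurrence is hypothetical, and the count is incremented from its previous value rather than always written as frequency+1.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

class AddOccurrenceDemo {
    // Instantiated at declaration, as suggested above, so get() is never called on null
    static Map<String, Map<Integer, Integer>> wordTodocumentMap = new HashMap<>();

    // Hypothetical helper: records one occurrence of word in file fileno
    static void addOccurrence(String word, int fileno) {
        // Look up this word's own inner map on every call
        Map<Integer, Integer> documentToCount = wordTodocumentMap.get(word);
        if (documentToCount == null) { // word seen for the first time
            documentToCount = new TreeMap<>();
            wordTodocumentMap.put(word, documentToCount);
        }
        // Increment the existing count for this file, starting at 1
        Integer count = documentToCount.get(fileno);
        documentToCount.put(fileno, count == null ? 1 : count + 1);
    }

    public static void main(String[] args) {
        addOccurrence("index", 0);
        addOccurrence("index", 0);
        addOccurrence("index", 1);
        addOccurrence("word", 1);
        System.out.println(wordTodocumentMap.get("index")); // {0=2, 1=1}
        System.out.println(wordTodocumentMap.get("word"));  // {1=1}
    }
}
```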

There may be other problems in your code, but this should get you a step further.
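As one possible way to address some of those other problems, the sketch below builds the whole index with Map.computeIfAbsent and Map.merge (Java 8 and later) instead of the explicit get/null check, and keeps the tokens in a HashSet so each lookup does not scan an 8500-element list the way Arrays.asList(tokens).contains(word) does. It is a sketch under stated assumptions: the documents are passed as in-memory strings standing in for the files in the Cranfield directory, the list position serves as the file number, and the class and method names InvertedIndexSketch and build are hypothetical.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

class InvertedIndexSketch {
    // Builds word -> (document number -> frequency), counting only words in the token set.
    // Assumption: documents are already loaded as strings; list index = file number.
    static Map<String, Map<Integer, Integer>> build(List<String> documents, Set<String> tokens) {
        Map<String, Map<Integer, Integer>> index = new TreeMap<>();
        for (int fileno = 0; fileno < documents.size(); fileno++) {
            for (String raw : documents.get(fileno).split("\\s+")) {
                String word = raw.toLowerCase();
                if (!tokens.contains(word)) continue; // HashSet lookup, O(1) on average
                index.computeIfAbsent(word, w -> new TreeMap<>()) // this word's inner map
                     .merge(fileno, 1, Integer::sum);             // add 1 to its count in this file
            }
        }
        return index;
    }

    public static void main(String[] args) {
        List<String> docs = Arrays.asList("The cat sat", "the dog and the cat");
        Set<String> tokens = new HashSet<>(Arrays.asList("the", "cat", "dog"));
        System.out.println(build(docs, tokens));
        // {cat={0=1, 1=1}, dog={1=1}, the={0=1, 1=2}}
    }
}
```

When reading real files, wrapping each BufferedReader in a try-with-resources block would also ensure every reader is closed inside the loop that opened it.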

Cannot add to a TreeMap inside another map (to create an inverted index)
