Question

所以基本上这是一个解析器/余弦矩阵计算器，但我不断收到编译错误。我认为我有正确阅读文本文件的输入路径。但它仍然无法编译。

这是我的主要课程：

    import java.io.FileNotFoundException;
    import java.io.IOException;

    public class TfIdfMain {

    public static void main(String args[]) throws FileNotFoundException, IOException {
        DocumentParser dp = new DocumentParser();
        dp.parseFiles("C:/Users/dachen/Documents/doc1.txt"); // give the location of source file
        dp.tfIdfCalculator(); //calculates tfidf
        dp.getCosineSimilarity(); //calculates cosine similarity   
    }
}

我的解析器类：

import java.io.BufferedReader;
import java.io.File;
import java.io.FileNotFoundException;
import java.io.FileReader;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

public class DocumentParser {

    //This variable will hold all terms of each document in an array.
    private List<String[]> termsDocsArray = new ArrayList<String[]>();
    private List<String> allTerms = new ArrayList<String>(); //to hold all terms
    private List<double[]> tfidfDocsVector = new ArrayList<double[]>();

    /**
     * Method to read files and store in array.
     */
    public void parseFiles(String filePath) throws FileNotFoundException, IOException {
        File[] allfiles = new File(filePath).listFiles();
        BufferedReader in = null;
        for (File f : allfiles) {
            if (f.getName().endsWith(".txt")) {
                in = new BufferedReader(new FileReader(f));
                StringBuilder sb = new StringBuilder();
                String s = null;
                while ((s = in.readLine()) != null) {
                    sb.append(s);
                }
                String[] tokenizedTerms = sb.toString().replaceAll("[\\W&&[^\\s]]", "").split("\\W+");   //to get individual terms
                for (String term : tokenizedTerms) {
                    if (!allTerms.contains(term)) {  //avoid duplicate entry
                        allTerms.add(term);
                    }
                }
                termsDocsArray.add(tokenizedTerms);
            }
        }

    }

    /**
     * Method to create termVector according to its tfidf score.
     */
    public void tfIdfCalculator() {
        double tf; //term frequency
        double idf; //inverse document frequency
        double tfidf; //term requency inverse document frequency        
        for (String[] docTermsArray : termsDocsArray) {
            double[] tfidfvectors = new double[allTerms.size()];
            int count = 0;
            for (String terms : allTerms) {
                tf = new TfIdf().tfCalculator(docTermsArray, terms);
                idf = new TfIdf().idfCalculator(termsDocsArray, terms);
                tfidf = tf * idf;
                tfidfvectors[count] = tfidf;
                count++;
            }
            tfidfDocsVector.add(tfidfvectors);  //storing document vectors;            
        }
    }

    /**
     * Method to calculate cosine similarity between all the documents.
     */
    public void getCosineSimilarity() {
        for (int i = 0; i < tfidfDocsVector.size(); i++) {
            for (int j = 0; j < tfidfDocsVector.size(); j++) {
                System.out.println("between " + i + " and " + j + "  =  "
                                   + new CosineSimilarity().cosineSimilarity
                                       (
                                         tfidfDocsVector.get(i), 
                                         tfidfDocsVector.get(j)
                                       )
                                  );
            }
        }
    }
}

这是我的错误：

Exception in thread "main" java.lang.NullPointerException
    at DocumentParser.parseFiles(DocumentParser.java:22)
    at TfIdfMain.main(TfIdfMain.java:7)

我的文档中文本文件的路径是否错误？

Answer 1

Windows文件路径应使用\而不是/ 。此外还有另一个错误，代码不需要整个文件路径，只需要目录路径。而不是

dp.parseFiles("C:/Users/dachen/Documents/doc1.txt");

应该是

 dp.parseFiles("C:\\Users\\dachen\\Documents");

Answer 2

listFiles()的文档声明：

如果此抽象路径名不表示目录
，则返回null

您传递的路径不是目录。

在我的解析器中编译错误，认为我输入文件错误，但不确定错误是什么

2 个答案: