I need to compute precision and recall values in Lucene, and I am using the following source code to do it:
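// Imports assume Lucene 3.x with the benchmark contrib module on the classpath.
import java.io.*;
import org.apache.lucene.benchmark.quality.*;
import org.apache.lucene.benchmark.quality.trec.*;
import org.apache.lucene.benchmark.quality.utils.*;
import org.apache.lucene.search.*;
import org.apache.lucene.store.*;

public class PrecisionRecall {
    public static void main(String[] args) throws Throwable {
        File topicsFile = new File("C:/Users/Raden/Documents/lucene/LuceneHibernate/LIA/lia2e/src/lia/benchmark/topics.txt");
        File qrelsFile = new File("C:/Users/Raden/Documents/lucene/LuceneHibernate/LIA/lia2e/src/lia/benchmark/qrels.txt");
        Directory dir = FSDirectory.open(new File("C:/Users/Raden/Documents/myindex"));
        Searcher searcher = new IndexSearcher(dir, true);
        String docNameField = "filename"; // index field that holds each document's name
        PrintWriter logger = new PrintWriter(System.out, true);

        // Read the TREC-format topics (the queries).
        TrecTopicsReader qReader = new TrecTopicsReader();
        QualityQuery qqs[] = qReader.readQueries(
            new BufferedReader(new FileReader(topicsFile)));

        // Load the relevance judgments.
        Judge judge = new TrecJudge(new BufferedReader(
            new FileReader(qrelsFile)));

        // Check that the queries and the judgments are consistent.
        judge.validateData(qqs, logger);

        // Build each query from the topic's "title", searching the "contents" field.
        QualityQueryParser qqParser = new SimpleQQParser("title", "contents");

        QualityBenchmark qrun = new QualityBenchmark(qqs, qqParser, searcher, docNameField);
        SubmissionReport submitLog = null;

        // Run the benchmark, then average and print the statistics.
        QualityStats stats[] = qrun.execute(judge, submitLog, logger);
        QualityStats avg = QualityStats.average(stats);
        avg.log("SUMMARY", 2, logger, " ");
        dir.close();
    }
}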
Here is the content of the topics file:
<top>
<num> Number: 0
<title> apache source
<desc> Description:
<narr> Narrative:
</top>
This is the content of the qrels file:
# Format:
#
# qnum 0 doc-name is-relevant
#
#
0 0 apache1.0.txt 1
0 0 apache1.1.txt 1
0 0 apache2.0.txt 1
The problem is that when I run the code, it reports precision and recall values of zero. This is the output I get:
0 - contents:apache contents:source
0 Stats:
Search Seconds: 0.047
DocName Seconds: 0.039
Num Points: 56.000
Num Good Points: 0.000
Max Good Points: 3.000
Average Precision: 0.000
MRR: 0.000
Recall: 0.000
Precision At 1: 0.000
Precision At 2: 0.000
Precision At 3: 0.000
Precision At 4: 0.000
Precision At 5: 0.000
Precision At 6: 0.000
Precision At 7: 0.000
Precision At 8: 0.000
Precision At 9: 0.000
Precision At 10: 0.000
Precision At 11: 0.000
Precision At 12: 0.000
Precision At 13: 0.000
Precision At 14: 0.000
Precision At 15: 0.000
Precision At 16: 0.000
Precision At 17: 0.000
Precision At 18: 0.000
Precision At 19: 0.000
Precision At 20: 0.000
SUMMARY
Search Seconds: 0.047
DocName Seconds: 0.039
Num Points: 56.000
Num Good Points: 0.000
Max Good Points: 3.000
Average Precision: 0.000
MRR: 0.000
Recall: 0.000
Precision At 1: 0.000
Precision At 2: 0.000
Precision At 3: 0.000
Precision At 4: 0.000
Precision At 5: 0.000
Precision At 6: 0.000
Precision At 7: 0.000
Precision At 8: 0.000
Precision At 9: 0.000
Precision At 10: 0.000
Precision At 11: 0.000
Precision At 12: 0.000
Precision At 13: 0.000
Precision At 14: 0.000
Precision At 15: 0.000
Precision At 16: 0.000
Precision At 17: 0.000
Precision At 18: 0.000
Precision At 19: 0.000
Precision At 20: 0.000
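Can you tell me what I did wrong to make the precision and recall values zero? And what does it mean when precision and recall are zero? I am doing this because I need to measure the performance of my search engine, and precision and recall are one of the ways to do that.
Thanks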
Answer 0 (score: 1)
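Precision = 0 means that none of the returned results was judged relevant. See, for example, the Wikipedia article.
I suggest trying a single query and looking at what results you get. There may be a problem with your tokenizer; perhaps you are not handling letter case consistently, etc.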
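To make the zero values concrete, here is a minimal plain-Java sketch of how precision and recall follow from a list of returned document names and a set of relevant ones. It does not use Lucene and is not the benchmark package's own code; the class name `PrecisionRecallSketch` and the sample hit lists are made up for illustration, and only the three relevant file names come from the question.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper, not part of Lucene.
class PrecisionRecallSketch {

    /** Number of returned names that are in the relevant set (assumes no duplicate hits). */
    static int countRelevant(List<String> returned, Set<String> relevant) {
        int good = 0;
        for (String name : returned) {
            if (relevant.contains(name)) {
                good++;
            }
        }
        return good;
    }

    /** Fraction of the returned documents that are relevant. */
    static double precision(List<String> returned, Set<String> relevant) {
        if (returned.isEmpty()) {
            return 0.0;
        }
        return (double) countRelevant(returned, relevant) / returned.size();
    }

    /** Fraction of the relevant documents that were returned. */
    static double recall(List<String> returned, Set<String> relevant) {
        if (relevant.isEmpty()) {
            return 0.0;
        }
        return (double) countRelevant(returned, relevant) / relevant.size();
    }

    public static void main(String[] args) {
        Set<String> relevant = new HashSet<>(Arrays.asList(
            "apache1.0.txt", "apache1.1.txt", "apache2.0.txt"));

        // Invented hit lists: one with no relevant names, one with two of them.
        List<String> noMatches = Arrays.asList("a.txt", "b.txt");
        List<String> someMatches = Arrays.asList(
            "apache1.0.txt", "b.txt", "apache2.0.txt", "c.txt");

        System.out.println("no matches:   P=" + precision(noMatches, relevant)
            + " R=" + recall(noMatches, relevant));
        System.out.println("some matches: P=" + precision(someMatches, relevant)
            + " R=" + recall(someMatches, relevant));
    }
}
```

In the question's output, "Num Good Points: 0.000" next to "Max Good Points: 3.000" is the same situation as the first hit list: documents were returned, but none of their names matched a judged-relevant name.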
Answer 1 (score: 1)
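I think the problem is in the indexing program. If you look closely at
QualityBenchmark qrun =
new QualityBenchmark(qqs, qqParser, searcher, docNameField);
you will see that the search is run for each query and the hits are then matched by document name (that is, by the value Lucene finds in the field named "filename" in the Lucene index).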
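This means that when you index, you need to create an explicit document field that stores the ID of each .txt file in your corpus (in your case, its name), for example by declaring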
public static final String FIELD_NAME = "filename";
and then
document.add(new TextField(FIELD_NAME, "apache1.0.txt", Field.Store.YES));
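and similarly for the other 2 files. Otherwise the benchmark cannot match the hits against the names listed in the qrels file. I had the same problem, but after I added the new custom field it worked like a charm :-)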
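The effect of the missing field can be sketched in plain Java. This is only a simulation under stated assumptions, not Lucene's actual implementation: each hit is modeled as a map of stored field values, and the class `DocNameMatchSketch`, the method `goodPoints`, and the sample hits are hypothetical names chosen for illustration.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical simulation of name-based matching; does not call Lucene.
class DocNameMatchSketch {

    /** Counts hits whose value in docNameField is one of the judged-relevant names. */
    static int goodPoints(List<Map<String, String>> hits,
                          String docNameField,
                          Set<String> relevantNames) {
        int good = 0;
        for (Map<String, String> storedFields : hits) {
            // null when the document was indexed without this field
            String name = storedFields.get(docNameField);
            if (name != null && relevantNames.contains(name)) {
                good++;
            }
        }
        return good;
    }

    public static void main(String[] args) {
        Set<String> relevant = new HashSet<>(Arrays.asList(
            "apache1.0.txt", "apache1.1.txt", "apache2.0.txt"));

        // Index built without a "filename" field: no hit can be matched by name.
        List<Map<String, String>> withoutField = Arrays.asList(
            Collections.singletonMap("contents", "apache source code"),
            Collections.singletonMap("contents", "more apache source"));

        // Index built with a "filename" field holding each file's name.
        List<Map<String, String>> withField = Arrays.asList(
            Collections.singletonMap("filename", "apache1.0.txt"),
            Collections.singletonMap("filename", "unrelated.txt"),
            Collections.singletonMap("filename", "apache2.0.txt"));

        System.out.println("without field: " + goodPoints(withoutField, "filename", relevant));
        System.out.println("with field:    " + goodPoints(withField, "filename", relevant));
    }
}
```

The first case mirrors the question's output: hits exist, but with no usable name on any of them the count of good points stays at zero, so precision and recall are zero as well.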
N.B. The format of the two benchmark configuration files is based on the TREC9 format; a sample qrels.txt file can be found at http://trec.nist.gov/data/qrels_eng/ and a sample topics.txt file at http://trec.nist.gov/data/topics_eng/topics.501-550.txt.
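To check that the document names in a qrels file are exactly the names stored in the index, it can help to read the file yourself. The following is a rough plain-Java sketch that parses lines in the "qnum 0 doc-name is-relevant" layout shown in the question; the class `QrelsSketch` and method `relevantByQuery` are hypothetical, and treating any last-column value other than 0 as relevant is an assumption of this sketch.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical qrels reader for illustration; not Lucene's TrecJudge.
class QrelsSketch {

    /** Maps each query number to the document names judged relevant for it. */
    static Map<String, Set<String>> relevantByQuery(List<String> lines) {
        Map<String, Set<String>> result = new LinkedHashMap<>();
        for (String raw : lines) {
            String line = raw.trim();
            if (line.isEmpty() || line.startsWith("#")) {
                continue; // skip blank lines and comments
            }
            String[] parts = line.split("\\s+");
            if (parts.length != 4) {
                throw new IllegalArgumentException("Expected 4 columns: " + line);
            }
            String queryId = parts[0];
            String docName = parts[2];
            // Assumption of this sketch: anything other than "0" means relevant.
            boolean relevant = !"0".equals(parts[3]);
            if (relevant) {
                result.computeIfAbsent(queryId, k -> new LinkedHashSet<>()).add(docName);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
            "# qnum 0 doc-name is-relevant",
            "0 0 apache1.0.txt 1",
            "0 0 apache1.1.txt 1",
            "0 0 apache2.0.txt 1");
        System.out.println(relevantByQuery(lines));
    }
}
```

Printing the parsed names next to the values stored in the index's "filename" field makes any mismatch (a missing extension, a full path instead of a bare name) easy to spot.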