I am following the "Quickstart Guide: Integrating Search into your Application" on the Terrier IR platform homepage, using the code below, which is taken from that page. The code uses org.terrier.realtime.memory.MemoryIndex, but that class is not available in the Terrier jar files that I have included in my project using Maven.
I checked both Terrier 5.1 and 5.0, but I cannot find the MemoryIndex class or its constructor in either.
import java.io.File;
import java.io.StringReader;
import java.util.HashMap;
import java.util.Iterator;
import org.terrier.indexing.Document;
import org.terrier.indexing.TaggedDocument;
import org.terrier.indexing.tokenisation.Tokeniser;
import org.terrier.querying.LocalManager;
import org.terrier.querying.Manager;
import org.terrier.querying.ManagerFactory;
import org.terrier.querying.ScoredDoc;
import org.terrier.querying.ScoredDocList;
import org.terrier.querying.SearchRequest;
import org.terrier.realtime.memory.MemoryIndex;
import org.terrier.utility.ApplicationSetup;
import org.terrier.utility.Files;
public class IndexingAndRetrievalExample {
    public static void main(String[] args) throws Exception {
        // Directory containing files to index
        String aDirectoryToIndex = "/my/directory/containing/files/";
        // Configure Terrier
        ApplicationSetup.setProperty("indexer.meta.forward.keys", "docno");
        ApplicationSetup.setProperty("indexer.meta.forward.keylens", "30");
        // Create a new Index
        MemoryIndex memIndex = new MemoryIndex();
        // For each file
        for (String filename : new File(aDirectoryToIndex).list()) {
            String fullPath = aDirectoryToIndex + filename;
            // Convert it to a Terrier Document
            Document document = new TaggedDocument(Files.openFileReader(fullPath), new HashMap<String, String>(), Tokeniser.getTokeniser());
            // Add a meaningful identifier
            document.getAllProperties().put("docno", filename);
            // Index it
            memIndex.indexDocument(document);
        }
        // Set up the querying process
        ApplicationSetup.setProperty("querying.processes", "terrierql:TerrierQLParser,"
            + "parsecontrols:TerrierQLToControls,"
            + "parseql:TerrierQLToMatchingQueryTerms,"
            + "matchopql:MatchingOpQLParser,"
            + "applypipeline:ApplyTermPipeline,"
            + "localmatching:LocalManager$ApplyLocalMatching,"
            + "filters:LocalManager$PostFilterProcess");
        // Enable the decorate enhancement
        ApplicationSetup.setProperty("querying.postfilters", "org.terrier.querying.SimpleDecorate");
        // Create a new manager to run queries
        Manager queryingManager = ManagerFactory.from(memIndex.getIndexRef());
        // Create a search request
        SearchRequest srq = queryingManager.newSearchRequestFromQuery("search for document");
        // Specify the model to use when searching
        srq.setControl(SearchRequest.CONTROL_WMODEL, "BM25");
        // Enable querying processes
        srq.setControl("terrierql", "on");
        srq.setControl("parsecontrols", "on");
        srq.setControl("parseql", "on");
        srq.setControl("applypipeline", "on");
        srq.setControl("localmatching", "on");
        srq.setControl("filters", "on");
        // Enable post filters
        srq.setControl("decorate", "on");
        // Run the search
        queryingManager.runSearchRequest(srq);
        // Get the result set
        ScoredDocList results = srq.getResults();
        // Print the results
        System.out.println("The top " + results.size() + " documents were returned");
        System.out.println("Document Ranking");
        // Rank counter (the snippet as copied referenced an undefined variable i)
        int rank = 0;
        for (ScoredDoc doc : results) {
            rank++;
            int docid = doc.getDocid();
            double score = doc.getScore();
            String docno = doc.getMetadata("docno");
            System.out.println(" Rank " + rank + ": " + docid + " " + docno + " " + score);
        }
    }
}
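Answer 0 (score: 1)
I found the problem: it was in how the Maven dependencies were set up. Adding the terrier-realtime artifact alongside terrier-core to the Maven project resolved it: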
<dependencies>
    <dependency>
        <groupId>org.terrier</groupId>
        <artifactId>terrier-core</artifactId>
        <version>5.1</version>
    </dependency>
    <dependency>
        <groupId>org.terrier</groupId>
        <artifactId>terrier-realtime</artifactId>
        <version>5.1</version>
    </dependency>
</dependencies>
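To confirm that the extra dependency really puts the class on the runtime classpath, a small check like the one below may help. It is only a minimal sketch and not part of Terrier: the class name ClasspathCheck and the method isOnClasspath are hypothetical helpers, and the only Terrier-specific detail is the fully qualified class name taken from the import in the question.

```java
// Hypothetical helper, not part of Terrier: reports whether a class can be loaded.
class ClasspathCheck {

    // Returns true if the named class is visible to this class's class loader.
    static boolean isOnClasspath(String className) {
        try {
            // 'false' means the class is located but its static initialisers are not run.
            Class.forName(className, false, ClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            // Missing class, or a class whose own dependencies are missing.
            return false;
        }
    }

    public static void main(String[] args) {
        // Class name as imported in the question's code.
        String name = "org.terrier.realtime.memory.MemoryIndex";
        System.out.println(name + (isOnClasspath(name) ? " found" : " NOT found"));
    }
}
```

Running this with the same classpath as the application shows whether the import can be resolved before any indexing code is involved.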
Answer 1 (score: 0)
The class MemoryIndex.java appears to be part of terrier-core version 4.4. More information: https://jar-download.com/artifacts/org.terrier/terrier-core/4.4/source-code/org/terrier/realtime/memory/MemoryIndex.java
Their documentation seems to be out of date.