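I'm taking my first steps with Lucene (version 4.10.1), and my current goal is to index a text field from a 100 KB file. Because the text seemed too large for a String, I read the file's contents into a byte array. But when I run the program, Lucene reports: Fields with BytesRef values cannot be indexed.
So the question is: how do I index a large text field?
Here is the code: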
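import java.io.File;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.FieldType;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.util.Version;

public class Main {
    public static void main(String[] args) {
        try {
            Directory indexDir = FSDirectory.open(new File("testIndex"));
            Analyzer analyzer = new StandardAnalyzer();
            IndexWriterConfig conf = new IndexWriterConfig(Version.LUCENE_4_10_1, analyzer);
            IndexWriter indexWriter = new IndexWriter(indexDir, conf);
            Path path = Paths.get("text.txt");
            // The file contents are kept as raw bytes here
            byte[] text = Files.readAllBytes(path);
            long startTime = System.currentTimeMillis();
            for (int i = 0; i < 100; i++) {
                Document doc = new Document();
                FieldType fieldType = new FieldType();
                fieldType.setIndexed(true);
                fieldType.setTokenized(true);
                fieldType.setStored(true);
                fieldType.setOmitNorms(true);
                fieldType.setStoreTermVectors(false);
                fieldType.setStoreTermVectorOffsets(false);
                fieldType.setStoreTermVectorPayloads(false);
                fieldType.setStoreTermVectorPositions(false);
                // This is where it fails: a byte[] value combined with an indexed
                // field type triggers "Fields with BytesRef values cannot be indexed"
                Field title = new Field("text" + i, text, fieldType);
                doc.add(title);
                indexWriter.addDocument(doc);
            }
            long endTime = System.currentTimeMillis();
            long elapsedTime = endTime - startTime;
            System.out.println("Elapsed Time in Ms: " + elapsedTime);
            indexWriter.close();
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}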
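Answer 0 (score: 0)
I solved this with a StringBuilder: read the file into a String instead of a byte array. A Field built from a byte[] is a binary (BytesRef) field, which Lucene can store but not index, whereas a String value can be tokenized and indexed.
Code: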
Path path = Paths.get("text.txt");
StringBuilder stringBuilder = new StringBuilder();
// try-with-resources closes the reader even if reading fails
try (BufferedReader reader = Files.newBufferedReader(path, Charset.defaultCharset())) {
    String line;
    while ((line = reader.readLine()) != null) {
        stringBuilder.append(line).append("\n");
    }
}
// Pass this String, rather than the byte[], to the Field constructor
String text = stringBuilder.toString();
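For a file of this size, a shorter route to the same String may be to decode the bytes directly. The following is only a sketch: the class name LargeTextReader and the method readText are hypothetical, and UTF-8 is an assumed encoding that the original post does not state. Unlike the line-by-line loop, it keeps the file's line endings exactly as they are.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper; not part of Lucene or of the original answer.
final class LargeTextReader {

    // Reads the whole file and decodes it as UTF-8
    // (the encoding is an assumption; adjust it to match the actual file).
    static String readText(Path path) throws IOException {
        byte[] bytes = Files.readAllBytes(path);
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        // Defaults to the file name used in the question
        Path path = Paths.get(args.length > 0 ? args[0] : "text.txt");
        String text = readText(path);
        // The String can then be passed to the Field constructor from the
        // question, e.g. new Field("text" + i, text, fieldType).
        System.out.println("Read " + text.length() + " characters");
    }
}
```

A 100 KB file fits comfortably in a single String, so no streaming is needed in this case.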