Question

I'm using Apache Solr to work with files. I can add regular text fields through Spring, but I don't know how to add a TXT / PDF file.

@SolrDocument(solrCoreName = "accounting")
public class Accounting {

    @Id
    @Field
    private String id;

    @Field
    private File txtFile;

    @Field
    private String docType;

    @Field
    private String docTitle;

    public Accounting() {
    }

    public Accounting(String id, String docType, String docTitle) {
        this.id = id;
        this.docTitle = docTitle;
        this.docType = docType;
    }
}

The problem is with the txtFile field.

    <field name="docTitle" type="strings"/>
    <field name="docType" type="strings"/>

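I added these fields to schema.xml manually, but I don't know how to add a field there that will be responsible for the file. For example, I want to add a txt file; how do I do that? Also, did I declare the field private File txtFile; correctly in the entity? Thank you very much.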

Answer 1

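Solr does not store the actual file anywhere. Depending on your configuration, however, it can store the binary content. Solr extracts the content from documents with the Extracting Request Handler, which relies on Apache Tika.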
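As a rough illustration of the Extracting Request Handler route, the sketch below posts a file's raw bytes to Solr and leaves the extraction to Tika on the server side. It assumes the handler is enabled at /update/extract (it is not active in every configuration), reuses the TestCore1 core URL from the code in this answer, and needs Java 11 or later for the JDK's java.net.http client. The class name, the id doc1 and the relative file path are placeholders, not part of the original answer.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;
import java.nio.file.Path;

// Hypothetical helper class; assumes the /update/extract handler is enabled on the core.
final class ExtractUpload {

    // Builds the extract URL: literal.id supplies the document id,
    // commit=true asks Solr to commit right after indexing.
    static URI extractUri(String coreUrl, String id) {
        String encodedId = URLEncoder.encode(id, StandardCharsets.UTF_8);
        return URI.create(coreUrl + "/update/extract?literal.id=" + encodedId + "&commit=true");
    }

    // Sends the raw file bytes to Solr and returns the HTTP status code.
    static int upload(String coreUrl, String id, Path file, String contentType) throws Exception {
        HttpRequest request = HttpRequest.newBuilder(extractUri(coreUrl, id))
                .header("Content-Type", contentType)
                .POST(HttpRequest.BodyPublishers.ofFile(file))
                .build();
        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());
        return response.statusCode();
    }

    public static void main(String[] args) throws Exception {
        // Placeholder values: adjust the core URL, id and file path to your setup.
        int status = upload("http://localhost:8983/solr/TestCore1", "doc1",
                Path.of("TestDocument.pdf"), "application/pdf");
        System.out.println("Solr responded with HTTP " + status);
    }
}
```

With this approach the client never parses the file itself, so the same call should work for a .txt file by passing text/plain as the content type.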

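You can try the code below. It does not use anything from Spring Boot. The content is read from a PDF document, and the data is then indexed into Solr together with an id and the file name. I used the Tika API to extract the PDF's content.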

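import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;

import org.apache.solr.client.solrj.SolrClient;
import org.apache.solr.client.solrj.SolrServerException;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.common.SolrInputDocument;
import org.apache.tika.exception.TikaException;
import org.apache.tika.metadata.Metadata;
import org.apache.tika.parser.ParseContext;
import org.apache.tika.parser.pdf.PDFParser;
import org.apache.tika.sax.BodyContentHandler;
import org.xml.sax.SAXException;

// The class name is arbitrary; only the main method matters here.
public class PdfIndexer {

    public static void main(final String[] args) throws IOException, TikaException, SAXException {

        String urlString = "http://localhost:8983/solr/TestCore1";
        File file = new File("C://Users//abhijitb//Desktop//TestDocument.pdf");

        // -1 disables BodyContentHandler's default 100,000-character write limit
        BodyContentHandler handler = new BodyContentHandler(-1);
        Metadata metadata = new Metadata();
        ParseContext pcontext = new ParseContext();

        // parse the document with the PDF parser; the stream is closed automatically
        try (InputStream inputstream = new FileInputStream(file)) {
            PDFParser pdfparser = new PDFParser();
            pdfparser.parse(inputstream, handler, metadata, pcontext);
        }

        // index the id, the file name and the extracted text, then commit
        try (SolrClient solr = new HttpSolrClient.Builder(urlString).build()) {
            SolrInputDocument document = new SolrInputDocument();
            document.addField("id", "123456");
            document.addField("title", file.getName());
            document.addField("text", handler.toString());
            solr.add(document);
            solr.commit();
        } catch (SolrServerException | IOException e) {
            e.printStackTrace();
        }
    }
}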
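For a plain .txt file no parser should be needed, because the file content is already the text to index. The sketch below only gathers the field values using the JDK (Java 11 or later); each entry would then be passed to SolrInputDocument.addField(name, value) as in the code above. The field names id, title and text mirror that code, while the class name, the helper method and the sample file are hypothetical. It also suggests why a java.io.File member is unlikely to work as an indexed field: what Solr receives is the content as a string, not a file handle.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper; it prepares field values and does not talk to Solr itself.
final class TxtFields {

    // Collects the name/value pairs that would go into a SolrInputDocument.
    static Map<String, Object> fromTextFile(String id, Path file) throws IOException {
        Map<String, Object> fields = new LinkedHashMap<>();
        fields.put("id", id);
        fields.put("title", file.getFileName().toString());
        // Plain text needs no extraction step: read the whole file as UTF-8.
        fields.put("text", Files.readString(file, StandardCharsets.UTF_8));
        return fields;
    }

    public static void main(String[] args) throws IOException {
        // Sample file created on the fly so the sketch runs on its own.
        Path sample = Files.createTempFile("sample", ".txt");
        Files.writeString(sample, "hello solr", StandardCharsets.UTF_8);
        fromTextFile("txt-1", sample)
                .forEach((name, value) -> System.out.println(name + " = " + value));
    }
}
```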

Once the data is indexed, you can verify it by running a query on the Solr admin page. Please see the image for reference.
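The same check can also be done outside the admin page by calling the core's /select endpoint. The following is a minimal sketch, assuming the TestCore1 core URL and the id 123456 used in the code above and Java 11 or later; the class and method names are made up for illustration.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;

// Hypothetical helper for checking that a document was indexed.
final class VerifyIndexed {

    // Builds a /select URL for the given query, asking for a JSON response.
    static URI selectUri(String coreUrl, String query) {
        String q = URLEncoder.encode(query, StandardCharsets.UTF_8);
        return URI.create(coreUrl + "/select?q=" + q + "&wt=json");
    }

    public static void main(String[] args) throws Exception {
        // Assumed values taken from the indexing code in this answer.
        URI uri = selectUri("http://localhost:8983/solr/TestCore1", "id:123456");
        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(HttpRequest.newBuilder(uri).GET().build(),
                        HttpResponse.BodyHandlers.ofString());
        // The JSON body should list the document if indexing succeeded.
        System.out.println(response.body());
    }
}
```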
