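I'm new to SolrJ. I need to index zip, pdf and html documents using the SolrJ Java API. Can anyone give me some examples of indexing different types of documents with SolrJ in a Java application?
Is there a good link with Java examples for indexing the different types of documents available in a folder?
Thanks for your help.
From the output it is clear that SolrJ is not indexing the .xml file I'm trying. Can anyone comment on what I'm doing wrong?
Code: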
String urlString = "http://localhost:8983/solr/tests";
HttpSolrClient solr = new HttpSolrClient.Builder(urlString).build();
solr.setParser(new XMLResponseParser());
File file = new File("D:/work/devtools/Solr/solr-7.6.0/example/exampledocs/hd.xml");
InputStream fis = new FileInputStream(file);
/* Tika specific */
ContentHandler contenthandler = new BodyContentHandler(10 * 1024 * 1024);
Metadata metadata = new Metadata();
metadata.set(Metadata.RESOURCE_NAME_KEY, "hd.xml");
ParseContext parseContext = new ParseContext();
// Automatically detect the best parser based on the detected document type
AutoDetectParser autodetectParser = new AutoDetectParser();
// OOXMLParser parser = new OOXMLParser();
autodetectParser.parse(fis, contenthandler, metadata, parseContext);
SolrInputDocument doc = new SolrInputDocument();
doc.addField("id", file.getCanonicalPath());
SolrQuery query = new SolrQuery("*.*");
// query.set("q", "price:599.99");
QueryResponse response = solr.query(query);
Output:
solr query{responseHeader={status=0,QTime=0,params={q=*.*,wt=xml,version=2.2}},response={numFound=0,start=0,docs=[]}}
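Answer 0 (score: 0)
Basic information links: 1) https://www.youtube.com/watch?v=rxoS1p1TaFY&t=198s 2) https://lucene.apache.org/solr/ (download the latest version)
How to use SolrJ in a Java application (the Java version should be 1.8):
First download the latest Solr release and unzip it.
1) Add the dependency to your pom.xml file:
<dependency>
<groupId>org.apache.solr</groupId>
<artifactId>solr-solrj</artifactId>
<version>7.6.0</version>
</dependency>
Start Solr from the solr/bin folder and check the Solr admin console by opening http://localhost:8983/solr/#
2) Basic example code (this code is enough to understand SolrJ):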
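Create the indexfiles core in Solr and use the following code:
import java.io.File;
import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.client.solrj.impl.XMLResponseParser;
import org.apache.solr.client.solrj.request.AbstractUpdateRequest;
import org.apache.solr.client.solrj.request.ContentStreamUpdateRequest;
import org.apache.solr.common.util.NamedList;

String urlString = "http://localhost:8983/solr/indexfiles";
HttpSolrClient solr = new HttpSolrClient.Builder(urlString).build();
solr.setParser(new XMLResponseParser());
File file = new File("D:/work/devtools/Solr/solr-7.6.0/example/exampledocs/176444.zip");
// Send the raw file to the extracting request handler, which runs Tika on the server side
ContentStreamUpdateRequest req = new ContentStreamUpdateRequest("/update/extract");
// Change the content type to match the input file, e.g. "application/pdf" or "text/html"
req.addFile(file, "application/zip");
// Use the file name as the unique document id
req.setParam("literal.id", file.getName());
// Commit as part of this request so the document becomes searchable right away
req.setAction(AbstractUpdateRequest.ACTION.COMMIT, true, true);
NamedList<Object> result = solr.request(req);
int status = (Integer) ((NamedList<?>) result.get("responseHeader")).get("status");
System.out.println("Status: " + status);
System.out.println("Result: " + result);
// The match-all query is *:* (not *.*)
System.out.println("solr query" + solr.query(new SolrQuery("*:*")));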
3) Query from the Solr admin console using http://localhost:8983/solr/indexfiles/select?q=SOLR1000
Just change the text (q=<text to search>) to whatever you want to search for in the files you indexed.
If you are not comfortable with Solr queries, you can find the query parameter q in the Solr admin console and enter the search text there; by default it is *:*
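The select URL from step 3 can also be assembled in code. The following is only a minimal sketch using the JDK alone: the class name SelectUrl and the method build are made up for illustration, and the base URL and core name are the ones used in this answer. It URL-encodes the q parameter, which matters as soon as the search text contains spaces or characters such as ":".

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Hypothetical helper (not part of SolrJ): builds a /select URL for a core.
final class SelectUrl {

    /** Returns baseUrl/core/select?q=<query>, with the query text URL-encoded as UTF-8. */
    static String build(String baseUrl, String core, String query) {
        try {
            return baseUrl + "/" + core + "/select?q=" + URLEncoder.encode(query, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // Every Java platform is required to support UTF-8
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String base = "http://localhost:8983/solr"; // assumed local Solr, as in the answer
        System.out.println(build(base, "indexfiles", "SOLR1000"));
        System.out.println(build(base, "indexfiles", "*:*"));
    }
}
```

Inside a Java application that already uses SolrJ, passing the same text to new SolrQuery(...) is usually simpler, because SolrJ takes care of the encoding.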
NOTE: You don't need to integrate Apache Tika with Apache Solr yourself to index zip files and the like, because it is available by default in recent Solr versions.
Note: Don't be confused by the difference between the output after indexing with the standalone tools (which contains the complete data, e.g. hd.xml from the /exampledocs folder in Solr) and the output you get after indexing the same file with SolrJ from a Java application.
For example, SolrJ just indexes the file as a whole, so when you run this query from the Solr admin console you see the following output:
(http://localhost:8983/solr/indexfiles/select?q=*:*)
Output:
{
"id":"hd.xml",
"stream_size":["null"],
"x_parsed_by":["org.apache.tika.parser.DefaultParser",
"org.apache.tika.parser.xml.DcXMLParser"],
"stream_content_type":["text/xml"],
"content_type":["application/xml"],
"_version_":1624155471570010112},
But if we index through the command prompt using java -Dc=name -jar post.jar *.xml, the output contains the data inside the XML file (http://localhost:8983/solr/indexfiles/select?q=*:*)
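The question asks about indexing every document in a folder, and the code above says to change the content type per input file. As a rough sketch of that preparation step, the JDK-only class below walks a folder and pairs each supported file with a content type. The class name IndexableFiles, its methods and the extension table are assumptions made for illustration; each resulting pair could then be passed to req.addFile(file, contentType) as in the example above.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper: finds files to index and picks a content type for each.
final class IndexableFiles {

    // Assumed extension-to-type table; extend it for other formats
    private static final Map<String, String> CONTENT_TYPES = new LinkedHashMap<>();
    static {
        CONTENT_TYPES.put("zip", "application/zip");
        CONTENT_TYPES.put("pdf", "application/pdf");
        CONTENT_TYPES.put("html", "text/html");
        CONTENT_TYPES.put("xml", "application/xml");
    }

    /** Returns the content type for a file name, or null if its extension is not in the table. */
    static String contentTypeFor(String fileName) {
        int dot = fileName.lastIndexOf('.');
        if (dot < 0) {
            return null;
        }
        return CONTENT_TYPES.get(fileName.substring(dot + 1).toLowerCase(Locale.ROOT));
    }

    /** Walks the folder recursively and maps every supported regular file to its content type. */
    static Map<Path, String> scan(Path folder) throws IOException {
        try (Stream<Path> paths = Files.walk(folder)) {
            return paths
                    .filter(Files::isRegularFile)
                    .filter(p -> contentTypeFor(p.getFileName().toString()) != null)
                    .sorted()
                    .collect(Collectors.toMap(
                            p -> p,
                            p -> contentTypeFor(p.getFileName().toString()),
                            (a, b) -> a,
                            LinkedHashMap::new));
        }
    }

    public static void main(String[] args) throws IOException {
        Path folder = Paths.get(args.length > 0 ? args[0] : ".");
        scan(folder).forEach((path, type) -> System.out.println(type + "  " + path));
    }
}
```

Files with an unknown extension are skipped here rather than guessed at; whether that is the right choice depends on what the folder contains.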
Answer 1 (score: 0)
Here is XML-specific code for indexing an XML file into Solr. The XML must be in the following format, though.
<add>
<doc>
<field name="id">PMID</field>
<field name="year_i">Year</field>
<field name="name">ArticleTitle</field>
<field name="abstract_s">AbstractText</field>
<field name="cat">MeshHeading1</field>
<field name="cat">MeshHeading2</field>
</doc>
</add>
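If the data does not start out as a file in this format, the <add><doc> document can be generated from field values. This is only a sketch with the JDK: the class SolrXml and its methods are invented for illustration, and the sample values in main are made up. It escapes XML special characters and repeats the <field> element for multi-valued fields such as cat.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper: renders field values as a Solr XML update message.
final class SolrXml {

    /** Escapes the characters that are special in XML text and attribute values. */
    static String escape(String s) {
        return s.replace("&", "&amp;")
                .replace("<", "&lt;")
                .replace(">", "&gt;")
                .replace("\"", "&quot;");
    }

    /** Renders one document; a field with several values gets one <field> element per value. */
    static String addDoc(Map<String, List<String>> fields) {
        StringBuilder sb = new StringBuilder("<add>\n<doc>\n");
        for (Map.Entry<String, List<String>> entry : fields.entrySet()) {
            for (String value : entry.getValue()) {
                sb.append("<field name=\"").append(escape(entry.getKey())).append("\">")
                  .append(escape(value))
                  .append("</field>\n");
            }
        }
        return sb.append("</doc>\n</add>\n").toString();
    }

    public static void main(String[] args) {
        // Made-up sample values, keyed by the field names used in the format above
        Map<String, List<String>> fields = new LinkedHashMap<>();
        fields.put("id", Arrays.asList("12345"));
        fields.put("year_i", Arrays.asList("2019"));
        fields.put("name", Arrays.asList("Title with <markup> & ampersand"));
        fields.put("cat", Arrays.asList("Heading one", "Heading two"));
        System.out.print(addDoc(fields));
    }
}
```

The resulting string has the same shape as the sample above and could be sent to the /update handler like any other Solr XML.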
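Below is the code to index the XML data into Solr.
import java.io.BufferedReader;
import java.io.File;
import java.io.FileReader;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.client.solrj.impl.XMLResponseParser;
import org.apache.solr.client.solrj.request.DirectXmlRequest;

// Read the Solr-format XML file into a string; the reader is closed automatically
File xmlFile = new File("example.xml");
StringBuilder sb = new StringBuilder();
try (BufferedReader bufReader = new BufferedReader(new FileReader(xmlFile))) {
    String line;
    while ((line = bufReader.readLine()) != null) {
        sb.append(line).append("\n");
    }
}
String xml2String = sb.toString();
String urlString = String.format("http://localhost:8983/solr/%s", "pubmed1");
HttpSolrClient server = new HttpSolrClient.Builder(urlString).build();
server.setParser(new XMLResponseParser());
// Post the raw XML to the /update handler, then commit to make it searchable
DirectXmlRequest xmlreq = new DirectXmlRequest("/update", xml2String);
server.request(xmlreq);
server.commit();
server.close();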
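As for Apache Tika, it helps you extract the content of a file. The file can be xlsx, pdf, html or xml. For an XML file, you need to write a converter that transforms your own XML into the Solr XML format; XSLT can be used for that. For Apache Tika itself, refer to the Apache Tika documentation.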
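The XSLT route mentioned above could look roughly like this with the JDK's built-in javax.xml.transform API. It is a sketch under assumptions: the input element names (article, pmid, title, heading), the class ToSolrXml and the stylesheet are all invented for illustration, and only the target field names come from the Solr XML format shown earlier.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical converter: turns an assumed <article> document into Solr XML.
final class ToSolrXml {

    // Illustrative stylesheet; adapt the match and select expressions to the real input
    static final String XSLT =
            "<xsl:stylesheet version=\"1.0\" xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
          + "<xsl:output method=\"xml\" omit-xml-declaration=\"yes\"/>"
          + "<xsl:template match=\"/article\">"
          + "<add><doc>"
          + "<field name=\"id\"><xsl:value-of select=\"pmid\"/></field>"
          + "<field name=\"name\"><xsl:value-of select=\"title\"/></field>"
          + "<xsl:for-each select=\"heading\">"
          + "<field name=\"cat\"><xsl:value-of select=\".\"/></field>"
          + "</xsl:for-each>"
          + "</doc></add>"
          + "</xsl:template>"
          + "</xsl:stylesheet>";

    /** Applies the stylesheet to the source XML and returns the Solr XML as a string. */
    static String transform(String sourceXml) {
        try {
            Transformer transformer = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(XSLT)));
            StringWriter out = new StringWriter();
            transformer.transform(new StreamSource(new StringReader(sourceXml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalArgumentException("XSLT transformation failed", e);
        }
    }

    public static void main(String[] args) {
        String source = "<article><pmid>123</pmid><title>Some title</title>"
                + "<heading>First</heading><heading>Second</heading></article>";
        System.out.println(transform(source));
    }
}
```

The returned string can then be posted with DirectXmlRequest in the same way as the xml2String value in the code above.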