Question

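Some of my colleagues have a large Java web application that uses a search system built with Lucene Java. I'd like a nice HTTP-based API for accessing those existing search indexes. I have used Nutch before and really liked how easily its OpenSearch implementation returns results as RSS.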

I tried setting Solr's dataDir in solrconfig.xml, hoping it would pick up the existing index files, but it seems to simply ignore them.

My main question is:

Can Solr be used to access Lucene indexes that were created elsewhere? Or is there perhaps a better solution?

Answer 1

Success! With the schema.xml changes Pascal suggested, I had it working right away. Thanks!

Here are the complete steps for anyone interested:

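Download Solr and copy dist/apache-solr-1.4.0.war to tomcat/webapps
Copy example/solr/conf to /usr/local/solr/
Copy the pre-existing Lucene index files to /usr/local/solr/data/index
Set solr.home to /usr/local/solr
In solrconfig.xml, change dataDir to /usr/local/solr/data (Solr looks for the index directory inside it)
Load the Lucene index into Luke to browse it (an awesome tool)
In the example schema.xml, remove everything except the "string" field type
In the example schema.xml, add 14 field definitions corresponding to the 14 fields shown in Luke. Example: <field name="docId" type="string" indexed="true" stored="true"/>
In the example schema.xml, change uniqueKey to the field in the index that seems to be the document ID
In the example schema.xml, change defaultSearchField to the field in the index that seems to contain the terms
Start Tomcat; this time there were no exceptions, and some queries ran successfully at localhost:8080/solr/admin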
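The schema.xml edits above (one field definition per Luke field, plus uniqueKey and defaultSearchField) are mechanical once you have the field list. A minimal sketch that prints those fragments, assuming every field maps to the "string" type as in the example above; the class name SchemaFragments and the field names title and body are hypothetical placeholders, and field names are assumed not to need XML escaping:

```java
import java.util.List;
import java.util.StringJoiner;

// Hypothetical helper: prints the schema.xml fragments described in the steps above.
public class SchemaFragments {

    // One <field/> line per Lucene field, all typed "string", indexed and stored.
    static String fieldDefinitions(List<String> fieldNames) {
        StringJoiner out = new StringJoiner("\n");
        for (String name : fieldNames) {
            out.add("<field name=\"" + name + "\" type=\"string\" indexed=\"true\" stored=\"true\"/>");
        }
        return out.toString();
    }

    // uniqueKey names the field holding the document ID.
    static String uniqueKey(String idField) {
        return "<uniqueKey>" + idField + "</uniqueKey>";
    }

    // defaultSearchField names the field searched when a query has no field prefix.
    static String defaultSearchField(String textField) {
        return "<defaultSearchField>" + textField + "</defaultSearchField>";
    }

    public static void main(String[] args) {
        // Placeholder names; replace them with the fields Luke shows for your index.
        List<String> fields = List.of("docId", "title", "body");
        System.out.println(fieldDefinitions(fields));
        System.out.println(uniqueKey("docId"));
        System.out.println(defaultSearchField("body"));
    }
}
```

Because "string" fields are not tokenized, a query against them only matches a whole stored value; a field that was indexed with an analyzer probably needs a text field type instead, as Answer 2 points out.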

This is just a proof that it can work. Obviously, there is a lot more configuration to be done.

Answer 2

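I have never tried this, but you will have to adjust schema.xml to include every field of the documents in your Lucene index, because Solr will not let you search a field that is not defined in schema.xml.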
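To see which of the index's fields (as listed by Luke) are still missing from schema.xml, a small check can compare the two. This is a sketch using only the JDK's DOM parser; SchemaFieldCheck and undeclaredFields are hypothetical names, and it only looks at plain <field> elements, so a field covered by a <dynamicField> pattern would still be reported as missing:

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

import javax.xml.parsers.DocumentBuilderFactory;

import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

public class SchemaFieldCheck {

    // Hypothetical helper: returns the index fields that have no <field name="..."/> entry.
    // Only exact <field> elements count; <dynamicField> patterns are ignored.
    static Set<String> undeclaredFields(String schemaXml, List<String> indexFields) {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(schemaXml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse schema.xml", e);
        }
        Set<String> declared = new LinkedHashSet<>();
        NodeList nodes = doc.getElementsByTagName("field");
        for (int i = 0; i < nodes.getLength(); i++) {
            declared.add(((Element) nodes.item(i)).getAttribute("name"));
        }
        // Keep the caller's order and drop every name the schema already declares.
        Set<String> missing = new LinkedHashSet<>(indexFields);
        missing.removeAll(declared);
        return missing;
    }

    public static void main(String[] args) {
        String schema = "<schema><fields>"
                + "<field name=\"docId\" type=\"string\" indexed=\"true\" stored=\"true\"/>"
                + "</fields></schema>";
        // "title" is a placeholder for a field Luke shows but the schema lacks.
        System.out.println(undeclaredFields(schema, List.of("docId", "title"))); // prints [title]
    }
}
```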

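The schema.xml adjustments should also define query-time analyzers so that your fields are searched correctly, especially fields that were indexed with custom analyzers.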

In solrconfig.xml, you may need to change the settings in the indexDefaults and mainIndex sections.

But I would be happy to read an answer from someone who has actually done it.

Answer 3

The final three steps:

Change schema.xml (or managed-schema)
Change <dataDir> in solrconfig.xml
Restart Solr
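The <dataDir> change can also be scripted. A minimal sketch using the JDK's DOM and Transformer APIs; DataDirUpdater is a hypothetical name, it assumes the file has a <config> root element, and re-serializing the DOM may change whitespace or attribute quoting in the file:

```java
import java.io.StringReader;
import java.io.StringWriter;

import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;

import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class DataDirUpdater {

    // Hypothetical helper: sets the text of <dataDir>, adding the element under the root if absent.
    static String setDataDir(String solrConfigXml, String dataDir) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(solrConfigXml)));
            NodeList existing = doc.getElementsByTagName("dataDir");
            Element dataDirElement;
            if (existing.getLength() > 0) {
                dataDirElement = (Element) existing.item(0);
            } else {
                dataDirElement = doc.createElement("dataDir");
                doc.getDocumentElement().appendChild(dataDirElement);
            }
            // Solr looks for the "index" directory inside this path.
            dataDirElement.setTextContent(dataDir);

            Transformer transformer = TransformerFactory.newInstance().newTransformer();
            transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            transformer.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not update solrconfig.xml", e);
        }
    }

    public static void main(String[] args) {
        // Trimmed-down sample input, not a real solrconfig.xml.
        String config = "<config><dataDir>${solr.data.dir:}</dataDir></config>";
        System.out.println(setDataDir(config, "/usr/local/solr/data"));
    }
}
```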

For those who are new to Solr, like me :), I have study notes here. To generate some Lucene indexes yourself, you can use my code here.

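import java.io.IOException;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.List;

import org.apache.lucene.analysis.core.WhitespaceAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.FieldType;
import org.apache.lucene.index.IndexOptions;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;

// Writes a test index of 10,000,000 documents, each with an "id" and a "manu" field.
// Requires Lucene 5.0 or later (IndexOptions, Path-based directories).
public class LuceneIndex {

    public static void main(String[] args) throws IOException {
        long startTime = System.currentTimeMillis();

        // Solr looks for the "index" directory inside <dataDir>, so /tmp/myindex is the dataDir
        Path path = Paths.get("/tmp/myindex/index");

        int documentCount = 10000000;
        List<String> fieldNames = Arrays.asList("id", "manu");

        // Stored, tokenized field that indexes docs and term frequencies but no norms
        FieldType myFieldType = new FieldType();
        myFieldType.setIndexOptions(IndexOptions.DOCS_AND_FREQS);
        myFieldType.setOmitNorms(true);
        myFieldType.setStored(true);
        myFieldType.setTokenized(true);
        myFieldType.freeze();

        // FSDirectory.open picks a suitable file-system Directory for the platform;
        // try-with-resources closes (and by default commits) the writer, then the directory
        try (Directory directory = FSDirectory.open(path);
             IndexWriter writer = new IndexWriter(directory, new IndexWriterConfig(new WhitespaceAnalyzer()))) {
            for (int i = 0; i < documentCount; i++) {
                Document doc = new Document();
                for (String fieldName : fieldNames) {
                    // Values look like "id0", "manu0" for the first document
                    doc.add(new Field(fieldName, fieldName + i, myFieldType));
                }
                writer.addDocument(doc);
            }
        }
        System.out.println("Finished Indexing");
        long estimatedTime = System.currentTimeMillis() - startTime;
        System.out.println(estimatedTime + " ms");
    }
}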
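Because Solr looks for an index directory inside dataDir, pointing dataDir at the index directory itself (or at a directory without a Lucene commit) is an easy mistake. A minimal check, assuming that a usable index contains at least one segments_N file, which Lucene writes on every commit; IndexLayoutCheck and looksLikeSolrDataDir are hypothetical names:

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.stream.Stream;

public class IndexLayoutCheck {

    // Hypothetical helper: true if dataDir/index holds at least one Lucene commit file (segments_N).
    static boolean looksLikeSolrDataDir(Path dataDir) {
        Path indexDir = dataDir.resolve("index");
        if (!Files.isDirectory(indexDir)) {
            return false;
        }
        try (Stream<Path> files = Files.list(indexDir)) {
            return files.anyMatch(p -> p.getFileName().toString().startsWith("segments_"));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // The indexer above writes to /tmp/myindex/index, so the matching dataDir is /tmp/myindex.
        Path dataDir = Paths.get(args.length > 0 ? args[0] : "/tmp/myindex");
        if (looksLikeSolrDataDir(dataDir)) {
            System.out.println(dataDir + " can be used as <dataDir>");
        } else {
            System.out.println(dataDir + " has no index/segments_N file");
        }
    }
}
```

Passing /tmp/myindex/index instead of /tmp/myindex returns false, which is one possible reason Solr can appear to ignore existing index files.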

Answer 4

I am trying the same steps with HDFS as the home directory and the lock type set to HDFS, but with no luck. I see the error below:

labs_shard1_replica1: org.apache.solr.common.SolrException:org.apache.solr.common.SolrException: Index dir 'hdfs://127.0.0.1/user/solr/labs/core_node1/data/index/' of core 'labs_shard1_replica1' is already locked. The most likely cause is another Solr server (or another solr core in this server) also configured to use this directory; other possible causes may be specific to lockType: hdfs

Solr directory config:

<directoryFactory name="DirectoryFactory" class="${solr.directoryFactory:solr.NRTCachingDirectoryFactory}">

but not HDFS, as shown below:

<directoryFactory name="DirectoryFactory" class="solr.HdfsDirectoryFactory">
    <str name="solr.hdfs.home">hdfs://127.0.0.1/user/solr</str>
    <bool name="solr.hdfs.blockcache.enabled">true</bool>
    <int name="solr.hdfs.blockcache.slab.count">1</int>
    <bool name="solr.hdfs.blockcache.direct.memory.allocation">false</bool>
    <int name="solr.hdfs.blockcache.blocksperbank">16384</int>
    <bool name="solr.hdfs.blockcache.read.enabled">true</bool>
    <bool name="solr.hdfs.blockcache.write.enabled">false</bool>
    <bool name="solr.hdfs.nrtcachingdirectory.enable">true</bool>
    <int name="solr.hdfs.nrtcachingdirectory.maxmergesizemb">16</int>
    <int name="solr.hdfs.nrtcachingdirectory.maxcachedmb">192</int>
</directoryFactory>

Lock type: HDFS

Can Solr load a raw Lucene index?
