Storing RDBMS table data in text files on the hard disk via Lucene

Time: 2012-04-06 12:47:58

Tags: lucene

I want to use Lucene to store the results of an RDBMS SQL query (3.2 million records) in text files and then search them. I looked at the example "how to integrate RAMDirectory into FSDirectory in lucene".

I have this code, which is working for me:

    import java.io.File;
    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.ResultSet;
    import java.sql.SQLException;
    import java.sql.Statement;

    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.document.Document;
    import org.apache.lucene.document.Field;
    import org.apache.lucene.index.IndexReader;
    import org.apache.lucene.index.IndexWriter;
    import org.apache.lucene.index.IndexWriterConfig;
    import org.apache.lucene.queryParser.QueryParser;
    import org.apache.lucene.search.IndexSearcher;
    import org.apache.lucene.search.Query;
    import org.apache.lucene.search.ScoreDoc;
    import org.apache.lucene.search.TopScoreDocCollector;
    import org.apache.lucene.store.Directory;
    import org.apache.lucene.store.FSDirectory;
    import org.apache.lucene.util.Version;

    public class lucetest {
        public static void main(String args[]) {
            lucetest lucetestObj = new lucetest();
            lucetestObj.main1(lucetestObj);
        }

        public void main1(lucetest lucetestObj) {
            final File INDEX_DIR = new File(
                    "C:\\Documents and Settings\\44444\\workspace\\lucenbase\\bin\\org\\lucenesample\\index");

            Connection conn = null;
            try {
                Class.forName("com.teradata.jdbc.TeraDriver").newInstance();
                conn = DriverManager.getConnection(
                        "jdbc:teradata://x.x.x.x/CHARSET=UTF16", "aaa", "bbb");
                StandardAnalyzer analyzer = new StandardAnalyzer(Version.LUCENE_35);

                // Directory index = new RAMDirectory(); // keeps the index in RAM
                Directory index = FSDirectory.open(INDEX_DIR); // keeps the index on the hard disk

                IndexWriterConfig config = new IndexWriterConfig(Version.LUCENE_35,
                        analyzer);
                // Rebuild the index on each run instead of appending duplicate documents
                config.setOpenMode(IndexWriterConfig.OpenMode.CREATE);
                IndexWriter writer = new IndexWriter(index, config);

                System.out.println("Indexing to directory '" + INDEX_DIR + "'...");
                lucetestObj.indexDocs(writer, conn);
                writer.optimize(); // merge segments to speed up searches
                writer.close();
                System.out.println("Indexing finished.");

                lucetestObj.searchDocs(index, analyzer, "india");
                index.close();
            } catch (Exception e) {
                e.printStackTrace();
            } finally {
                if (conn != null) {
                    try {
                        conn.close();
                    } catch (SQLException e2) {
                        e2.printStackTrace();
                    }
                }
            }
        }

        void indexDocs(IndexWriter writer, Connection conn) throws Exception {
            // Every column read below must appear in the SELECT list, and
            // the Teradata "sample" clause needs a space after the table name
            String query = "SELECT CFMASTERID, CFMASTERNAME, ULTIMATEPARENTID, "
                    + "ULTIMATEPARENT, LONG_NAMEE FROM XCUST_SRCH_SRCH "
                    + "sample 100000;";
            Statement stmt = conn.createStatement();
            try {
                ResultSet rs = stmt.executeQuery(query);
                // Each row is added to the index directly; no intermediate text file
                while (rs.next()) {
                    Document d = new Document();
                    d.add(new Field("id", nullToEmpty(rs.getString("CFMASTERID")),
                            Field.Store.YES, Field.Index.NO));
                    d.add(new Field("name", nullToEmpty(rs.getString("CFMASTERNAME")),
                            Field.Store.YES, Field.Index.ANALYZED));
                    d.add(new Field("color", nullToEmpty(rs.getString("LONG_NAMEE")),
                            Field.Store.YES, Field.Index.ANALYZED));
                    writer.addDocument(d);
                }
                rs.close();
            } finally {
                stmt.close();
            }
        }

        // Field rejects null values, so map SQL NULLs to empty strings
        private static String nullToEmpty(String value) {
            return value == null ? "" : value;
        }

        void searchDocs(Directory index, StandardAnalyzer analyzer,
                String searchstring) throws Exception {
            String querystr = searchstring.length() > 0 ? searchstring : "lucene";
            Query q = new QueryParser(Version.LUCENE_35, "name", analyzer)
                    .parse(querystr);

            int hitsPerPage = 10;
            IndexReader reader = IndexReader.open(index);
            IndexSearcher searcher = new IndexSearcher(reader);
            TopScoreDocCollector collector = TopScoreDocCollector.create(
                    hitsPerPage, true);
            searcher.search(q, collector);
            ScoreDoc[] hits = collector.topDocs().scoreDocs;
            System.out.println("Found " + hits.length + " hits.");
            for (int i = 0; i < hits.length; ++i) {
                int docId = hits[i].doc;
                Document d = searcher.doc(docId);
                System.out.println((i + 1) + ". CFMASTERNAME: " + d.get("name")
                        + " | LONG_NAMEE: " + d.get("color") + " | ID: "
                        + d.get("id"));
            }

            searcher.close();
            reader.close(); // the searcher does not close a reader passed to it
        }
    }

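How do I change this code so that the SQL result table is stored on the hard disk at a specified path instead of in a RAMDirectory? I haven't been able to solve this. My requirement is that the table data be stored on disk and that Lucene return search results very quickly, which is why I want Lucene to keep the indexed data on disk.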

1 answer

Answer 0 (score: 1)

Your code already stores the index on disk; this line does it:

    Directory index = FSDirectory.open(INDEX_DIR);

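You mention saving the SQL results to a text file, but that is unnecessary overhead. Save each row directly to the Lucene index while you iterate over the ResultSet.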
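The answer's point can be illustrated without a database or the Lucene jar. This is a toy sketch under stated assumptions: `Row` is a hypothetical stand-in for one `ResultSet` row, and `ToyIndex` is a hypothetical in-memory inverted index standing in for Lucene's `IndexWriter`; neither is a Lucene or JDBC API. Each row is tokenized and added to the index inside the loop, so no intermediate text file is ever written.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

public class DirectIndexing {

    // Hypothetical stand-in for one ResultSet row (CFMASTERID, CFMASTERNAME)
    static final class Row {
        final String id;
        final String name;

        Row(String id, String name) {
            this.id = id;
            this.name = name;
        }
    }

    // Hypothetical toy inverted index: term -> ids of rows containing it.
    // Not Lucene; Lucene's index is far more elaborate.
    static final class ToyIndex {
        private final Map<String, Set<String>> postings = new HashMap<>();

        void add(Row row) {
            // Lowercase and split on non-alphanumerics, loosely like an analyzer
            for (String term : row.name.toLowerCase(Locale.ROOT).split("[^\\p{Alnum}]+")) {
                if (!term.isEmpty()) {
                    postings.computeIfAbsent(term, t -> new TreeSet<>()).add(row.id);
                }
            }
        }

        Set<String> search(String term) {
            Set<String> ids = postings.get(term.toLowerCase(Locale.ROOT));
            return ids == null ? Collections.<String>emptySet() : Collections.unmodifiableSet(ids);
        }
    }

    // Stream rows straight into the index; in JDBC code this is the while (rs.next()) loop
    static ToyIndex indexRows(Iterable<Row> rows) {
        ToyIndex index = new ToyIndex();
        for (Row row : rows) {
            index.add(row);
        }
        return index;
    }

    public static void main(String[] args) {
        List<Row> rows = new ArrayList<>();
        rows.add(new Row("1", "India Pepsi Ltd"));
        rows.add(new Row("2", "Acme Corp"));
        rows.add(new Row("3", "Tata India"));
        ToyIndex index = indexRows(rows);
        System.out.println(index.search("india")); // prints [1, 3]
    }
}
```

In the question's code, the loop over `rows` corresponds to `while (rs.next())`, and `index.add(row)` corresponds to `writer.addDocument(d)`.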

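By the way, this doesn't really matter, but naming your local variables (final or otherwise) in all caps goes against Java convention. Use camelCase. All caps is only for class-level constants (static final members of a class).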
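A minimal illustration of that convention, using hypothetical names (`DEFAULT_INDEX_PATH`, `resolveIndexDir`) that do not come from the question's code. Applied to the question, the local `INDEX_DIR` would become something like `indexDir`.

```java
import java.io.File;

public class NamingConventions {
    // Class-level constant (static final member): ALL_CAPS is the convention
    static final String DEFAULT_INDEX_PATH = "index";

    static File resolveIndexDir(String path) {
        // Local variable, even when final: camelCase
        final File indexDir = new File(path != null ? path : DEFAULT_INDEX_PATH);
        return indexDir;
    }

    public static void main(String[] args) {
        System.out.println(resolveIndexDir(null).getPath()); // prints index
    }
}
```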