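I am using a crawler controller (crawler4j) to crawl all the pages of a medium-sized website. It crawls 2-3 pages at random and then fails because the IndexWriter is locked.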
Directory dir = FSDirectory.open(new File(index));
IndexWriterConfig conf = new IndexWriterConfig(org.apache.lucene.util.Version.LUCENE_41,new StandardAnalyzer(org.apache.lucene.util.Version.LUCENE_41));
writer = new IndexWriter(dir, conf); // this line throws the lock exception
Log:
From: SiteSearch.KCCrawlerController.<init>(80): Lock obtain timed out: NativeFSLock@D:\Websites\ccc\WEB-INF\lucene-index\en\write.lock: 05/08/2014 10:57:55
org.apache.lucene.store.LockObtainFailedException: Lock obtain timed out: NativeFSLock@D:\Websites\ccc\WEB-INF\lucene-index\en\write.lock
    at org.apache.lucene.store.Lock.obtain(Lock.java:84)
    at org.apache.lucene.index.IndexWriter.<init>(IndexWriter.java:636)
    at SiteSearch.KCCrawlerController.<init>(KCCrawlerController.java:80)
    at org.apache.jsp.monitors.siteSearchIndexer_jsp._jspService(siteSearchIndexer_jsp.java:66)
    at org.apache.jasper.runtime.HttpJspBase.service(HttpJspBase.java:70)
    at javax.servlet.http.HttpServlet.service(HttpServlet.java:717)
    at org.apache.jasper.servlet.JspServletWrapper.service(JspServletWrapper.java:386)
    at org.apache.jasper.servlet.JspServlet.serviceJspFile(JspServlet.java:313)
    at org.apache.jasper.servlet.JspServlet.service(JspServlet.java:260)
    at javax.servlet.http.HttpServlet.service(HttpServlet.java:717)
    at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:290)
    at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
    at com.tridion.ambientdata.web.AmbientDataServletFilter.doFilter(AmbientDataServletFilter.java:255)
    at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
    at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
    at adminV3.ugc.CharacterEncodingFilter.doFilter(CharacterEncodingFilter.java:82)
    at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
    at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
    at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
    at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
    at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127)
    at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
    at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
    at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:298)
    at org.apache.coyote.ajp.AjpAprProcessor.process(AjpAprProcessor.java:429)
    at org.apache.coyote.ajp.AjpAprProtocol$AjpConnectionHandler.process(AjpAprProtocol.java:384)
    at org.apache.tomcat.util.net.AprEndpoint$Worker.run(AprEndpoint.java:1665)
    at java.lang.Thread.run(Unknown Source)
Adding jsp: http://example.com/en/consulting/diagnostics.jsp?crawler=yes
From: SiteSearch.KCCrawler.visit(95): Stream closed: 05/08/2014 10:57:55
java.io.IOException: Stream closed
    at org.apache.jasper.runtime.JspWriterImpl.ensureOpen(JspWriterImpl.java:204)
    at org.apache.jasper.runtime.JspWriterImpl.write(JspWriterImpl.java:312)
    at org.apache.jasper.runtime.JspWriterImpl.write(JspWriterImpl.java:342)
    at SiteSearch.KCCrawler.visit(KCCrawler.java:95)
    at edu.uci.ics.crawler4j.crawler.WebCrawler.processPage(WebCrawler.java:306)
    at edu.uci.ics.crawler4j.crawler.WebCrawler.run(WebCrawler.java:189)
    at java.lang.Thread.run(Unknown Source)
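Why am I getting this exception? Any help would be appreciated.
When I run the Indexer for the first time, it completes successfully but throws the exception below. If I then search the index, I get results. However, if I run the Indexer again, it throws the lock exception shown above. The log also shows that my controller class is being called twice.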
org.apache.catalina.core.StandardWrapperValve invoke
SEVERE: Servlet.service() for servlet jsp threw exception
java.io.IOException: Stream closed
    at org.apache.jasper.runtime.JspWriterImpl.ensureOpen(JspWriterImpl.java:204)
    at org.apache.jasper.runtime.JspWriterImpl.flushBuffer(JspWriterImpl.java:115)
    at org.apache.jasper.runtime.PageContextImpl.release(PageContextImpl.java:188)
    at org.apache.jasper.runtime.JspFactoryImpl.internalReleasePageContext(JspFactoryImpl.java:118)
    at org.apache.jasper.runtime.JspFactoryImpl.releasePageContext(JspFactoryImpl.java:77)
    at org.apache.jsp.monitors.siteSearchIndexer_jsp._jspService(siteSearchIndexer_jsp.java:82)
    at org.apache.jasper.runtime.HttpJspBase.service(HttpJspBase.java:70)
    at javax.servlet.http.HttpServlet.service(HttpServlet.java:717)
    at org.apache.jasper.servlet.JspServletWrapper.service(JspServletWrapper.java:386)
    at org.apache.jasper.servlet.JspServlet.serviceJspFile(JspServlet.java:313)
Answer 0 (score: 0)
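Quoting the Javadocs:
"Opening an IndexWriter creates a lock file for the directory in use. Trying to open another IndexWriter on the same directory will lead to a LockObtainFailedException. The LockObtainFailedException is also thrown if an IndexReader on the same directory is used to delete documents from the index."
"IndexWriter instances are completely thread safe, meaning multiple threads can call any of its methods, concurrently. If your application requires external synchronization, you should not synchronize on the IndexWriter instance as this may cause deadlock; use your own (non-Lucene) objects instead."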
https://lucene.apache.org/core/4_1_0/core/org/apache/lucene/index/IndexWriter.html
Are you creating a new instance of IndexWriter for each web page you crawl?
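The locking behavior described in the Javadoc can be illustrated without Lucene. The sketch below is not Lucene's code: `WriteLockDemo`, `LockHolder`, and `canOpen` are hypothetical names, and `LockHolder` only imitates a writer that takes an exclusive lock on `write.lock` when it is constructed and gives it up in `close()`. It assumes a local filesystem that supports `java.nio` file locks, and it tracks held locks inside the JVM as well, because OS-level file locks are held per process rather than per thread.

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.channels.FileLock;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

class WriteLockDemo {

    // Lock files currently held by this JVM (hypothetical bookkeeping for the demo).
    private static final Set<Path> HELD = ConcurrentHashMap.newKeySet();

    /** Hypothetical stand-in for an index writer: holds write.lock until closed. */
    static final class LockHolder implements AutoCloseable {
        private final Path key;
        private final FileChannel channel;
        private final FileLock lock;

        LockHolder(Path lockFile) throws IOException {
            key = lockFile.toAbsolutePath().normalize();
            // Refuse immediately if another holder in this JVM already owns the lock.
            if (!HELD.add(key)) {
                throw new IOException("Lock obtain failed (held in this JVM): " + key);
            }
            FileChannel ch = null;
            FileLock fl = null;
            try {
                ch = FileChannel.open(key, StandardOpenOption.CREATE, StandardOpenOption.WRITE);
                fl = ch.tryLock(); // returns null if another process holds the lock
            } finally {
                if (fl == null) { // no lock obtained, or an exception was thrown
                    if (ch != null) {
                        ch.close();
                    }
                    HELD.remove(key);
                }
            }
            if (fl == null) {
                throw new IOException("Lock obtain failed (held by another process): " + key);
            }
            channel = ch;
            lock = fl;
        }

        @Override
        public void close() throws IOException {
            try {
                lock.release();
                channel.close();
            } finally {
                HELD.remove(key);
            }
        }
    }

    /** Returns true if a new holder could be opened (and is closed again right away). */
    static boolean canOpen(Path lockFile) {
        try (LockHolder probe = new LockHolder(lockFile)) {
            return true;
        } catch (IOException e) {
            return false;
        }
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("lock-demo");
        Path lockFile = dir.resolve("write.lock");

        LockHolder first = new LockHolder(lockFile); // like the first "new IndexWriter(dir, conf)"
        System.out.println("second open while first is held: " + canOpen(lockFile));
        first.close();                               // like "writer.close()"
        System.out.println("second open after first is closed: " + canOpen(lockFile));
    }
}
```

On a local disk this should report that the second open fails while the first holder is alive and succeeds once it has been closed. For the question, that suggests opening one IndexWriter for the whole crawl, sharing it among the crawler threads (which the Javadoc quoted above says is safe), and closing it in a finally block, so that a failed or repeated run of the JSP does not leave write.lock held.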