目前处于锁定区域:Fuseki +全文搜索+推断

时间:2013-08-22 16:10:35

标签: sparql jena semantic-web arq fuseki

我最近开始使用Fuseki 0.2.8快照中的全文搜索。

我有一个由TDB数据集支持的InfModel,我已经添加了一个Lucene文本索引。我已经使用这样的搜索查询对其进行了测试:

prefix text: <http://jena.apache.org/text#>
select distinct ?s where { ?s text:query ('stu' 16) }

这很好用,直到我和Fuseki有两个或两个以上的同时查询,然后偶尔会得到:

Error 500: Currently in a locked region Fuseki - version 0.2.8-SNAPSHOT (Build date: 20130820-0755). 

我尝试用10个并发用户以随机间隔发送查询来测试端点,在两分钟内,大约30%的查询返回上面的500错误。

我还尝试通过替换此部分(下面的完整汇编程序文件)来禁用推理:

<#dataset_fulltext> rdf:type     text:TextDataset ;
  text:dataset   <#dataset_inf> ;
  ##text:dataset   <#tdbDataset> ;
  text:index     <#indexLucene> .

用这个:

<#dataset_fulltext> rdf:type     text:TextDataset ;
  ##text:dataset   <#dataset_inf> ;
  text:dataset   <#tdbDataset> ;
  text:index     <#indexLucene> .

并且TextDataset使用#tdDDataset而不是#dataset_inf时没有生成异常。

我的设置有问题,或者这是Fuseki的错误吗?

这是我当前的汇编程序文件:

@prefix :        <#> .
@prefix fuseki:  <http://jena.apache.org/fuseki#> .
@prefix rdf:     <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs:    <http://www.w3.org/2000/01/rdf-schema#> .
@prefix tdb:     <http://jena.hpl.hp.com/2008/tdb#> .
@prefix ja:      <http://jena.hpl.hp.com/2005/11/Assembler#> .
@prefix text:    <http://jena.apache.org/text#> .
@prefix dc:      <http://purl.org/dc/terms/> .

[] rdf:type fuseki:Server ;
  # Timeout - server-wide default: milliseconds.
  # Format 1: "1000" -- 1 second timeout
  # Format 2: "10000,60000" -- 10s timeout to first result, then 60s timeout to for rest of query.
  # See java doc for ARQ.queryTimeout
  ja:context [ ja:cxtName "arq:queryTimeout" ;  ja:cxtValue "12000,50000" ] ;

  fuseki:services (
    <#service1>
  ) .

# Custom code.
[] ja:loadClass "com.hp.hpl.jena.tdb.TDB" .

# TDB
tdb:DatasetTDB  rdfs:subClassOf  ja:RDFDataset .
tdb:GraphTDB    rdfs:subClassOf  ja:Model .

## Initialize text query
[] ja:loadClass       "org.apache.jena.query.text.TextQuery" .
# A TextDataset is a regular dataset with a text index.
text:TextDataset      rdfs:subClassOf   ja:RDFDataset .
# Lucene index
text:TextIndexLucene  rdfs:subClassOf   text:TextIndex .

## ---------------------------------------------------------------
## Service with only SPARQL query on an inference model.
## Inference model bbase data in TDB.

<#service1>  rdf:type fuseki:Service ;
  rdfs:label               "TDB/text service" ;
  fuseki:name              "dataset" ;         # http://host/dataset
  fuseki:serviceQuery      "query" ;
  fuseki:serviceUpdate     "update" ;
  fuseki:serviceUpload     "upload" ;
  fuseki:serviceReadWriteGraphStore "data" ;
  fuseki:serviceReadGraphStore "get" ;
  fuseki:dataset           <#dataset_fulltext> ;
    .

<#dataset_inf> rdf:type ja:RDFDataset ;
  ja:defaultGraph       <#model_inf> .

<#model_inf> rdf:type ja:Model ;
  ja:baseModel <#tdbGraph> ;
  ja:reasoner [ ja:reasonerURL <http://jena.hpl.hp.com/2003/OWLMicroFBRuleReasoner> ] .

<#tdbDataset> rdf:type tdb:DatasetTDB ;
  tdb:location "Data" .
<#tdbGraph> rdf:type tdb:GraphTDB ;
  tdb:dataset <#tdbDataset> .

# Dataset with full text index.
<#dataset_fulltext> rdf:type     text:TextDataset ;
  text:dataset   <#dataset_inf> ;
  ##text:dataset   <#tdbDataset> ;
  text:index     <#indexLucene> .

# Text index description
<#indexLucene> a text:TextIndexLucene ;
  text:directory <file:Lucene> ;
  ##text:directory "mem" ;
  text:entityMap <#entMap> ;
  .

# Mapping in the index
# URI stored in field "uri"
# rdfs:label is mapped to field "text"
<#entMap> a text:EntityMap ;
  text:entityField      "uri" ;
  text:defaultField     "text" ;
  text:map (
    [ text:field "text" ; text:predicate dc:title ]
    [ text:field "text" ; text:predicate dc:description ]
  ) .

以下是Fuseki日志中其中一个例外的完整堆栈跟踪:

16:27:01 WARN  Fuseki               :: [2484] RC = 500 : Currently in a locked region
com.hp.hpl.jena.sparql.core.DatasetGraphWithLock$JenaLockException: Currently in a locked region
    at com.hp.hpl.jena.sparql.core.DatasetGraphWithLock.checkNotActive(DatasetGraphWithLock.java:72)
    at com.hp.hpl.jena.sparql.core.DatasetGraphTrackActive.begin(DatasetGraphTrackActive.java:44)
    at org.apache.jena.query.text.DatasetGraphText.begin(DatasetGraphText.java:102)
    at org.apache.jena.fuseki.servlets.HttpAction.beginRead(HttpAction.java:117)
    at org.apache.jena.fuseki.servlets.SPARQL_Query.execute(SPARQL_Query.java:236)
    at org.apache.jena.fuseki.servlets.SPARQL_Query.executeWithParameter(SPARQL_Query.java:195)
    at org.apache.jena.fuseki.servlets.SPARQL_Query.perform(SPARQL_Query.java:80)
    at org.apache.jena.fuseki.servlets.SPARQL_ServletBase.executeLifecycle(SPARQL_ServletBase.java:185)
    at org.apache.jena.fuseki.servlets.SPARQL_ServletBase.executeAction(SPARQL_ServletBase.java:166)
    at org.apache.jena.fuseki.servlets.SPARQL_ServletBase.execCommonWorker(SPARQL_ServletBase.java:154)
    at org.apache.jena.fuseki.servlets.SPARQL_ServletBase.doCommon(SPARQL_ServletBase.java:73)
    at org.apache.jena.fuseki.servlets.SPARQL_Query.doGet(SPARQL_Query.java:61)
    at javax.servlet.http.HttpServlet.service(HttpServlet.java:735)
    at javax.servlet.http.HttpServlet.service(HttpServlet.java:848)
    at org.eclipse.jetty.servlet.ServletHolder.handle(ServletHolder.java:684)
    at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1448)
    at org.eclipse.jetty.servlets.UserAgentFilter.doFilter(UserAgentFilter.java:82)
    at org.eclipse.jetty.servlets.GzipFilter.doFilter(GzipFilter.java:294)
    at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1419)
    at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:455)
    at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:229)
    at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1075)
    at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:384)
    at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:193)
    at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1009)
    at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:135)
    at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:116)
    at org.eclipse.jetty.server.Server.handle(Server.java:370)
    at org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:489)
    at org.eclipse.jetty.server.BlockingHttpConnection.handleRequest(BlockingHttpConnection.java:53)
    at org.eclipse.jetty.server.AbstractHttpConnection.headerComplete(AbstractHttpConnection.java:949)
    at org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.headerComplete(AbstractHttpConnection.java:1011)
    at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:644)
    at org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:235)
    at org.eclipse.jetty.server.BlockingHttpConnection.handle(BlockingHttpConnection.java:72)
    at org.eclipse.jetty.server.nio.BlockingChannelConnector$BlockingChannelEndPoint.run(BlockingChannelConnector.java:298)
    at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:608)
    at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:543)
    at java.lang.Thread.run(Thread.java:722)

任何建议表示赞赏。

谢谢, 斯图尔特。

1 个答案:

答案 0 :(得分:1)

这看起来可能是我提交为JENA-522的错误,如果您有关于要添加的错误的更多详细信息,请在那里添加评论。

问题是带有推理的数据集隐式使用ARQ的标准内存中Dataset实现,这不支持事务。

然而,在内部(和堆栈跟踪中)对应DatasetGraphText的文本数据集要求包装数据集支持事务,并且不用DatasetGraphWithLock包装它们。这似乎是遇到锁的问题,文档声明这应该支持多个读者,但遵循代码的逻辑我不确定它实际上允许这个。