OWLAPI incorrectly uses the OBO parser for an N-Triples file

Time: 2017-05-03 23:39:03

Tags: java maven parsing owl-api

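We have a wrapper that uses OWLAPI to parse ontologies.

But for some N-Triples ontologies, the OWLAPI parser fails when the wrapper is run as a jar.

The ontology being parsed is: http://www.cropontology.org/ontology/CO_320/Rice/nt

This is where we try to parse it: https://github.com/ncbo/owlapi_wrapper/blob/master/src/main/java/org/stanford/ncbo/oapiwrapper/OntologyParser.java#L637

We are facing two cases:

  • When running mvn test: parsing works fine

  • When running from the jar: the OBO parser is used, which generates a one-axiom ontology with the whole nt ontology included as a string in an oboInOwl:http predicate: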
    <oboInOwl:http rdf:datatype="http://www.w3.org/2001/XMLSchema#string">//www.cropontology.org/rdf/CO_320:ROOT&gt; &lt;http://www.w3.org/1999/02/22-rdf-syntax-ns#type&gt;

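In both cases the OWLOntologyLoaderConfiguration and the input file are the same. So the only difference is that one is run with mvn test and the other with java -jar (both use the same Java version).

I have tried several things:

  • Banning the OBO parser. I tried several syntaxes, but none of them worked; the wrapper keeps using the OBO parser: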

    conf.setBannedParsers("org.obolibrary.oboformat.parser.OBOFormatParser");
    conf.setBannedParsers("o.o.oboformat.parser.OBOFormatParser");
    conf.setBannedParsers("OBOFormatParser");
    
  • Avoiding the use of different owlapi dependencies. As documented in "OWLAPI: Parser not found if run from Jar", I tried using only owlapi-distribution to avoid any conflict:

    <dependency>
      <groupId>net.sourceforge.owlapi</groupId>
      <artifactId>owlapi-distribution</artifactId>
      <version>4.3.1</version>
    </dependency>
    

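Does anyone know where this inconsistency might come from? Why does OWLAPI's loadOntologyFromOntologyDocument work correctly in one case and fail in the other, even though the input is exactly the same?

UPDATE1:

Parsing of the NTriple file sometimes fails because of some _:genid1 nodes in the triples. The problem is as follows:

  • When the application is packaged as a jar (with its dependencies) and the jar is run to parse the NTriple file, it fails and returns org.semanticweb.owlapi.rdf.turtle.parser.ParseException: Encountered " <PNAME_LN> ":genid1 ""

The triple causing the problem is: <http://www.cropontology.org/rdf/CO_320:0001563> <http://www.w3.org/2000/01/rdf-schema#subClassOf> _:genid1.

  • When the exact same parser is run on the exact same file through the Maven tests (the parser is called by a jUnit test, and we run the tests with mvn test), parsing goes smoothly and the information given by the _:genid1 node is extracted successfully.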

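It looks like OWLAPI fails to parse blank nodes in the first case. I printed the OWLAPI version with VersionInfo.getVersionInfo() before running loadOntologyFromOntologyDocument:

  • For the jar version (which causes the problem): The OWL API (version 4.3.1)
  • For the test version (which works): The OWL API (version 4.3.1.2017-03-27T22:32:37Z)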

UPDATE2:

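It seems that the problem comes from the way the jar is built.

When the jar is built, some dependencies are overwritten, so not all parsers are listed in the configuration file.

The org.openrdf.rio.RDFParserFactory file in the jar contains only the following: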

org.semanticweb.owlapi.rio.RioFunctionalSyntaxParserFactory
org.semanticweb.owlapi.rio.RioManchesterSyntaxParserFactory
org.semanticweb.owlapi.rio.RioOWLXMLParserFactory
org.semanticweb.owlapi.rio.RioFunctionalSyntaxParserFactory
org.semanticweb.owlapi.rio.RioManchesterSyntaxParserFactory
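org.semanticweb.owlapi.rio.RioOWLXMLParserFactory

  • When running the tests (where parsing works), according to the logs, the ontology is in org.semanticweb.owlapi.formats.RioTurtleDocumentFormat

  • When running from the jar:

For NTriples files without blank nodes (where the parser works fine), we get the following format: org.semanticweb.owlapi.formats.TurtleDocumentFormat

For NTriples files with blank nodes, we get: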

The following parsers were tried:
1) org.semanticweb.owlapi.rdf.rdfxml.parser.RDFXMLParser@a4add54
2) org.semanticweb.owlapi.owlxml.parser.OWLXMLParser@71454b9d
3) org.semanticweb.owlapi.functional.parser.OWLFunctionalSyntaxOWLParser@67304a40
4) org.semanticweb.owlapi.rio.RioParserImpl : org.semanticweb.owlapi.formats.RioTurtleDocumentFormatFactory@95fd655c
5) org.semanticweb.owlapi.manchestersyntax.parser.ManchesterOWLSyntaxOntologyParser@61c9c3fd
6) org.semanticweb.owlapi.rio.RioParserImpl : org.semanticweb.owlapi.formats.NQuadsDocumentFormatFactory@6f9c39ad
7) org.semanticweb.owlapi.rio.RioParserImpl : org.semanticweb.owlapi.formats.RDFJsonDocumentFormatFactory@cd748dc3
8) org.semanticweb.owlapi.rio.RioParserImpl : org.semanticweb.owlapi.formats.NTriplesDocumentFormatFactory@937ecd36
9) org.semanticweb.owlapi.rio.RioParserImpl : org.semanticweb.owlapi.formats.TrigDocumentFormatFactory@27e81c
10) org.semanticweb.owlapi.rio.RioParserImpl : org.semanticweb.owlapi.formats.BinaryRDFDocumentFormatFactory@3bf24493
11) org.semanticweb.owlapi.rio.RioParserImpl : org.semanticweb.owlapi.formats.RDFJsonLDDocumentFormatFactory@dcacc47d
12) org.semanticweb.owlapi.rio.RioParserImpl : org.semanticweb.owlapi.formats.N3DocumentFormatFactory@9a5
13) org.semanticweb.owlapi.rio.RioParserImpl : org.semanticweb.owlapi.formats.RioRDFXMLDocumentFormatFactory@69b9a3bc
14) org.semanticweb.owlapi.rio.RioTrixParserFactory$TrixParserImpl : org.semanticweb.owlapi.formats.TrixDocumentFormatFactory@27e82d
15) org.semanticweb.owlapi.rdf.turtle.parser.TurtleOntologyParser@463b4ac8
16) org.semanticweb.owlapi.krss2.parser.KRSS2OWLParser@11981797
17) org.semanticweb.owlapi.rio.RioParserImpl : org.semanticweb.owlapi.formats.RDFaDocumentFormatFactory@264e8d

But for RioTurtleDocumentFormat, it says:

Parser: org.semanticweb.owlapi.rio.RioParserImpl : org.semanticweb.owlapi.formats.RioTurtleDocumentFormatFactory@95fd655c
    Stack trace:
org.openrdf.rio.UnsupportedRDFormatException: No parser factory available for RDF format Turtle (mimeTypes=text/turtle, application/x-turtle; ext=ttl)        
org.semanticweb.owlapi.rio.RioParserImpl.parse(RioParserImpl.java:207)

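So it seems that RioTurtleDocumentFormatFactory is not properly included in the jar.

How can we make sure? It may come from the build section of pom.xml.

UPDATE3:

I tried using only owlapi-osgidistribution, but I still get exactly the same error.

I also tried packaging the jar with maven-shade-plugin and got the same error.

After banning the OBO parser, the log says it tried to parse the file with these parsers: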

The following parsers were tried:
1) org.semanticweb.owlapi.rdf.rdfxml.parser.RDFXMLParser@2bb3058
2) org.semanticweb.owlapi.owlxml.parser.OWLXMLParser@6bbe2511
3) org.semanticweb.owlapi.functional.parser.OWLFunctionalSyntaxOWLParser@93cf163
4) org.semanticweb.owlapi.rio.RioParserImpl : org.semanticweb.owlapi.formats.RioTurtleDocumentFormatFactory@95fd655c
5) org.semanticweb.owlapi.manchestersyntax.parser.ManchesterOWLSyntaxOntologyParser@3d97a632
6) org.semanticweb.owlapi.rio.RioParserImpl : org.semanticweb.owlapi.formats.NQuadsDocumentFormatFactory@6f9c39ad
7) org.semanticweb.owlapi.rio.RioParserImpl : org.semanticweb.owlapi.formats.RDFJsonDocumentFormatFactory@cd748dc3
8) org.semanticweb.owlapi.rio.RioParserImpl : org.semanticweb.owlapi.formats.NTriplesDocumentFormatFactory@937ecd36
9) org.semanticweb.owlapi.rio.RioParserImpl : org.semanticweb.owlapi.formats.TrigDocumentFormatFactory@27e81c
10) org.semanticweb.owlapi.rio.RioParserImpl : org.semanticweb.owlapi.formats.BinaryRDFDocumentFormatFactory@3bf24493
11) org.semanticweb.owlapi.rio.RioParserImpl : org.semanticweb.owlapi.formats.RDFJsonLDDocumentFormatFactory@dcacc47d
12) org.semanticweb.owlapi.rio.RioParserImpl : org.semanticweb.owlapi.formats.N3DocumentFormatFactory@9a5
13) org.semanticweb.owlapi.rio.RioParserImpl : org.semanticweb.owlapi.formats.RioRDFXMLDocumentFormatFactory@69b9a3bc
14) org.semanticweb.owlapi.rio.RioTrixParserFactory$TrixParserImpl : org.semanticweb.owlapi.formats.TrixDocumentFormatFactory@27e82d
15) org.semanticweb.owlapi.rdf.turtle.parser.TurtleOntologyParser@784b990c
16) org.semanticweb.owlapi.krss2.parser.KRSS2OWLParser@13f17eb4
17) org.semanticweb.owlapi.rio.RioParserImpl : org.semanticweb.owlapi.formats.RDFaDocumentFormatFactory@264e8d

Here is the error log for RioTurtleDocumentFormatFactory:

--------------------------------------------------------------------------------
Parser: org.semanticweb.owlapi.rio.RioParserImpl : org.semanticweb.owlapi.formats.RioTurtleDocumentFormatFactory@95fd655c
    Stack trace:
org.openrdf.rio.UnsupportedRDFormatException: No parser factory available for RDF format Turtle (mimeTypes=text/turtle, application/x-turtle; ext=ttl)
        org.semanticweb.owlapi.rio.RioParserImpl.parse(RioParserImpl.java:207)
        uk.ac.manchester.cs.owl.owlapi.OWLOntologyFactoryImpl.loadOWLOntology(OWLOntologyFactoryImpl.java:197)
        uk.ac.manchester.cs.owl.owlapi.OWLOntologyManagerImpl.actualParse(OWLOntologyManagerImpl.java:1156)
        uk.ac.manchester.cs.owl.owlapi.OWLOntologyManagerImpl.loadOntology(OWLOntologyManagerImpl.java:1112)
        uk.ac.manchester.cs.owl.owlapi.OWLOntologyManagerImpl.loadOntologyFromOntologyDocument(OWLOntologyManagerImpl.java:1068)
        org.stanford.ncbo.oapiwrapper.OntologyParser.findMasterFile(OntologyParser.java:708)
        org.stanford.ncbo.oapiwrapper.OntologyParser.internalParse(OntologyParser.java:651)
        org.stanford.ncbo.oapiwrapper.OntologyParser.parse(OntologyParser.java:630)
        org.stanford.ncbo.oapiwrapper.OntologyParserCommand.main(OntologyParserCommand.java:51)
No parser factory available for RDF format Turtle (mimeTypes=text/turtle, application/x-turtle; ext=ttl)
        org.openrdf.rio.Rio.createParser(Rio.java:198)
        org.semanticweb.owlapi.rio.RioParserImpl.parseDocumentSource(RioParserImpl.java:241)
        org.semanticweb.owlapi.rio.RioParserImpl.parse(RioParserImpl.java:191)
        uk.ac.manchester.cs.owl.owlapi.OWLOntologyFactoryImpl.loadOWLOntology(OWLOntologyFactoryImpl.java:197)
        uk.ac.manchester.cs.owl.owlapi.OWLOntologyManagerImpl.actualParse(OWLOntologyManagerImpl.java:1156)
        uk.ac.manchester.cs.owl.owlapi.OWLOntologyManagerImpl.loadOntology(OWLOntologyManagerImpl.java:1112)
        uk.ac.manchester.cs.owl.owlapi.OWLOntologyManagerImpl.loadOntologyFromOntologyDocument(OWLOntologyManagerImpl.java:1068)
        org.stanford.ncbo.oapiwrapper.OntologyParser.findMasterFile(OntologyParser.java:708)
        org.stanford.ncbo.oapiwrapper.OntologyParser.internalParse(OntologyParser.java:651)
        org.stanford.ncbo.oapiwrapper.OntologyParser.parse(OntologyParser.java:630)

In the jar we can find the following classes (with no duplicate copies):

RioTurtleDocumentFormat.class
RioTurtleDocumentFormatFactory.class
RioTurtleParserFactory.class
RioTurtleStorerFactory.class

In the META-INF/services directory we get:

META-INF/services/org.openrdf.rio.RDFParserFactory
META-INF/services/org.semanticweb.owlapi.io.LegacyOWLParserFactory
META-INF/services/org.semanticweb.owlapi.model.OWLOntologyManagerFactory
META-INF/services/org.semanticweb.owlapi.io.OWLParserFactory
META-INF/services/org.semanticweb.owlapi.model.OWLStorerFactory
META-INF/services/org.semanticweb.owlapi.model.OWLDocumentFormatFactory
META-INF/services/org.openrdf.rio.LanguageHandler
META-INF/services/org.openrdf.rio.DatatypeHandler
META-INF/services/org.openrdf.rio.RDFWriterFactory
META-INF/services/com.fasterxml.jackson.core.JsonFactory
META-INF/services/com.fasterxml.jackson.core.ObjectCodec
META-INF/services/org.apache.commons.logging.LogFactory
META-INF/services/javax.servlet.ServletContainerInitializer

META-INF/services/org.openrdf.rio.RDFParserFactory contains:

org.semanticweb.owlapi.rio.RioFunctionalSyntaxParserFactory
org.semanticweb.owlapi.rio.RioManchesterSyntaxParserFactory
org.semanticweb.owlapi.rio.RioOWLXMLParserFactory
org.semanticweb.owlapi.rio.RioFunctionalSyntaxParserFactory
org.semanticweb.owlapi.rio.RioManchesterSyntaxParserFactory
org.semanticweb.owlapi.rio.RioOWLXMLParserFactory

META-INF/services/org.semanticweb.owlapi.model.OWLDocumentFormatFactory contains:

org.semanticweb.owlapi.formats.BinaryRDFDocumentFormatFactory
org.semanticweb.owlapi.formats.N3DocumentFormatFactory
org.semanticweb.owlapi.formats.NQuadsDocumentFormatFactory
org.semanticweb.owlapi.formats.NTriplesDocumentFormatFactory
org.semanticweb.owlapi.formats.RDFaDocumentFormatFactory
org.semanticweb.owlapi.formats.RDFJsonLDDocumentFormatFactory
org.semanticweb.owlapi.formats.RDFJsonDocumentFormatFactory
org.semanticweb.owlapi.formats.RioRDFXMLDocumentFormatFactory
org.semanticweb.owlapi.formats.RioTurtleDocumentFormatFactory
org.semanticweb.owlapi.formats.TrigDocumentFormatFactory
org.semanticweb.owlapi.formats.TrixDocumentFormatFactory
org.semanticweb.owlapi.formats.BinaryRDFDocumentFormatFactory
org.semanticweb.owlapi.formats.N3DocumentFormatFactory
org.semanticweb.owlapi.formats.NQuadsDocumentFormatFactory
org.semanticweb.owlapi.formats.NTriplesDocumentFormatFactory
org.semanticweb.owlapi.formats.RDFaDocumentFormatFactory
org.semanticweb.owlapi.formats.RDFJsonLDDocumentFormatFactory
org.semanticweb.owlapi.formats.RDFJsonDocumentFormatFactory
org.semanticweb.owlapi.formats.RioRDFXMLDocumentFormatFactory
org.semanticweb.owlapi.formats.RioTurtleDocumentFormatFactory
org.semanticweb.owlapi.formats.TrigDocumentFormatFactory
org.semanticweb.owlapi.formats.TrixDocumentFormatFactory

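So org.semanticweb.owlapi.formats.RioTurtleDocumentFormatFactory is actually listed in one of the META-INF/services files, and the classes are included in the jar. But OWLAPI still seems unable to find it.

I really do not know how OWLAPI decides which parser to use and where it looks for them.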

UPDATE4:

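When I remove all the excludes and keep only the includes, many libraries are still left out of the jar, and I then get java.lang.NoClassDefFoundError. I had to add several includes to fix that. But it still does not solve the problem (it only makes that error disappear from the log).

Here is the plugin configuration I used: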

<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-shade-plugin</artifactId>
  <version>2.3</version>
  <executions>
    <execution>
      <phase>package</phase>
      <goals>
        <goal>shade</goal>
      </goals>
      <configuration>
        <artifactSet>
          <includes>
            <include>net.sourceforge.owlapi:owlapi-api</include>
            <include>net.sourceforge.owlapi:owlapi-apibinding</include>
            <include>net.sourceforge.owlapi:owlapi-fixers</include>
            <include>net.sourceforge.owlapi:owlapi-impl</include>
            <include>net.sourceforge.owlapi:owlapi-oboformat</include>
            <include>net.sourceforge.owlapi:owlapi-parsers</include>
            <include>net.sourceforge.owlapi:owlapi-rio</include>
            <include>net.sourceforge.owlapi:owlapi-tools</include>
            <include>commons-cli:*</include>
            <include>commons-io:*</include>
            <include>org.slf4j:*</include>
            <include>net.sourceforge.owlapi:owlapi-osgidistribution</include>
            <include>com.google.inject:*</include>
            <include>javax.inject:*</include>
            <include>com.google.*</include>
            <include>aopalliance:*</include>
            <include>org.openrdf.sesame:*</include>
            <include>org.tukaani:*</include>
            <include>net.sf.trove4j:*</include>
            <include>org.apache.commons:commons-csv</include>
          </includes>
        </artifactSet>
        <transformers>
          <transformer implementation="org.apache.maven.plugins.shade.resource.ManifestResourceTransformer">
            <mainClass>org.stanford.ncbo.oapiwrapper.OntologyParserCommand</mainClass>
          </transformer>
        </transformers>
      </configuration>
    </execution>
  </executions>
</plugin>

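But it did not change anything in the META-INF/services/org.openrdf.rio.RDFParserFactory file in the jar.

Maybe this is because I need to add <include>net.sourceforge.owlapi:owlapi-osgidistribution</include>, and this overwrites the RDFParserFactory file. But without including it I get java.lang.NoClassDefFoundError: org/semanticweb/owlapi/model/OWLAnnotationValue

1 answer:

Answer 0 (score: 0)

There are a few issues conspiring here:

  • OWLOntologyLoaderConfiguration is an immutable class. Its setters produce a modified object rather than changing the object they are called on.
  • There are two OBO parsers.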
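
The first point is easy to miss, so here is a minimal sketch of the pattern. LoaderConfig is a hypothetical stand-in written for this illustration, not the OWLAPI class; it only mimics the "setter returns a modified copy" behaviour described above.

```java
// Hypothetical stand-in for an immutable configuration class.
// It is NOT OWLOntologyLoaderConfiguration; it only mimics the
// "setter returns a modified copy" behaviour described above.
final class LoaderConfig {
    private final String bannedParsers;

    LoaderConfig() {
        this("");
    }

    private LoaderConfig(String bannedParsers) {
        this.bannedParsers = bannedParsers;
    }

    // Returns a new object; the receiver is left unchanged.
    LoaderConfig setBannedParsers(String banned) {
        return new LoaderConfig(banned);
    }

    String getBannedParsers() {
        return bannedParsers;
    }

    public static void main(String[] args) {
        LoaderConfig conf = new LoaderConfig();

        // Return value discarded: conf still has no banned parsers.
        conf.setBannedParsers("some.ParserFactory");
        System.out.println("ignored result:  '" + conf.getBannedParsers() + "'");

        // Return value reassigned: the ban is now part of conf.
        conf = conf.setBannedParsers("some.ParserFactory");
        System.out.println("reassigned conf: '" + conf.getBannedParsers() + "'");
    }
}
```

With a class that behaves this way, calling conf.setBannedParsers(...) and dropping the result leaves conf exactly as it was, which would explain why none of the attempted syntaxes had any effect.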

To fix this, use:

conf = conf.setBannedParsers(
    "org.coode.owlapi.obo12.parser.OBO12ParserFactory org.semanticweb.owlapi.oboformat.OBOFormatOWLAPIParserFactory");

If you are using OWLAPI 5.1.0, you can set the ban at the manager level:

manager.getOntologyConfigurator().withBannedParsers("...");

Another way to use only the parser you know must be used for the document is to set the format on the ontology source:

OWLOntologyDocumentSource source = 
    new FileDocumentSource(fileName, new NTriplesDocumentFormat());

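This will use only the parsers that match the requested format, instead of trying all available parsers until one does not fail.

Update: the root error is related to trying to parse NTriples with a Turtle parser. The Rio ntriples parser should be selected instead, which is what I believe happens in the maven tests. The most likely problem is that the Rio parsers are not included, or that their declaration is skipped because of issues in the META-INF/services folder. Check whether your jar contains multiple copies of the files storing the lists; only one gets loaded (multiple copies in multiple folders or multiple jars work with ServiceLoader, but multiple copies in the same jar do not).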
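
One way to run that check is to list every copy of a service file that the class loader can see, once under mvn test and once under java -jar, and compare. The sketch below uses only the JDK; the class name ServiceFileLister is made up for this example, and the default service name is the one from the question.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Illustrative helper (the name is an assumption, not part of OWLAPI).
class ServiceFileLister {

    // Returns the URL of every META-INF/services/<serviceName> file
    // visible to the given class loader.
    static List<URL> findCopies(ClassLoader loader, String serviceName) {
        try {
            return Collections.list(loader.getResources("META-INF/services/" + serviceName));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Reads the provider class names from one service file,
    // skipping blank lines and '#' comments.
    static List<String> readProviders(URL serviceFile) {
        List<String> providers = new ArrayList<>();
        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(serviceFile.openStream(), StandardCharsets.UTF_8))) {
            String line;
            while ((line = in.readLine()) != null) {
                int hash = line.indexOf('#');
                if (hash >= 0) {
                    line = line.substring(0, hash);
                }
                line = line.trim();
                if (!line.isEmpty()) {
                    providers.add(line);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return providers;
    }

    public static void main(String[] args) {
        String service = args.length > 0 ? args[0] : "org.openrdf.rio.RDFParserFactory";
        ClassLoader loader = ServiceFileLister.class.getClassLoader();
        for (URL copy : findCopies(loader, service)) {
            System.out.println(copy);
            for (String provider : readProviders(copy)) {
                System.out.println("    " + provider);
            }
        }
    }
}
```

If the diagnosis is right, the Maven classpath should show one copy per contributing jar, while the repackaged jar should show a single copy with fewer providers.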

Second update: your POM has multiple jars that contain the parser lists. Try replacing

<dependency>
  <groupId>net.sourceforge.owlapi</groupId>
  <artifactId>owlapi-distribution</artifactId>
  <version>4.3.1</version>
</dependency>

<dependency>
  <groupId>net.sourceforge.owlapi</groupId>
  <artifactId>owlapi-rio</artifactId>
  <version>4.3.1</version>
</dependency>

<dependency>
  <groupId>net.sourceforge.owlapi</groupId>
  <artifactId>owlapi-compatibility</artifactId>
  <version>4.3.1</version>
</dependency>

with

<dependency>
  <groupId>net.sourceforge.owlapi</groupId>
  <artifactId>owlapi-osgidistribution</artifactId>
  <version>4.3.1</version>
</dependency>

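Update: I have checked out the project to reproduce the problem. To debug it, I tried adding the OWLAPI dependencies one at a time until the error stopped happening. Doing so, I found that the contents of this file: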

META-INF/services/org.openrdf.rio.RDFParserFactory

should include

org.openrdf.rio.turtle.TurtleParserFactory

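but it does not: it contains the contents of owlapi-distribution/META-INF/services/org.openrdf.rio.RDFParserFactory.

However, when you run this as a maven project, with the dependencies resolved by maven, the file org.openrdf.rio.RDFParserFactory appears twice: once in owlapi-distribution and once in sesame-rio-turtle (version 2.7.16 in this case); the second file contains the correct factory.

The problem is that, when owlapi-distribution and its dependencies are repackaged, the files in services are not merged as expected.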
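
To make "merged as expected" concrete: when several jars each carry a service file with the same name, the repackaged jar needs one file holding the union of their lines. The following is a rough sketch of that idea using only the JDK. It is not the shade plugin's code, and ServiceFileMerger is a name invented for this example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Illustrative only: shows the intended result of merging service files,
// not how any Maven plugin implements it.
class ServiceFileMerger {

    // Combines the provider lines of several same-named service files,
    // keeping first-seen order and dropping blanks and duplicates.
    static List<String> merge(List<List<String>> serviceFiles) {
        Set<String> merged = new LinkedHashSet<>();
        for (List<String> file : serviceFiles) {
            for (String line : file) {
                String provider = line.trim();
                if (!provider.isEmpty()) {
                    merged.add(provider);
                }
            }
        }
        return new ArrayList<>(merged);
    }

    public static void main(String[] args) {
        // Providers listed in the repackaged jar described in the question.
        List<String> fromOwlapi = Arrays.asList(
                "org.semanticweb.owlapi.rio.RioFunctionalSyntaxParserFactory",
                "org.semanticweb.owlapi.rio.RioManchesterSyntaxParserFactory",
                "org.semanticweb.owlapi.rio.RioOWLXMLParserFactory");
        // The factory that is missing from the repackaged jar.
        List<String> fromSesameTurtle = Arrays.asList(
                "org.openrdf.rio.turtle.TurtleParserFactory");

        for (String provider : merge(Arrays.asList(fromOwlapi, fromSesameTurtle))) {
            System.out.println(provider);
        }
    }
}
```

A merged file of this shape is what the repackaged jar would need in META-INF/services/org.openrdf.rio.RDFParserFactory for the Turtle factory to be discoverable.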

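You should be able to fix this by using the shade plugin in your repackaging. As an example, I have pasted here what owlapi-distribution does; you will need to change the exclude list, since you probably do not want to exclude any dependencies.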

        <plugin>
            <groupId>org.apache.maven.plugins</groupId>
            <artifactId>maven-shade-plugin</artifactId>
            <version>2.3</version>
            <executions>
                <execution>
                    <phase>package</phase>
                    <goals>
                        <goal>shade</goal>
                    </goals>
                    <configuration>
                        <artifactSet>
                            <includes>
                                <include>net.sourceforge.owlapi:owlapi-api</include>
                                <include>net.sourceforge.owlapi:owlapi-apibinding</include>
                                <include>net.sourceforge.owlapi:owlapi-fixers</include>
                                <include>net.sourceforge.owlapi:owlapi-impl</include>
                                <include>net.sourceforge.owlapi:owlapi-oboformat</include>
                                <include>net.sourceforge.owlapi:owlapi-parsers</include>
                                <include>net.sourceforge.owlapi:owlapi-rio</include>
                                <include>net.sourceforge.owlapi:owlapi-tools</include>
                            </includes>
                            <excludes>
                                <exclude>org.apache.felix:org.osgi.core</exclude>
                                <exclude>org.openrdf.sesame:*</exclude>
                                <exclude>com.fasterxml.jackson.core:*</exclude>
                                <exclude>com.github.jsonld-java:*</exclude>
                                <exclude>com.fasterxml.jackson.core:*</exclude>
                                <exclude>org.apache.httpcomponents:*</exclude>
                                <exclude>commons-codec:commons-codec:*</exclude>
                                <exclude>org.slf4j:*</exclude>
                                <exclude>org.semarglproject:*</exclude>
                                <exclude>com.google.guava:*</exclude>
                                <exclude>com.google.inject:*</exclude>
                                <exclude>javax.inject:*</exclude>
                                <exclude>aopalliance:*</exclude>
                                <exclude>com.google.inject.extensions:*</exclude>
                                <exclude>com.google.code.findbugs:*</exclude>
                                <exclude>org.slf4j:slf4j-api</exclude>
                                <exclude>commons-io:*</exclude>
                                <exclude>org.tukaani:*</exclude>
                                <exclude>net.sf.trove4j:*</exclude>
                            </excludes>
                        </artifactSet>
                        <transformers>
                            <transformer
                                implementation="org.apache.maven.plugins.shade.resource.ServicesResourceTransformer" />
                        </transformers>
                    </configuration>
                </execution>
            </executions>
        </plugin>