Trying to override Apache Tika 0.9's dependency on PDFBox 1.4.0 with PDFBox 1.6.0

Date: 2011-09-21 18:16:37

Tags: apache maven pom.xml apache-tika

    <dependency>
      <groupId>org.apache.tika</groupId>
      <artifactId>tika-parsers</artifactId>
      <version>0.9</version>
    </dependency>

Instead of just the Tika dependency above, I tried adding the dependencies below to override Tika's dependency with PDFBox 1.6.0, but it doesn't work:

    <dependency>
      <groupId>org.apache.tika</groupId>
      <artifactId>tika-parsers</artifactId>
      <version>0.9</version>
      <exclusions>
        <exclusion>
          <groupId>org.apache.pdfbox</groupId>
          <artifactId>pdfbox</artifactId>
        </exclusion>
      </exclusions>
    </dependency>
    <dependency>
      <groupId>org.apache.pdfbox</groupId>
      <artifactId>pdfbox</artifactId>
      <version>1.6.0</version>
    </dependency>

Tika Parsers depends on PDFBox version 1.4.0. I want to change this Apache Tika dependency to PDFBox version 1.6.0. How can I do this in my pom.xml? Here is my pom.xml file. Any suggestions would be appreciated.

    <project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
      xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
      <modelVersion>4.0.0</modelVersion>

      <groupId>com.xyz.search</groupId>
      <artifactId>xyzz-crawler4j</artifactId>
      <version>0.0.1-SNAPSHOT</version>
      <packaging>jar</packaging>

      <name>qcom-crawler4j</name>
      <url>http://maven.apache.org</url>

      <properties>
        <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
      </properties>

      <repositories>
        <repository>
          <id>repo-for-dsiutils</id>
          <url>http://ir.dcs.gla.ac.uk/~bpiwowar/maven/</url>
        </repository>
        <repository>
          <id>JBoss</id>
          <name>jboss-maven2-release-repository</name>
          <url>https://oss.sonatype.org/content/repositories/JBoss</url>
        </repository>
        <repository>
          <id>oracle</id>
          <url>http://download.oracle.com/maven</url>
        </repository>

        <repository>
          <id>boilerpipe</id>
          <url>http://boilerpipe.googlecode.com/svn/repo/</url>
        </repository>
      </repositories>

      <dependencies>

        <dependency>
          <groupId>org.apache.httpcomponents</groupId>
          <artifactId>httpclient</artifactId>
          <version>4.0.1</version>
          <!-- 4.1.1 -->
        </dependency>

        <!-- PDFBox version 1.6.0 -->
        <dependency>
          <groupId>org.apache.pdfbox</groupId>
          <artifactId>pdfbox</artifactId>
          <version>1.6.0</version>
        </dependency>

        <dependency>
          <groupId>org.apache.httpcomponents</groupId>
          <artifactId>httpcore</artifactId>
          <version>4.0.1</version>
        </dependency>
        <!-- 4.1 -->

        <dependency>
          <groupId>it.unimi.dsi</groupId>
          <artifactId>fastutil</artifactId>
          <version>6.2.2</version>
        </dependency>

        <dependency>
          <groupId>com.sleepycat</groupId>
          <artifactId>je</artifactId>
          <version>4.0.71</version>
        </dependency>

        <!-- Boilerpipe -->
        <dependency>
          <groupId>de.l3s.boilerpipe</groupId>
          <artifactId>boilerpipe</artifactId>
          <version>1.2.0</version>
        </dependency>

        <!-- Tika (for non-HTML extractions) -->
        <dependency>
          <groupId>org.apache.tika</groupId>
          <artifactId>tika-core</artifactId>
          <version>0.9</version>
        </dependency>

        <dependency>
          <groupId>xerces</groupId>
          <artifactId>xercesImpl</artifactId>
          <version>2.8.1</version>
        </dependency>

        <dependency>
          <groupId>nekohtml</groupId>
          <artifactId>nekohtml</artifactId>
          <version>0.6.5</version>
        </dependency>

        <dependency>
          <groupId>org.apache.tika</groupId>
          <artifactId>tika-parsers</artifactId>
          <version>0.9</version>
        </dependency>

        <!-- I was trying to add the dependencies below instead of the tika-parsers
             dependency above, to override Tika's dependency with PDFBox 1.6.0,
             but it is not working. -->
        <!--
        <dependency>
          <groupId>org.apache.tika</groupId>
          <artifactId>tika-parsers</artifactId>
          <version>0.9</version>
          <exclusions>
            <exclusion>
              <groupId>org.apache.pdfbox</groupId>
              <artifactId>pdfbox</artifactId>
            </exclusion>
          </exclusions>
        </dependency>
        <dependency>
          <groupId>org.apache.pdfbox</groupId>
          <artifactId>pdfbox</artifactId>
          <version>1.6.0</version>
        </dependency>
        -->

      </dependencies>
    </project>

1 answer

Answer (score: 4)

The cleanest way is probably to add a dependencyManagement section that upgrades the PDFBox version throughout your dependency tree. For example:

    <dependencyManagement>
      <dependencies>
        <dependency>
          <groupId>org.apache.pdfbox</groupId>
          <artifactId>pdfbox</artifactId>
          <version>1.6.0</version>
        </dependency>
      </dependencies>
    </dependencyManagement>
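For the pom.xml in the question, here is a minimal sketch of where that section goes, assuming the rest of the file stays as shown. `<dependencyManagement>` sits directly inside `<project>`, next to `<dependencies>` rather than inside it. The plain `tika-parsers` dependency stays as it is, with no exclusions. A managed version does not add PDFBox to the build by itself. It only fixes the version Maven chooses whenever `pdfbox` appears in the tree, including when it arrives transitively through `tika-parsers`.

    <project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
      xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
      <!-- modelVersion, coordinates, properties and repositories unchanged -->

      <dependencyManagement>
        <dependencies>
          <!-- Pins every pdfbox occurrence in the tree, including the one tika-parsers 0.9 pulls in -->
          <dependency>
            <groupId>org.apache.pdfbox</groupId>
            <artifactId>pdfbox</artifactId>
            <version>1.6.0</version>
          </dependency>
        </dependencies>
      </dependencyManagement>

      <dependencies>
        <!-- other dependencies unchanged -->
        <dependency>
          <groupId>org.apache.tika</groupId>
          <artifactId>tika-parsers</artifactId>
          <version>0.9</version>
        </dependency>
      </dependencies>
    </project>

Running `mvn dependency:tree` afterwards is a quick way to check which pdfbox version actually gets resolved.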

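Note that many Tika parsers are closely tied to specific versions of upstream parser libraries such as PDFBox. If you override such a dependency version, you will need to test your system to make sure everything still works.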

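Instead of forcing the dependency version, you could use the latest Tika trunk build, where the PDFBox dependency is already at version 1.6.0. Also, the Tika 0.10 release, which will use the updated dependencies, should be out early next week.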
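If Tika 0.10 ships as this answer expects, the upgrade could look like the sketch below. Both Tika artifacts move to 0.10 together, and neither the dependencyManagement override nor the exclusions from the question are needed. The 0.10 version number comes from the answer. Once that release is available, it is worth confirming with `mvn dependency:tree` that it really resolves PDFBox 1.6.0.

    <!-- Tika (for non-HTML extractions): keep tika-core and tika-parsers on the same version -->
    <dependency>
      <groupId>org.apache.tika</groupId>
      <artifactId>tika-core</artifactId>
      <version>0.10</version>
    </dependency>
    <dependency>
      <groupId>org.apache.tika</groupId>
      <artifactId>tika-parsers</artifactId>
      <version>0.10</version>
    </dependency>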