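I want to integrate Apache Tika into my Java project. I need to extract text from different file formats (Excel, DOC, PPT, and so on). After some reading, I understood that the only way to get Tika is to download the source and build it with Maven. I ran "mvn install" in the root directory of the Tika source (apache-tika-0.9-src), but I got this error: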
[INFO] Scanning for projects...
Downloading: http://repo1.maven.org/maven2/org/apache/apache/6/apache-6.pom
[ERROR] The build could not read 1 project -> [Help 1]
[ERROR]
[ERROR] The project org.apache.tika:tika:0.9 (C:\Users\vexler\Documents\Installs\apache-tika-0.9-src\apache-tika-0.9\pom.xml) has 1 error
[ERROR] Non-resolvable parent POM for org.apache.tika:tika-parent:0.9: Could not transfer artifact org.apache:apache:pom:6 from/to central (http://repo1.maven.org/maven2): Error transferring file: Connection timed out: connect and 'parent.relativePath' points at no local POM @ org.apache.tika:tika-parent:0.9, C:\Users\vexler\Documents\Installs\apache-tika-0.9-src\apache-tika-0.9\tika-parent\pom.xml, line 25, column 11 -> [Help 2]
I would really appreciate any help with this error. Thanks :-) Reuth
Answer 0 (score: 1)
Assuming you use Maven in your own project, life is much simpler.
Just add something like this to your pom.xml:
<dependency>
<groupId>org.apache.tika</groupId>
<artifactId>tika-parsers</artifactId>
<version>0.9</version>
<scope>provided</scope>
</dependency>
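Maven will then download Tika and its dependencies for you.
Alternatively, if you download the latest Tika OSGi bundle JAR (e.g. 0.9) and unpack it, you will get both the Tika code and its dependencies.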
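Once the Tika JARs are on the classpath by either route, extracting text could look roughly like the sketch below. It uses the org.apache.tika.Tika facade from tika-core, which tika-parsers pulls in. The class name ExtractText and the helper extractText are made up for illustration and are not part of Tika.

```java
import java.io.File;
import java.io.IOException;

import org.apache.tika.Tika;
import org.apache.tika.exception.TikaException;

// Hypothetical example class; only Tika and TikaException come from the library.
public class ExtractText {

    // Returns the plain text of the given file. The Tika facade detects the
    // format and delegates to a matching parser found on the classpath.
    static String extractText(File file) throws IOException, TikaException {
        Tika tika = new Tika();
        return tika.parseToString(file);
    }

    public static void main(String[] args) throws IOException, TikaException {
        if (args.length != 1) {
            System.err.println("usage: ExtractText <file>");
            System.exit(1);
        }
        System.out.println(extractText(new File(args[0])));
    }
}
```

Note that with <scope>provided</scope> Maven compiles against Tika but does not package it, so the JARs must be supplied at runtime some other way. If your application has to ship Tika itself, dropping the scope element is probably what you want.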