I am creating a simple hello-world Hadoop project, and I really don't know what I need to include to resolve this error. It seems the Hadoop libraries require some resource that I am not providing.
I tried adding the following parameter to my run configuration, but it did not help:
-Djavax.xml.parsers.DocumentBuilderFactory=com.sun.org.apache.xerces.internal.jaxp.DocumentBuilderFactoryImpl
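A quick way to check which JAXP implementation is actually winning the classpath lookup, and whether it accepts the XInclude feature that Hadoop's Configuration enables, is a minimal sketch like the following (the class name ParserCheck is just for illustration):

import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;

public class ParserCheck {
    public static void main(String[] args) {
        // Which DocumentBuilderFactory implementation wins the JAXP lookup?
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        System.out.println(dbf.getClass().getName());
        try {
            // Hadoop's Configuration enables XInclude when parsing its XML
            // config files; old standalone Xerces jars reject this feature.
            dbf.setXIncludeAware(true);
            dbf.newDocumentBuilder();
            System.out.println("XInclude is supported");
        } catch (ParserConfigurationException | UnsupportedOperationException e) {
            System.out.println("XInclude is NOT supported: " + e);
        }
    }
}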
Here is my code:
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

/**
 * Writes a static string to a file using the Hadoop libraries.
 */
public class WriteToFile {

    public static void main(String[] args) {
        // String to print to the file
        final String HELLOWORLD = "Hello World! This is Chris writing to the file.";
        try {
            // Instantiate the configuration (reads core-site.xml etc. from the classpath)
            Configuration conf = new Configuration();
            // Create the file system handle
            FileSystem fs = FileSystem.get(conf);
            // Instantiate the path
            Path path = new Path("/user/c4511/homework1.txt");
            // Delete the file if it already exists
            if (fs.exists(path)) {
                fs.delete(path, true);
            }
            // Create an output stream
            FSDataOutputStream fsdos = fs.create(path);
            // Write the static string to the file
            fsdos.writeUTF(HELLOWORLD);
            // Close all connections
            fsdos.close();
            fs.close();
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}
What is causing this problem?
Here is the error I am getting:
Nov 17, 2014 9:30:30 AM org.apache.hadoop.conf.Configuration loadResource
SEVERE: error parsing conf file: javax.xml.parsers.ParserConfigurationException: Feature 'http://apache.org/xml/features/xinclude' is not recognized.
Exception in thread "main" java.lang.RuntimeException: javax.xml.parsers.ParserConfigurationException: Feature 'http://apache.org/xml/features/xinclude' is not recognized.
at org.apache.hadoop.conf.Configuration.loadResource(Configuration.java:1833)
at org.apache.hadoop.conf.Configuration.loadResources(Configuration.java:1689)
at org.apache.hadoop.conf.Configuration.getProps(Configuration.java:1635)
at org.apache.hadoop.conf.Configuration.get(Configuration.java:790)
at org.apache.hadoop.fs.FileSystem.getDefaultUri(FileSystem.java:166)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:158)
at WriteToFile.main(WriteToFile.java:24)
Caused by: javax.xml.parsers.ParserConfigurationException: Feature 'http://apache.org/xml/features/xinclude' is not recognized.
at org.apache.xerces.jaxp.DocumentBuilderFactoryImpl.newDocumentBuilder(Unknown Source)
at org.apache.hadoop.conf.Configuration.loadResource(Configuration.java:1720)
... 6 more
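Note that the trace shows the factory actually being used is org.apache.xerces.jaxp.DocumentBuilderFactoryImpl, i.e. a standalone Xerces jar on the classpath, not the JDK-internal com.sun.org.apache.xerces.internal implementation that the system property above tries to force. That is presumably why the property had no effect, and it suggests an old Xerces jar is shadowing the JDK's built-in parser.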
Answer (score: 1)
I got the same exception when I moved my project from Hadoop 2.5.1 to 2.6.0. I had to solve it in my Maven pom file, by adding xerces:* to the shaded jar. The full pom is below; the relevant part is the shade plugin's xerces:* include filter:
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
<modelVersion>4.0.0</modelVersion>
<groupId>emc.lab.hadoop</groupId>
<artifactId>DartAnalytics</artifactId>
<version>0.0.1-SNAPSHOT</version>
<name>DartAnalytics</name>
<description>Examples for usage of Dart simulated data</description>
<properties>
<main.class>OffsetRTMain</main.class>
<hadoop.version>2.6.0</hadoop.version>
<minimize.jar>true</minimize.jar>
</properties>
<!-- <repositories> <repository> <id>mvn.twitter</id> <url>http://maven.twttr.com</url>
</repository> </repositories> -->
<build>
<plugins>
<plugin>
<!-- The shade plugin bundles the dependencies into a single (uber) jar -->
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-shade-plugin</artifactId>
<version>2.3</version>
<executions>
<execution>
<phase>package</phase>
<goals>
<goal>shade</goal>
</goals>
</execution>
</executions>
<configuration>
<!-- minimizeJar strips classes that are not referenced anywhere,
but the filters below force-include things we must keep -->
<minimizeJar>${minimize.jar}</minimizeJar>
<filters>
<filter>
<artifact>com.hadoop.gplcompression:hadoop-lzo</artifact>
<includes>
<include>**</include>
</includes>
</filter>
<filter>
<!-- This solves the hadoop 2.6.0 problem with ClassNotFound of "org.apache.xerces.jaxp.DocumentBuilderFactoryImpl" -->
<artifact>xerces:*</artifact>
<includes>
<include>**</include>
</includes>
</filter>
<filter>
<artifact>org.apache.hadoop:*</artifact>
<excludes>
<exclude>**</exclude>
</excludes>
</filter>
</filters>
<finalName>uber-${project.artifactId}-${project.version}</finalName>
<transformers>
<transformer
implementation="org.apache.maven.plugins.shade.resource.ManifestResourceTransformer">
<mainClass>${main.class}</mainClass>
</transformer>
</transformers>
</configuration>
</plugin>
</plugins>
</build>
<dependencies>
<!-- you can add this to the local repo by running mvn install:install-file
-Dfile=libs/hadoop-lzo-0.4.20-SNAPSHOT.jar -DgroupId=com.hadoop.gplcompression
-DartifactId=hadoop-lzo -Dversion=0.4.20 -Dpackaging=jar from the main project
directory -->
<!-- Another option is to build from outside the EMC network and get access
to the twitter maven repository by changing the version to a version in the
repository and un-commenting the repository addition -->
<dependency>
<groupId>com.hadoop.gplcompression</groupId>
<artifactId>hadoop-lzo</artifactId>
<version>0.4.20</version>
</dependency>
<dependency>
<groupId>net.sf.trove4j</groupId>
<artifactId>trove4j</artifactId>
<version>3.0.3</version>
</dependency>
<dependency>
<groupId>org.apache.hadoop</groupId>
<artifactId>hadoop-common</artifactId>
<version>${hadoop.version}</version>
</dependency>
<dependency>
<groupId>org.apache.hadoop</groupId>
<artifactId>hadoop-mapreduce-client-core</artifactId>
<version>${hadoop.version}</version>
</dependency>
<dependency>
<groupId>org.apache.hadoop</groupId>
<artifactId>hadoop-hdfs</artifactId>
<version>${hadoop.version}</version>
</dependency>
<dependency>
<groupId>com.google.protobuf</groupId>
<artifactId>protobuf-java</artifactId>
<version>2.5.0</version>
</dependency>
<dependency>
<groupId>com.twitter.elephantbird</groupId>
<artifactId>elephant-bird-core</artifactId>
<version>4.5</version>
</dependency>
<!-- <dependency> <groupId>com.google.guava</groupId> <artifactId>guava</artifactId>
<version>18.0</version> </dependency> -->
</dependencies>
</project>
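With this pom, mvn package produces uber-DartAnalytics-0.0.1-SNAPSHOT.jar. The xerces:* include filter matters because the Xerces classes are loaded reflectively through the JAXP lookup, so minimizeJar's static analysis cannot see the reference and would otherwise strip them from the shaded jar. Conversely, the org.apache.hadoop:* exclude keeps the Hadoop jars themselves out of the uber jar, since the cluster provides those at runtime.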