I get an error when trying to save a csv file into HDFS with Apache Spark

Asked: 2014-10-14 11:05:10

Tags: apache-spark

I am writing a program that needs to save a csv file into HDFS. The code runs fine inside Eclipse, but when I try to execute the jar outside Eclipse, it gives me this error:

2014-10-14 12:41:31 INFO  SecurityManager:58 - SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(aroman)
    Exception in thread "main" java.lang.ExceptionInInitializerError
    at com.tekcomms.c2d.utils.MyWatchService.saveIntoHdfs(MyWatchService.java:362)
    at com.tekcomms.c2d.utils.MyWatchService.processDataCastFile(MyWatchService.java:332)
    at com.tekcomms.c2d.utils.MyWatchService.processCreateEvent(MyWatchService.java:224)
    at com.tekcomms.c2d.utils.MyWatchService.watch(MyWatchService.java:180)
    at com.tekcomms.c2d.main.FeedAdaptor.main(FeedAdaptor.java:40)
    Caused by: com.typesafe.config.ConfigException$Missing: No configuration setting found for key 'akka.version'
    at com.typesafe.config.impl.SimpleConfig.findKey(SimpleConfig.java:115)
    at com.typesafe.config.impl.SimpleConfig.find(SimpleConfig.java:136)
    at com.typesafe.config.impl.SimpleConfig.find(SimpleConfig.java:142)
    at com.typesafe.config.impl.SimpleConfig.find(SimpleConfig.java:150)
    at com.typesafe.config.impl.SimpleConfig.find(SimpleConfig.java:155)
    at com.typesafe.config.impl.SimpleConfig.getString(SimpleConfig.java:197)
    at akka.actor.ActorSystem$Settings.<init>(ActorSystem.scala:136)
    at akka.actor.ActorSystemImpl.<init>(ActorSystem.scala:470)
    at akka.actor.ActorSystem$.apply(ActorSystem.scala:111)
    at akka.actor.ActorSystem$.apply(ActorSystem.scala:104)
    at org.apache.spark.util.AkkaUtils$.createActorSystem(AkkaUtils.scala:104)
    at org.apache.spark.SparkEnv$.create(SparkEnv.scala:152)
    at org.apache.spark.SparkContext.<init>(SparkContext.scala:202)
    at org.apache.spark.api.java.JavaSparkContext.<init>(JavaSparkContext.scala:53)
    at com.tekcomms.c2d.utils.MySparkUtils.<clinit>(MySparkUtils.java:29)
    ... 5 more

This is the part responsible for writing into HDFS:

import java.util.Arrays;
import java.util.List;

import org.apache.log4j.Logger;
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;

public class MySparkUtils {

    final static Logger LOGGER = Logger.getLogger(MySparkUtils.class);

    private static JavaSparkContext sc;

    static {
        SparkConf conf = new SparkConf().setAppName("MySparkUtils");
        String master = MyWatchService.getSPARK_MASTER();
        conf.setMaster(master);
        // this is horrible! how can I get rid of it?
        String[] jars = {"target/feed-adapter-0.0.1-SNAPSHOT.jar"};
        conf.setJars(jars);
        sc = new JavaSparkContext(conf);
        LOGGER.debug("spark context initialized!");
    }

    public static boolean saveWithinHDFS(String path, StringBuffer sb) {
        LOGGER.debug("Trying to save in HDFS. path: " + path);
        boolean isOk = false;

        // Split the buffer into lines, distribute them as an RDD,
        // and write the partitions out to the given HDFS path.
        String[] aStrings = sb.toString().split("\n");
        List<String> jsonData = Arrays.asList(aStrings);

        JavaRDD<String> dataRDD = sc.parallelize(jsonData);
        dataRDD.saveAsTextFile(path);
        isOk = true;
        return isOk;
    }
}
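Regarding the hard-coded jar path flagged in the comment above: a minimal sketch of an alternative, assuming the Spark 1.0 Java API, is to ask Spark for the jar a class was loaded from instead of hard-coding the build path. SparkContextFactory is a hypothetical helper name:

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;

// Hypothetical helper: builds the context without a hard-coded jar path.
public class SparkContextFactory {

    public static JavaSparkContext create(String master) {
        SparkConf conf = new SparkConf().setAppName("MySparkUtils").setMaster(master);
        // jarOfClass returns the jar(s) this class was loaded from, so the
        // path no longer has to be hard-coded. When running from exploded
        // classes inside Eclipse it returns an empty array, which is
        // harmless for local runs.
        conf.setJars(JavaSparkContext.jarOfClass(SparkContextFactory.class));
        return new JavaSparkContext(conf);
    }
}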

This is my pom.xml:

<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
<modelVersion>4.0.0</modelVersion>
<groupId>com.tekcomms.c2d</groupId>
<artifactId>feed-adapter</artifactId>
<version>0.0.1-SNAPSHOT</version>
<name>feed-adaptor</name>
<description>a PoC that scans a remote filesystem every second looking for new csv files from datacast, loads each csv file into memory, and matches every line against a set of pattern rules (matching_phone, matching_mac). If a line matches, it is appended to one string buffer; if there is no match, the discarded data goes into another string buffer. Finally, both files have to be copied into HDFS.</description>
<developers>
    <developer>
        <name>Alonso Isidoro Román</name>
        <email>XXX</email>
        <timezone>+1 Madrid</timezone>
        <organization>XXXX</organization>
        <url>about.me/alonso.isidoro.roman</url>
    </developer>
</developers>

<dependencies>
    <!-- StringUtils... -->
    <dependency>
        <groupId>commons-lang</groupId>
        <artifactId>commons-lang</artifactId>
        <version>2.6</version>
    </dependency>

    <dependency>
        <groupId>log4j</groupId>
        <artifactId>log4j</artifactId>
        <version>1.2.17</version>
    </dependency>
    <dependency> <!-- Spark dependency -->
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-core_2.10</artifactId>
        <version>1.0.0</version>
        <scope>compile</scope>
        <optional>false</optional>
    </dependency>
</dependencies>
<repositories>
    <repository>
        <id>Akka repository</id>
        <url>http://repo.akka.io/releases</url>
    </repository>

    <repository>
        <id>cloudera-repos</id>
        <name>Cloudera Repos</name>
        <url>https://repository.cloudera.com/artifactory/cloudera-repos/</url>
    </repository>

</repositories>

<build>
    <plugins>
        <plugin>
            <groupId>org.apache.maven.plugins</groupId>
            <artifactId>maven-shade-plugin</artifactId>
            <version>2.3</version>
            <executions>
                <execution>
                    <phase>package</phase>
                    <goals>
                        <goal>shade</goal>
                    </goals>
                    <configuration>
                        <transformers>
                            <transformer
                                implementation="org.apache.maven.plugins.shade.resource.ManifestResourceTransformer">
                                <mainClass>com.tekcomms.c2d.main.FeedAdaptor</mainClass>
                            </transformer>
                        </transformers>
                        <filters>
                            <filter>
                                <artifact>*:*</artifact>
                                <excludes>
                                    <exclude>META-INF/*.SF</exclude>
                                    <exclude>META-INF/*.DSA</exclude>
                                    <exclude>META-INF/*.RSA</exclude>
                                </excludes>
                            </filter>
                        </filters>
                    </configuration>
                </execution>
            </executions>
        </plugin>
    </plugins>
</build>
</project>
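A note on the stack trace and the shade configuration above: "No configuration setting found for key 'akka.version'" typically means the uber-jar lost Akka's reference.conf, because the shade plugin lets one dependency's reference.conf overwrite the others while merging. Assuming the maven-shade-plugin shown above, the usual remedy is to append the reference.conf resources together with an extra transformer:

<!-- Added inside the existing <transformers> element of the shade plugin:
     merges every reference.conf instead of keeping only one, so the
     akka.version key survives in the uber-jar. -->
<transformer
    implementation="org.apache.maven.plugins.shade.resource.AppendingTransformer">
    <resource>reference.conf</resource>
</transformer>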

What am I doing wrong?

EDIT

In the end, the problem was finding the exact jars for my HDFS cluster: I had the wrong versions! The other problem was a very restrictive umask on the HDFS side; my local user could not write into HDFS because of permissions!
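For reference, "the exact jars of your HDFS cluster" usually means depending on the vendor build of Spark instead of the vanilla artifact. A sketch, assuming a CDH 5.1 cluster; the cdh suffix below is a placeholder that must match your actual distribution, and the artifact would come from the cloudera-repos repository already declared in the pom:

<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-core_2.10</artifactId>
    <!-- Placeholder version: use the exact Spark build your cluster runs -->
    <version>1.0.0-cdh5.1.0</version>
</dependency>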

1 Answer:

Answer 0 (score 0):

In the end, the problem was finding the exact jars for my HDFS cluster: I had the wrong versions! The other problem was a very restrictive umask on the HDFS side; my local user could not write into HDFS because of permissions!
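To diagnose the permission half of this, here is a minimal sketch of a write probe using the plain Hadoop FileSystem API; the NameNode URI, user name, and target directory are placeholders:

import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsWriteProbe {

    public static void main(String[] args) throws Exception {
        // Placeholders: replace with your NameNode URI and HDFS user.
        FileSystem fs = FileSystem.get(
                new URI("hdfs://namenode:8020"), new Configuration(), "aroman");
        Path probe = new Path("/user/aroman/.write-probe");
        // Creating and deleting a marker file fails fast with an
        // AccessControlException when the user lacks write permission,
        // which is exactly the umask problem described above.
        fs.create(probe, true).close();
        fs.delete(probe, false);
        System.out.println("Write permission OK under /user/aroman");
    }
}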