Spark's saveAsTextFile method is really strange in the Java API; it just doesn't work right in my program

Time: 2015-11-25 07:37:24

Tags: apache-spark rdd

I am new to Spark and ran into this problem when running my test program. I installed Spark on a Linux server with just one master node and one worker node. Then I wrote a test program on my laptop, with code like this:

    import java.util.ArrayList;
    import java.util.List;

    import org.apache.spark.api.java.JavaRDD;
    import org.apache.spark.api.java.JavaSparkContext;

    JavaSparkContext ct = new JavaSparkContext("spark://192.168.90.74:7077", "test",
            "/home/webuser/spark/spark-1.5.2-bin-hadoop2.4", new String[0]);
    ct.addJar("/home/webuser/java.spark.test-0.0.1-SNAPSHOT-jar-with-dependencies.jar");
    List<Integer> list = new ArrayList<>();
    list.add(1);
    list.add(6);
    list.add(9);
    // parallelize over a List<Integer> yields a JavaRDD<Integer>, not JavaRDD<String>
    JavaRDD<Integer> rdd = ct.parallelize(list);
    System.out.println(rdd.collect());
    rdd.saveAsTextFile("/home/webuser/temp");
    ct.close();

I expected to get /home/webuser/temp on my server, but in fact this program created c://home/webuser/temp on my laptop, which runs Windows 8. I don't understand this. Shouldn't saveAsTextFile() run on Spark's worker nodes? Why does it run on my laptop, which is, I suppose, Spark's driver?

1 Answer:

Answer 0 (score: 0):

This depends on the default file system of your Spark installation. From what you describe, your default file system is file:///, which is the default. To change this, you need to modify the fs.defaultFS property in the core-site.xml of your Hadoop configuration. Otherwise, you can simply change the code and specify the file system URL in the code, i.e.:

    rdd.saveAsTextFile("hdfs://192.168.90.74/home/webuser/temp");

assuming 192.168.90.74 is your NameNode.
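For illustration, here is a minimal sketch of both approaches: writing to a fully qualified HDFS URI, and pointing the context's Hadoop configuration at HDFS programmatically, which mirrors setting fs.defaultFS in core-site.xml. The NameNode RPC port 9000 and the class name are assumptions, not from the original post; substitute whatever your cluster actually uses.

    import java.util.Arrays;

    import org.apache.spark.api.java.JavaRDD;
    import org.apache.spark.api.java.JavaSparkContext;

    public class SaveToHdfsSketch {
        public static void main(String[] args) {
            JavaSparkContext ct = new JavaSparkContext(
                    "spark://192.168.90.74:7077", "test",
                    "/home/webuser/spark/spark-1.5.2-bin-hadoop2.4", new String[0]);

            JavaRDD<Integer> rdd = ct.parallelize(Arrays.asList(1, 6, 9));

            // Option 1: fully qualify the output URI so there is no ambiguity
            // about which file system receives the data. Port 9000 is assumed;
            // use your NameNode's actual port.
            rdd.saveAsTextFile("hdfs://192.168.90.74:9000/home/webuser/temp");

            // Option 2: make HDFS the default file system for this context,
            // equivalent to fs.defaultFS in core-site.xml, so that unqualified
            // paths resolve against HDFS instead of file:///.
            // ct.hadoopConfiguration().set("fs.defaultFS", "hdfs://192.168.90.74:9000");
            // rdd.saveAsTextFile("/home/webuser/temp2");

            ct.close();
        }
    }

Either way, the output ends up as a directory of part files on HDFS rather than on the local disk of whichever machine happens to resolve the path.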