I am new to Spark and ran into this problem with my test program. I installed Spark on a Linux server with just one master node and one worker node. Then I wrote a test program on my laptop, with code like this:
```java
import java.util.ArrayList;
import java.util.List;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;

// Connect to the standalone master; addJar ships the application jar to the workers.
JavaSparkContext ct = new JavaSparkContext("spark://192.168.90.74:7077", "test",
        "/home/webuser/spark/spark-1.5.2-bin-hadoop2.4", new String[0]);
ct.addJar("/home/webuser/java.spark.test-0.0.1-SNAPSHOT-jar-with-dependencies.jar");
List<Integer> list = new ArrayList<>();
list.add(1);
list.add(6);
list.add(9);
JavaRDD<Integer> rdd = ct.parallelize(list);
System.out.println(rdd.collect());
rdd.saveAsTextFile("/home/webuser/temp");
ct.close();
```
I expected /home/webuser/temp to appear on my server, but in fact the program created c://home/webuser/temp on my laptop, which runs Windows 8. I don't understand: shouldn't saveAsTextFile() run on Spark's worker node? Why did it run on my laptop, which I suppose is Spark's driver?
Answer 0 (score: 0)
That depends on the default filesystem of your Spark installation. From what you describe, your default filesystem is `file:///`, which is the default setting. To change this, you need to modify the `fs.defaultFS` property in the `core-site.xml` of your Hadoop configuration (a sketch follows below). Otherwise, you can simply change your code and specify the filesystem URL in it, i.e.:
rdd.saveAsTextFile("hdfs://192.168.90.74/home/webuser/temp");
if `192.168.90.74` is your NameNode.
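For reference, here is a minimal sketch of the `core-site.xml` change mentioned above. The host is taken from the question, and port 8020 is an assumed default NameNode RPC port; substitute the address of your own NameNode:

```xml
<!-- core-site.xml (sketch): host from the question, port 8020 is an assumption -->
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://192.168.90.74:8020</value>
  </property>
</configuration>
```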
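If you would rather not edit the cluster configuration, a similar effect can be achieved per application through the context's Hadoop configuration, so that scheme-less paths resolve against HDFS instead of the driver's local disk. A minimal sketch, again assuming port 8020:

```java
// Sketch: override the default filesystem for this SparkContext only.
ct.hadoopConfiguration().set("fs.defaultFS", "hdfs://192.168.90.74:8020");
rdd.saveAsTextFile("/home/webuser/temp"); // now resolves against HDFS, not file:///
```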