How do I read data with Apache Spark, given a directory path?

Date: 2017-03-17 19:01:40

Tags: apache-spark

This is the pwd of my spark-shell:
/home/milenko/spark-2.0.1-bin-hadoop2.7/bin

This is the pwd of the folder where my data lives:

/home/milenko/dom1/wikipedia/src/main/scala/wikipedia

If I try this in my spark-shell:

scala> val wikiRdd = sc.parallelize(/home/milenko/dom1/wikipedia/src/main/scala/wikipedia/WikipediaARticle)
<console>:25: error: not found: value /
Error occurred in an application involving default arguments.
       val wikiRdd = sc.parallelize(/home/milenko/dom1/wikipedia/src/main/scala/wikipedia/WikipediaARticle)
                                    ^
<console>:25: error: not found: value /
Error occurred in an application involving default arguments.
       val wikiRdd = sc.parallelize(/home/milenko/dom1/wikipedia/src/main/scala/wikipedia/WikipediaARticle)

How do I set the correct path?

1 answer:

Answer 0 (score: 1)

Try something like this to read the file. Note that `sc.parallelize` expects an in-memory Scala collection (e.g. a `Seq`), not a file path, and a path must be passed as a string. Use `sc.textFile` instead and wrap the path in double quotes:

val input = sc.textFile("/tmp/filename")
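Applied to the question's directory, a spark-shell session might look like the sketch below. The data file name under that folder is an assumption (the question only shows the directory plus `WikipediaARticle`, which looks like a class name, not a data file); substitute the real file name:

```scala
scala> // `sc` (SparkContext) is predefined in spark-shell.
scala> // The path is a String literal, so it must be quoted.
scala> // "data.dat" is a hypothetical file name -- use your actual file.
scala> val wikiRdd = sc.textFile("/home/milenko/dom1/wikipedia/src/main/scala/wikipedia/data.dat")

scala> // Each element of the RDD is one line of the file; inspect a few:
scala> wikiRdd.take(2).foreach(println)
```

Without the quotes, the Scala compiler tries to parse `/home/milenko/...` as an expression (a `/` operator applied to identifiers), which is exactly why the REPL reports `error: not found: value /`.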