Apache Spark: problem creating output files

Date: 2015-12-25 17:50:53

Tags: apache-spark

I am new to Spark and ran into a problem while writing a word-count program. The word-count code is below.

 scala> val input = sc.textFile("file:\\C:\\APJ.txt")
 scala> val words = input.flatMap(x => x.split(" "))
 scala> val result = words.map(x => (x,1)).reduceByKey((x,y) => x + y)
 scala> result.saveAsTextFile("file:\\D:\\output1")
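For reference, here is the same word count as a self-contained application rather than REPL input. This is only a sketch: it assumes Spark 1.x with spark-core on the classpath and a local[*] master, and it uses forward-slash file:/// URIs instead of the backslash form above; the input and output paths themselves are taken from the shell session.

    import org.apache.spark.{SparkConf, SparkContext}

    // Sketch of the word count as a standalone app (assumptions noted above).
    object WordCount {
      def main(args: Array[String]): Unit = {
        val conf = new SparkConf().setAppName("WordCount").setMaster("local[*]")
        val sc   = new SparkContext(conf)

        val input  = sc.textFile("file:///C:/APJ.txt")   // read the source text
        val result = input
          .flatMap(_.split(" "))                         // split each line into words
          .map(word => (word, 1))                        // pair every word with a count of 1
          .reduceByKey(_ + _)                            // sum the counts per word

        result.saveAsTextFile("file:///D:/output1")      // write part-* files to the output dir
        sc.stop()
      }
    }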
The output folder is created at the location above and contains a part-00001 file, but that file holds no data (one quick way to confirm this is to read the directory back, as sketched below). The log from the saveAsTextFile call appears after the sketch.
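A minimal read-back check, reusing the output path from the save above; a count of 0, or another failure, would confirm that nothing was written:

    scala> val saved = sc.textFile("file:\\D:\\output1")  // read the saved output back
    scala> saved.count()                                  // 0 would mean the part files are empty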

15/12/25 22:59:20 INFO SparkContext: Starting job: saveAsTextFile at <console>:28
15/12/25 22:59:20 INFO DAGScheduler: Got job 11 (saveAsTextFile at <console>:28) with 2 output partitions (allowLocal=false)
15/12/25 22:59:20 INFO DAGScheduler: Final stage: Stage 19(saveAsTextFile at <console>:28)
15/12/25 22:59:20 INFO DAGScheduler: Parents of final stage: List(Stage 18)
15/12/25 22:59:20 INFO DAGScheduler: Missing parents: List()
15/12/25 22:59:20 INFO DAGScheduler: Submitting Stage 19 (MapPartitionsRDD[24] at saveAsTextFile at <console>:28), which has no missing parents
15/12/25 22:59:20 INFO MemoryStore: ensureFreeSpace(127160) called with curMem=1297015, maxMem=280248975
15/12/25 22:59:20 INFO MemoryStore: Block broadcast_17 stored as values in memory (estimated size 124.2 KB, free 265.9 MB)
15/12/25 22:59:20 INFO BlockManager: Removing broadcast 14
15/12/25 22:59:20 INFO BlockManager: Removing block broadcast_14_piece0
15/12/25 22:59:20 INFO MemoryStore: Block broadcast_14_piece0 of size 2653 dropped from memory (free 278827453)
15/12/25 22:59:20 INFO MemoryStore: ensureFreeSpace(76221) called with curMem=1421522, maxMem=280248975
15/12/25 22:59:20 INFO BlockManagerInfo: Removed broadcast_14_piece0 on localhost:50097 in memory (size: 2.6 KB, free: 267.0 MB)
15/12/25 22:59:20 INFO MemoryStore: Block broadcast_17_piece0 stored as bytes in memory (estimated size 74.4 KB, free 265.8 MB)
15/12/25 22:59:20 INFO BlockManagerMaster: Updated info of block broadcast_14_piece0
15/12/25 22:59:20 INFO BlockManager: Removing block broadcast_14
15/12/25 22:59:20 INFO BlockManagerInfo: Added broadcast_17_piece0 in memory on localhost:50097 (size: 74.4 KB, free: 266.9 MB)
15/12/25 22:59:20 INFO MemoryStore: Block broadcast_14 of size 3736 dropped from memory (free 278754968)
15/12/25 22:59:20 INFO BlockManagerMaster: Updated info of block broadcast_17_piece0
15/12/25 22:59:20 INFO ContextCleaner: Cleaned broadcast 14
15/12/25 22:59:20 INFO SparkContext: Created broadcast 17 from broadcast at DAGScheduler.scala:839
15/12/25 22:59:20 INFO BlockManager: Removing broadcast 15
15/12/25 22:59:20 INFO DAGScheduler: Submitting 2 missing tasks from Stage 19 (MapPartitionsRDD[24] at saveAsTextFile at <console>:28)
15/12/25 22:59:20 INFO BlockManager: Removing block broadcast_15_piece0
15/12/25 22:59:20 INFO MemoryStore: Block broadcast_15_piece0 of size 76238 dropped from memory (free 278831206)
15/12/25 22:59:20 INFO TaskSchedulerImpl: Adding task set 19.0 with 2 tasks
15/12/25 22:59:20 INFO BlockManagerInfo: Removed broadcast_15_piece0 on localhost:50097 in memory (size: 74.5 KB, free: 267.0 MB)
15/12/25 22:59:20 INFO TaskSetManager: Starting task 0.0 in stage 19.0 (TID 24, localhost, PROCESS_LOCAL, 1056 bytes)
15/12/25 22:59:20 INFO BlockManagerMaster: Updated info of block broadcast_15_piece0
15/12/25 22:59:20 INFO TaskSetManager: Starting task 1.0 in stage 19.0 (TID 25, localhost, PROCESS_LOCAL, 1056 bytes)
15/12/25 22:59:20 INFO BlockManager: Removing block broadcast_15
15/12/25 22:59:20 INFO MemoryStore: Block broadcast_15 of size 127160 dropped from memory (free 278958366)
15/12/25 22:59:20 INFO Executor: Running task 0.0 in stage 19.0 (TID 24)
15/12/25 22:59:20 INFO Executor: Running task 1.0 in stage 19.0 (TID 25)
15/12/25 22:59:20 INFO ContextCleaner: Cleaned broadcast 15
15/12/25 22:59:20 INFO BlockManager: Removing broadcast 16
15/12/25 22:59:20 INFO BlockManager: Removing block broadcast_16_piece0
15/12/25 22:59:20 INFO MemoryStore: Block broadcast_16_piece0 of size 76241 dropped from memory (free 279034607)
15/12/25 22:59:20 INFO BlockManagerInfo: Removed broadcast_16_piece0 on localhost:50097 in memory (size: 74.5 KB, free: 267.1 MB)
15/12/25 22:59:20 INFO BlockManagerMaster: Updated info of block broadcast_16_piece0
15/12/25 22:59:20 INFO BlockManager: Removing block broadcast_16
15/12/25 22:59:20 INFO MemoryStore: Block broadcast_16 of size 127160 dropped from memory (free 279161767)
15/12/25 22:59:20 INFO ContextCleaner: Cleaned broadcast 16
15/12/25 22:59:20 INFO ShuffleBlockFetcherIterator: Getting 2 non-empty blocks out of 2 blocks
15/12/25 22:59:20 INFO ShuffleBlockFetcherIterator: Getting 2 non-empty blocks out of 2 blocks
15/12/25 22:59:20 INFO ShuffleBlockFetcherIterator: Started 0 remote fetches in 0 ms
15/12/25 22:59:20 INFO ShuffleBlockFetcherIterator: Started 0 remote fetches in 0 ms
15/12/25 22:59:20 ERROR Executor: Exception in task 1.0 in stage 19.0 (TID 25)
java.lang.NullPointerException
        at java.lang.ProcessBuilder.start(ProcessBuilder.java:1010)
        at org.apache.hadoop.util.Shell.runCommand(Shell.java:482)
        at org.apache.hadoop.util.Shell.run(Shell.java:455)
        at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:715)
        at org.apache.hadoop.util.Shell.execCommand(Shell.java:808)
        at org.apache.hadoop.util.Shell.execCommand(Shell.java:791)
        at org.apache.hadoop.fs.RawLocalFileSystem.setPermission(RawLocalFileSystem.java:656)
        at org.apache.hadoop.fs.FilterFileSystem.setPermission(FilterFileSystem.java:490)
        at org.apache.hadoop.fs.ChecksumFileSystem.create(ChecksumFileSystem.java:462)
        at org.apache.hadoop.fs.ChecksumFileSystem.create(ChecksumFileSystem.java:428)
        at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:908)
        at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:801)
        at org.apache.hadoop.mapred.TextOutputFormat.getRecordWriter(TextOutputFormat.java:123)
        at org.apache.spark.SparkHadoopWriter.open(SparkHadoopWriter.scala:90)
        at org.apache.spark.rdd.PairRDDFunctions$$anonfun$13.apply(PairRDDFunctions.scala:1068)
        at org.apache.spark.rdd.PairRDDFunctions$$anonfun$13.apply(PairRDDFunctions.scala:1059)
        at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:61)
        at org.apache.spark.scheduler.Task.run(Task.scala:64)
        at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:203)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:744)
15/12/25 22:59:20 WARN TaskSetManager: Lost task 1.0 in stage 19.0 (TID 25, localhost): java.lang.NullPointerException
        ... (same stack trace as above)

15/12/25 22:59:20 ERROR TaskSetManager: Task 1 in stage 19.0 failed 1 times; aborting job
15/12/25 22:59:20 INFO TaskSchedulerImpl: Cancelling stage 19
15/12/25 22:59:20 INFO Executor: Executor is trying to kill task 0.0 in stage 19.0 (TID 24)
15/12/25 22:59:20 INFO TaskSchedulerImpl: Stage 19 was cancelled
15/12/25 22:59:21 INFO DAGScheduler: Stage 19 (saveAsTextFile at <console>:28) failed in 0.125 s
15/12/25 22:59:21 INFO DAGScheduler: Job 11 failed: saveAsTextFile at <console>:28, took 0.196747 s
15/12/25 22:59:21 ERROR Executor: Exception in task 0.0 in stage 19.0 (TID 24)
java.lang.NullPointerException
        ... (same stack trace as above)
15/12/25 22:59:21 INFO TaskSetManager: Lost task 0.0 in stage 19.0 (TID 24) on executor localhost: java.lang.NullPointerException (null) [duplicate 1]
15/12/25 22:59:21 INFO TaskSchedulerImpl: Removed TaskSet 19.0, whose tasks have all completed, from pool
org.apache.spark.SparkException: Job aborted due to stage failure: Task 1 in stage 19.0 failed 1 times, most recent failure: Lost task 1.0 in stage 19.0 (TID 25, localhost): java.lang.NullPointerException
        ... (same stack trace as above)

Driver stacktrace:
        at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1204)
        at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1193)
        at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1192)
        at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
        at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
        at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1192)
        at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:693)
        at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:693)
        at scala.Option.foreach(Option.scala:236)
        at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:693)
        at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1393)
        at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1354)
        at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)


scala>

0 answers:

No answers yet.