I wrote a binary classifier in Python using the Theano library. I want to run it over different data files in parallel on Amazon Web Services (AWS) EC2, using one master node and several slave nodes with Apache Spark. When I test my code in "local" mode on the AWS EC2 master node (a t2.micro instance with 1 GB of RAM), I get the following memory-related error:
15/08/09 03:57:22 ERROR TaskSetManager: Task 0 in stage 0.0 failed 1 times; aborting job
15/08/09 03:57:22 INFO TaskSchedulerImpl: Removed TaskSet 0.0, whose tasks have all completed, from pool
15/08/09 03:57:22 INFO TaskSchedulerImpl: Cancelling stage 0
15/08/09 03:57:22 INFO DAGScheduler: ResultStage 0 (foreach at /root/IdeaNets/Spark/spark_test.py:110) failed in 7.253 s
15/08/09 03:57:22 INFO DAGScheduler: Job 0 failed: foreach at /root/IdeaNets/Spark/spark_test.py:110, took 7.452321 s
Traceback (most recent call last):
File "/root/IdeaNets/Spark/spark_test.py", line 113, in <module>
main()
File "/root/IdeaNets/Spark/spark_test.py", line 110, in main
datafile.foreach(lambda (path, content): lstm_test(path, content))
File "/root/spark/python/lib/pyspark.zip/pyspark/rdd.py", line 721, in foreach
File "/root/spark/python/lib/pyspark.zip/pyspark/rdd.py", line 972, in count
File "/root/spark/python/lib/pyspark.zip/pyspark/rdd.py", line 963, in sum
File "/root/spark/python/lib/pyspark.zip/pyspark/rdd.py", line 771, in reduce
File "/root/spark/python/lib/pyspark.zip/pyspark/rdd.py", line 745, in collect
File "/root/spark/python/lib/py4j-0.8.2.1-src.zip/py4j/java_gateway.py", line 538, in __call__
File "/root/spark/python/lib/py4j-0.8.2.1-src.zip/py4j/protocol.py", line 300, in get_return_value
py4j.protocol.Py4JJavaError: An error occurred while calling z:org.apache.spark.api.python.PythonRDD.collectAndServe.
: org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 0.0 failed 1 times, most recent failure: Lost task 0.0 in stage 0.0 (TID 0, localhost): org.apache.spark.api.python.PythonException: Traceback (most recent call last):
File "/root/spark/python/lib/pyspark.zip/pyspark/worker.py", line 111, in main
process()
File "/root/spark/python/lib/pyspark.zip/pyspark/worker.py", line 106, in process
serializer.dump_stream(func(split_index, iterator), outfile)
File "/root/spark/python/lib/pyspark.zip/pyspark/rdd.py", line 2318, in pipeline_func
File "/root/spark/python/lib/pyspark.zip/pyspark/rdd.py", line 2318, in pipeline_func
File "/root/spark/python/lib/pyspark.zip/pyspark/rdd.py", line 2318, in pipeline_func
File "/root/spark/python/lib/pyspark.zip/pyspark/rdd.py", line 304, in func
File "/root/spark/python/lib/pyspark.zip/pyspark/rdd.py", line 719, in processPartition
File "/root/IdeaNets/Spark/spark_test.py", line 110, in <lambda>
datafile.foreach(lambda (path, content): lstm_test(path, content))
File "/root/IdeaNets/Spark/spark_test.py", line 71, in lstm_test
run_lstm.build_model()
File "/mnt/spark/spark-0b0cd075-92cb-4469-bc85-c347ae6cd58b/userFiles-57a9bb6c-2d93-48fe-a9a4-72d587d70a28/lstm_class.py", line 328, in build_model
(use_noise, x, mask, y, self.f_pred_prob, self.f_pred, cost) = self._build_model(self._tparams, self.model_options)
File "/mnt/spark/spark-0b0cd075-92cb-4469-bc85-c347ae6cd58b/userFiles-57a9bb6c-2d93-48fe-a9a4-72d587d70a28/lstm_class.py", line 294, in _build_model
proj = self.dropout_layer(proj, use_noise, trng)
File "/mnt/spark/spark-0b0cd075-92cb-4469-bc85-c347ae6cd58b/userFiles-57a9bb6c-2d93-48fe-a9a4-72d587d70a28/lstm_class.py", line 153, in dropout_layer
dtype=state_before.dtype)),
File "/usr/local/lib/python2.7/site-packages/theano/sandbox/rng_mrg.py", line 1241, in binomial
x = self.uniform(size=size, nstreams=nstreams)
File "/usr/local/lib/python2.7/site-packages/theano/sandbox/rng_mrg.py", line 1219, in uniform
node_rstate = shared(self.get_substream_rstates(nstreams))
File "/usr/local/lib/python2.7/site-packages/theano/sandbox/rng_mrg.py", line 1109, in get_substream_rstates
multMatVect(rval[0], A1p72, M1, A2p72, M2)
File "/usr/local/lib/python2.7/site-packages/theano/sandbox/rng_mrg.py", line 55, in multMatVect
[A_sym, s_sym, m_sym, A2_sym, s2_sym, m2_sym], o)
File "/usr/local/lib/python2.7/site-packages/theano/compile/function.py", line 266, in function
profile=profile)
File "/usr/local/lib/python2.7/site-packages/theano/compile/pfunc.py", line 511, in pfunc
on_unused_input=on_unused_input)
File "/usr/local/lib/python2.7/site-packages/theano/compile/function_module.py", line 1466, in orig_function
defaults)
File "/usr/local/lib/python2.7/site-packages/theano/compile/function_module.py", line 1324, in create
input_storage=input_storage_lists)
File "/usr/local/lib/python2.7/site-packages/theano/gof/link.py", line 519, in make_thunk
output_storage=output_storage)[:3]
File "/usr/local/lib/python2.7/site-packages/theano/gof/vm.py", line 897, in make_all
no_recycling))
File "/usr/local/lib/python2.7/site-packages/theano/gof/op.py", line 739, in make_thunk
output_storage=node_output_storage)
File "/usr/local/lib/python2.7/site-packages/theano/gof/cc.py", line 1073, in make_thunk
keep_lock=keep_lock)
File "/usr/local/lib/python2.7/site-packages/theano/gof/cc.py", line 1015, in __compile__
keep_lock=keep_lock)
File "/usr/local/lib/python2.7/site-packages/theano/gof/cc.py", line 1434, in cthunk_factory
key = self.cmodule_key()
File "/usr/local/lib/python2.7/site-packages/theano/gof/cc.py", line 1154, in cmodule_key
compile_args=self.compile_args(),
File "/usr/local/lib/python2.7/site-packages/theano/gof/cc.py", line 872, in compile_args
ret += c_compiler.compile_args()
File "/usr/local/lib/python2.7/site-packages/theano/gof/cmodule.py", line 1703, in compile_args
native_lines = get_lines("%s -march=native -E -v -" % theano.config.cxx)
File "/usr/local/lib/python2.7/site-packages/theano/gof/cmodule.py", line 1672, in get_lines
shell=True)
File "/usr/local/lib/python2.7/site-packages/theano/misc/windows.py", line 36, in subprocess_Popen
proc = subprocess.Popen(command, startupinfo=startupinfo, **params)
File "/usr/lib64/python2.7/subprocess.py", line 710, in __init__
errread, errwrite)
File "/usr/lib64/python2.7/subprocess.py", line 1231, in _execute_child
self.pid = os.fork()
OSError: [Errno 12] Cannot allocate memory
at org.apache.spark.api.python.PythonRDD$$anon$1.read(PythonRDD.scala:138)
at org.apache.spark.api.python.PythonRDD$$anon$1.<init>(PythonRDD.scala:179)
at org.apache.spark.api.python.PythonRDD.compute(PythonRDD.scala:97)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:244)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:63)
at org.apache.spark.scheduler.Task.run(Task.scala:70)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:213)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Driver stacktrace:
at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1266)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1257)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1256)
at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1256)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:730)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:730)
at scala.Option.foreach(Option.scala:236)
at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:730)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1450)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1411)
at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
15/08/09 03:57:23 INFO SparkContext: Invoking stop() from shutdown hook
15/08/09 03:57:23 INFO SparkUI: Stopped Spark web UI at http://ec2-54-86-238-21.compute-1.amazonaws.com:4040
15/08/09 03:57:23 INFO DAGScheduler: Stopping DAGScheduler
15/08/09 03:57:23 INFO MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
15/08/09 03:57:23 INFO Utils: path = /mnt/spark/spark-0b0cd075-92cb-4469-bc85-c347ae6cd58b/blockmgr-dc30d0fa-addc-422f-8129-2603af492279, already present as root for deletion.
15/08/09 03:57:23 INFO MemoryStore: MemoryStore cleared
15/08/09 03:57:23 INFO BlockManager: BlockManager stopped
15/08/09 03:57:23 INFO BlockManagerMaster: BlockManagerMaster stopped
15/08/09 03:57:23 INFO OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!
15/08/09 03:57:23 INFO SparkContext: Successfully stopped SparkContext
15/08/09 03:57:23 INFO Utils: Shutdown hook called
15/08/09 03:57:23 INFO Utils: Deleting directory /mnt/spark/spark-0b0cd075-92cb-4469-bc85-c347ae6cd58b
I am confident that my code itself is free of bugs; the problem appears to be running out of memory, or something else memory-related. But the t2.micro instance on AWS has 1 GB of RAM, and the data the code reads to train the classifier is only about 1.7 KB, so I would expect the memory to be more than enough. I don't really understand this error or how to fix it, and I would be grateful for any help.
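
For context, here is a minimal sketch of the kind of driver script the traceback points at (spark_test.py). Only the foreach call with the (path, content) lambda appears in the traceback; the use of wholeTextFiles, the input path, and the body of lstm_test are assumptions made for illustration.

    from pyspark import SparkContext

    def lstm_test(path, content):
        # Placeholder for the per-file work; in the real spark_test.py this
        # builds and evaluates the Theano LSTM classifier on `content`.
        print("training on %s (%d bytes)" % (path, len(content)))

    def main():
        sc = SparkContext(appName="spark_test")
        # wholeTextFiles yields (path, content) pairs, matching the lambda
        # below; the actual input location is an assumption.
        datafile = sc.wholeTextFiles("data/")
        # Run one task per file on the executors (Python 2 tuple-unpacking
        # lambda, as shown in the traceback).
        datafile.foreach(lambda (path, content): lstm_test(path, content))

    if __name__ == "__main__":
        main()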
Answer 0: (score: 0)
Maybe you can use subprocess.Popen("cmd", stdout=subprocess.PIPE, shell=True, close_fds=True).
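
As a rough, untested sketch of how that suggestion could be applied: wrap subprocess.Popen so close_fds=True is always passed, so the child created by os.fork() does not inherit the parent's open file descriptors. Whether this alone avoids the fork failure on a 1 GB instance is not guaranteed, and the wrapper name below is hypothetical.

    import subprocess

    def popen_close_fds(command, **params):
        # Hypothetical wrapper: always pass close_fds=True so the forked
        # child releases the parent's file descriptors, as suggested above.
        params["close_fds"] = True
        return subprocess.Popen(command, **params)

    # Example use (the answer's placeholder "cmd" is replaced with a real
    # shell command here):
    proc = popen_close_fds("echo hello", stdout=subprocess.PIPE, shell=True)
    out, _ = proc.communicate()
    print(out)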