PySpark OSError: [Errno 12] Cannot allocate memory

Asked: 2015-08-09 04:15:36

Tags: python memory amazon-web-services apache-spark theano

I wrote a binary classifier using the Python Theano library. Since it has to be trained on many different data files, I want to parallelize it with Apache Spark on Amazon Web Services (AWS) EC2, using one master node and several worker nodes. When I test my code in "local" mode on the AWS EC2 master node (a t2.micro instance with 1 GB of RAM), I hit the following memory-related error:

15/08/09 03:57:22 ERROR TaskSetManager: Task 0 in stage 0.0 failed 1 times; aborting job
15/08/09 03:57:22 INFO TaskSchedulerImpl: Removed TaskSet 0.0, whose tasks have all completed, from pool 
15/08/09 03:57:22 INFO TaskSchedulerImpl: Cancelling stage 0
15/08/09 03:57:22 INFO DAGScheduler: ResultStage 0 (foreach at /root/IdeaNets/Spark/spark_test.py:110) failed in 7.253 s
15/08/09 03:57:22 INFO DAGScheduler: Job 0 failed: foreach at /root/IdeaNets/Spark/spark_test.py:110, took 7.452321 s
Traceback (most recent call last):
  File "/root/IdeaNets/Spark/spark_test.py", line 113, in <module>
    main()
  File "/root/IdeaNets/Spark/spark_test.py", line 110, in main
    datafile.foreach(lambda (path, content): lstm_test(path, content))
  File "/root/spark/python/lib/pyspark.zip/pyspark/rdd.py", line 721, in foreach
  File "/root/spark/python/lib/pyspark.zip/pyspark/rdd.py", line 972, in count
  File "/root/spark/python/lib/pyspark.zip/pyspark/rdd.py", line 963, in sum
  File "/root/spark/python/lib/pyspark.zip/pyspark/rdd.py", line 771, in reduce
  File "/root/spark/python/lib/pyspark.zip/pyspark/rdd.py", line 745, in collect
  File "/root/spark/python/lib/py4j-0.8.2.1-src.zip/py4j/java_gateway.py", line 538, in __call__
  File "/root/spark/python/lib/py4j-0.8.2.1-src.zip/py4j/protocol.py", line 300, in get_return_value
py4j.protocol.Py4JJavaError: An error occurred while calling z:org.apache.spark.api.python.PythonRDD.collectAndServe.
: org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 0.0 failed 1 times, most recent failure: Lost task 0.0 in stage 0.0 (TID 0, localhost): org.apache.spark.api.python.PythonException: Traceback (most recent call last):
  File "/root/spark/python/lib/pyspark.zip/pyspark/worker.py", line 111, in main
    process()
  File "/root/spark/python/lib/pyspark.zip/pyspark/worker.py", line 106, in process
    serializer.dump_stream(func(split_index, iterator), outfile)
  File "/root/spark/python/lib/pyspark.zip/pyspark/rdd.py", line 2318, in pipeline_func
  File "/root/spark/python/lib/pyspark.zip/pyspark/rdd.py", line 2318, in pipeline_func
  File "/root/spark/python/lib/pyspark.zip/pyspark/rdd.py", line 2318, in pipeline_func
  File "/root/spark/python/lib/pyspark.zip/pyspark/rdd.py", line 304, in func
  File "/root/spark/python/lib/pyspark.zip/pyspark/rdd.py", line 719, in processPartition
  File "/root/IdeaNets/Spark/spark_test.py", line 110, in <lambda>
    datafile.foreach(lambda (path, content): lstm_test(path, content))
  File "/root/IdeaNets/Spark/spark_test.py", line 71, in lstm_test
    run_lstm.build_model()
  File "/mnt/spark/spark-0b0cd075-92cb-4469-bc85-c347ae6cd58b/userFiles-57a9bb6c-2d93-48fe-a9a4-72d587d70a28/lstm_class.py", line 328, in build_model
    (use_noise, x, mask, y, self.f_pred_prob, self.f_pred, cost) = self._build_model(self._tparams, self.model_options)
  File "/mnt/spark/spark-0b0cd075-92cb-4469-bc85-c347ae6cd58b/userFiles-57a9bb6c-2d93-48fe-a9a4-72d587d70a28/lstm_class.py", line 294, in _build_model
    proj = self.dropout_layer(proj, use_noise, trng)
  File "/mnt/spark/spark-0b0cd075-92cb-4469-bc85-c347ae6cd58b/userFiles-57a9bb6c-2d93-48fe-a9a4-72d587d70a28/lstm_class.py", line 153, in dropout_layer
    dtype=state_before.dtype)),
  File "/usr/local/lib/python2.7/site-packages/theano/sandbox/rng_mrg.py", line 1241, in binomial
    x = self.uniform(size=size, nstreams=nstreams)
  File "/usr/local/lib/python2.7/site-packages/theano/sandbox/rng_mrg.py", line 1219, in uniform
    node_rstate = shared(self.get_substream_rstates(nstreams))
  File "/usr/local/lib/python2.7/site-packages/theano/sandbox/rng_mrg.py", line 1109, in get_substream_rstates
    multMatVect(rval[0], A1p72, M1, A2p72, M2)
  File "/usr/local/lib/python2.7/site-packages/theano/sandbox/rng_mrg.py", line 55, in multMatVect
    [A_sym, s_sym, m_sym, A2_sym, s2_sym, m2_sym], o)
  File "/usr/local/lib/python2.7/site-packages/theano/compile/function.py", line 266, in function
    profile=profile)
  File "/usr/local/lib/python2.7/site-packages/theano/compile/pfunc.py", line 511, in pfunc
    on_unused_input=on_unused_input)
  File "/usr/local/lib/python2.7/site-packages/theano/compile/function_module.py", line 1466, in orig_function
    defaults)
  File "/usr/local/lib/python2.7/site-packages/theano/compile/function_module.py", line 1324, in create
    input_storage=input_storage_lists)
  File "/usr/local/lib/python2.7/site-packages/theano/gof/link.py", line 519, in make_thunk
    output_storage=output_storage)[:3]
  File "/usr/local/lib/python2.7/site-packages/theano/gof/vm.py", line 897, in make_all
    no_recycling))
  File "/usr/local/lib/python2.7/site-packages/theano/gof/op.py", line 739, in make_thunk
    output_storage=node_output_storage)
  File "/usr/local/lib/python2.7/site-packages/theano/gof/cc.py", line 1073, in make_thunk
    keep_lock=keep_lock)
  File "/usr/local/lib/python2.7/site-packages/theano/gof/cc.py", line 1015, in __compile__
    keep_lock=keep_lock)
  File "/usr/local/lib/python2.7/site-packages/theano/gof/cc.py", line 1434, in cthunk_factory
    key = self.cmodule_key()
  File "/usr/local/lib/python2.7/site-packages/theano/gof/cc.py", line 1154, in cmodule_key
    compile_args=self.compile_args(),
  File "/usr/local/lib/python2.7/site-packages/theano/gof/cc.py", line 872, in compile_args
    ret += c_compiler.compile_args()
  File "/usr/local/lib/python2.7/site-packages/theano/gof/cmodule.py", line 1703, in compile_args
    native_lines = get_lines("%s -march=native -E -v -" % theano.config.cxx)
  File "/usr/local/lib/python2.7/site-packages/theano/gof/cmodule.py", line 1672, in get_lines
    shell=True)
  File "/usr/local/lib/python2.7/site-packages/theano/misc/windows.py", line 36, in subprocess_Popen
    proc = subprocess.Popen(command, startupinfo=startupinfo, **params)
  File "/usr/lib64/python2.7/subprocess.py", line 710, in __init__
    errread, errwrite)
  File "/usr/lib64/python2.7/subprocess.py", line 1231, in _execute_child
    self.pid = os.fork()
OSError: [Errno 12] Cannot allocate memory

    at org.apache.spark.api.python.PythonRDD$$anon$1.read(PythonRDD.scala:138)
    at org.apache.spark.api.python.PythonRDD$$anon$1.<init>(PythonRDD.scala:179)
    at org.apache.spark.api.python.PythonRDD.compute(PythonRDD.scala:97)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:244)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:63)
    at org.apache.spark.scheduler.Task.run(Task.scala:70)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:213)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)

Driver stacktrace:
    at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1266)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1257)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1256)
    at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
    at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
    at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1256)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:730)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:730)
    at scala.Option.foreach(Option.scala:236)
    at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:730)
    at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1450)
    at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1411)
    at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)

15/08/09 03:57:23 INFO SparkContext: Invoking stop() from shutdown hook
15/08/09 03:57:23 INFO SparkUI: Stopped Spark web UI at http://ec2-54-86-238-21.compute-1.amazonaws.com:4040
15/08/09 03:57:23 INFO DAGScheduler: Stopping DAGScheduler
15/08/09 03:57:23 INFO MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
15/08/09 03:57:23 INFO Utils: path = /mnt/spark/spark-0b0cd075-92cb-4469-bc85-c347ae6cd58b/blockmgr-dc30d0fa-addc-422f-8129-2603af492279, already present as root for deletion.
15/08/09 03:57:23 INFO MemoryStore: MemoryStore cleared
15/08/09 03:57:23 INFO BlockManager: BlockManager stopped
15/08/09 03:57:23 INFO BlockManagerMaster: BlockManagerMaster stopped
15/08/09 03:57:23 INFO OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!
15/08/09 03:57:23 INFO SparkContext: Successfully stopped SparkContext
15/08/09 03:57:23 INFO Utils: Shutdown hook called
15/08/09 03:57:23 INFO Utils: Deleting directory /mnt/spark/spark-0b0cd075-92cb-4469-bc85-c347ae6cd58b

I am confident that my code itself is error-free; the problem seems to be that the machine runs out of memory, or something else memory-related. But the t2.micro instance on AWS has 1 GB of RAM, and the data my code reads to train the classifier is only about 1.7 KB, so I thought the memory would be more than enough. I really don't understand this error or how to fix it, and I would appreciate any help.
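
For context, the relevant part of the driver (spark_test.py, line 110 in the traceback) follows roughly this pattern. This is a minimal sketch: only the foreach line is taken verbatim from the stack trace; the SparkContext setup, the wholeTextFiles call, the input path, and the body of lstm_test are illustrative placeholders rather than the exact code.

    from pyspark import SparkContext

    def lstm_test(path, content):
        # Placeholder body: in the real code this builds and trains the Theano
        # LSTM model from lstm_class.py, and build_model() is where the
        # OSError: [Errno 12] is raised on the worker.
        print path, len(content)

    def main():
        sc = SparkContext(appName="spark_test")
        # wholeTextFiles yields one (path, content) pair per training data file;
        # the input path here is a placeholder.
        datafile = sc.wholeTextFiles("file:///root/IdeaNets/data")
        datafile.foreach(lambda (path, content): lstm_test(path, content))

    if __name__ == "__main__":
        main()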

1 Answer:

Answer 0 (score: 0)

Maybe you can use subprocess.Popen("cmd", stdout=subprocess.PIPE, shell=True, close_fds=True).
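
Spelled out, the suggested call might look like the standalone sketch below. The command string and the choice of g++ are assumptions based on the compiler probe in the traceback (get_lines("%s -march=native -E -v -" % theano.config.cxx)), not part of the original answer.

    import subprocess

    # Standalone sketch of the suggested call; "g++" is an assumption about
    # theano.config.cxx on this machine.
    proc = subprocess.Popen(
        "g++ -march=native -E -v -",
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        shell=True,
        close_fds=True,  # do not hand the parent's open file descriptors to the child
    )
    out, err = proc.communicate()
    print err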