我怎样才能为Python程序分配更多内存?它在4GB RAM上的消耗不超过64MB

时间:2012-12-24 09:52:07

标签: python ubuntu memory-management mapreduce mrjob

我有一个Python程序在4GB RAM 32位12.04 Ubuntu的某些输入数据上运行。程序的时间和空间复杂度都是O(n)。当输入数据大约为100 kb时,它在大约4秒内完成执行,峰值RAM消耗为0.5%(在LINUX中使用“top”命令)。但是,当我尝试大小为500kB,2.5MB和16 MB的输入数据时,该过程在1小时内没有完成(在每种情况下,我不得不使用Cntrl C取消)并且内存消耗被卡在1.6%(即每种情况下大约64MB)。我能以某种方式为这个Python进程分配更多的RAM内存吗?

注意:我正在使用Python制作的'mrjob'库在Python中实现Map Reduce作业。

以下是输入csv文件为100 kB时成功执行的日志。

   ankit@ubuntu:~/mrj/mrjo/mrjob/examples$ python mt1.py as.txt > asop.txtusing configs in /home/ankit/.mrjob.conf
creating tmp directory /home/ankit/mrj/mrjo/examples/mt1.ankit.20121224.094809.251269
> /usr/bin/python mt1.py --step-num=0 --mapper /home/ankit/mrj/mrjo/examples/mt1.ankit.20121224.094809.251269/input_part-00000
writing to /home/ankit/mrj/mrjo/examples/mt1.ankit.20121224.094809.251269/step-0-mapper_part-00000
> /usr/bin/python mt1.py --step-num=0 --mapper /home/ankit/mrj/mrjo/examples/mt1.ankit.20121224.094809.251269/input_part-00001
writing to /home/ankit/mrj/mrjo/examples/mt1.ankit.20121224.094809.251269/step-0-mapper_part-00001
Counters from step 1:
  (no counters found)
writing to /home/ankit/mrj/mrjo/examples/mt1.ankit.20121224.094809.251269/step-0-mapper-sorted
> sort /home/ankit/mrj/mrjo/examples/mt1.ankit.20121224.094809.251269/step-0-mapper_part-00000 /home/ankit/mrj/mrjo/examples/mt1.ankit.20121224.094809.251269/step-0-mapper_part-00001
> /usr/bin/python mt1.py --step-num=0 --reducer /home/ankit/mrj/mrjo/examples/mt1.ankit.20121224.094809.251269/input_part-00000
writing to /home/ankit/mrj/mrjo/examples/mt1.ankit.20121224.094809.251269/step-0-reducer_part-00000
Counters from step 1:
  (no counters found)
> /usr/bin/python mt1.py --step-num=1 --mapper /home/ankit/mrj/mrjo/examples/mt1.ankit.20121224.094809.251269/input_part-00000
writing to /home/ankit/mrj/mrjo/examples/mt1.ankit.20121224.094809.251269/step-1-mapper_part-00000
Counters from step 2:
  (no counters found)
Moving /home/ankit/mrj/mrjo/examples/mt1.ankit.20121224.094809.251269/step-1-mapper_part-00000 -> /home/ankit/mrj/mrjo/examples/mt1.ankit.20121224.094809.251269/output/part-00000
Streaming final output from /home/ankit/mrj/mrjo/examples/mt1.ankit.20121224.094809.251269/output
removing tmp directory /home/ankit/mrj/mrjo/examples/mt1.ankit.20121224.094809.251269

这是输入csv文件为2.5 MB时的执行日志和回溯。

ankit@ubuntu:~/mrj/mrjo/mrjob/examples$ python mt1.py matlabsample.csv > matsamop.txt
using configs in /home/ankit/.mrjob.conf
creating tmp directory /home/ankit/mrj/mrjo/examples/mt1.ankit.20121224.065246.700221
> /usr/bin/python mt1.py --step-num=0 --mapper /home/ankit/mrj/mrjo/examples/mt1.ankit.20121224.065246.700221/input_part-00000
writing to /home/ankit/mrj/mrjo/examples/mt1.ankit.20121224.065246.700221/step-0-mapper_part-00000
> /usr/bin/python mt1.py --step-num=0 --mapper /home/ankit/mrj/mrjo/examples/mt1.ankit.20121224.065246.700221/input_part-00001
writing to /home/ankit/mrj/mrjo/examples/mt1.ankit.20121224.065246.700221/step-0-mapper_part-00001
Counters from step 1:
  (no counters found)
writing to /home/ankit/mrj/mrjo/examples/mt1.ankit.20121224.065246.700221/step-0-mapper-sorted
> sort /home/ankit/mrj/mrjo/examples/mt1.ankit.20121224.065246.700221/step-0-mapper_part-00000 /home/ankit/mrj/mrjo/examples/mt1.ankit.20121224.065246.700221/step-0-mapper_part-00001
> /usr/bin/python mt1.py --step-num=0 --reducer /home/ankit/mrj/mrjo/examples/mt1.ankit.20121224.065246.700221/input_part-00000
writing to /home/ankit/mrj/mrjo/examples/mt1.ankit.20121224.065246.700221/step-0-reducer_part-00000
Counters from step 1:
  (no counters found)
> /usr/bin/python mt1.py --step-num=1 --mapper /home/ankit/mrj/mrjo/examples/mt1.ankit.20121224.065246.700221/input_part-00000
writing to /home/ankit/mrj/mrjo/examples/mt1.ankit.20121224.065246.700221/step-1-mapper_part-00000
^CTraceback (most recent call last):


  File "mt1.py", line 311, in <module>
    Motion_Tagging.run()
  File "/usr/local/lib/python2.7/dist-packages/mrjob-0.3.5-py2.7.egg/mrjob/job.py", line 545, in run
    mr_job.execute()
  File "/usr/local/lib/python2.7/dist-packages/mrjob-0.3.5-py2.7.egg/mrjob/job.py", line 561, in execute
    self.run_job()
  File "/usr/local/lib/python2.7/dist-packages/mrjob-0.3.5-py2.7.egg/mrjob/job.py", line 631, in run_job
    runner.run()
  File "/usr/local/lib/python2.7/dist-packages/mrjob-0.3.5-py2.7.egg/mrjob/runner.py", line 490, in run
    self._run()
  File "/usr/local/lib/python2.7/dist-packages/mrjob-0.3.5-py2.7.egg/mrjob/local.py", line 193, in _run
    combiner_args=combiner_args)
  File "/usr/local/lib/python2.7/dist-packages/mrjob-0.3.5-py2.7.egg/mrjob/local.py", line 488, in _invoke_step
    self._wait_for_process(proc_dict, step_num)
  File "/usr/local/lib/python2.7/dist-packages/mrjob-0.3.5-py2.7.egg/mrjob/local.py", line 657, in _wait_for_process
    tb_lines = find_python_traceback(stderr_lines)
  File "/usr/local/lib/python2.7/dist-packages/mrjob-0.3.5-py2.7.egg/mrjob/parse.py", line 171, in find_python_traceback
    for line in lines:
  File "/usr/local/lib/python2.7/dist-packages/mrjob-0.3.5-py2.7.egg/mrjob/local.py", line 680, in _process_stderr_from_script
    for line in stderr:
KeyboardInterrupt

2 个答案:

答案 0 :(得分:1)

你没有“为Python进程分配内存”,你在Python程序中使用了更大的结构。从根本上讲,你的算法可能存在缺陷,因为它没有利用可用的内存。

答案 1 :(得分:0)

仅供参考,这不是代码级解决方案。 但是你可以通过下面的链接深入思考python内存实现是如何工作的,以及问题是如何解决的。它还讨论了可以改进Python内存管理的其他方面。希望它会有用。

http://www.evanjones.ca/memoryallocator/