Question

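I'm having trouble with a memory leak in a Ruby script that uses Mechanize.

I loop over several web pages forever, and memory grows considerably on every iteration. After a few minutes this produces a "failed to allocate memory" error and the script exits.

In fact, the agent.get method seems to instantiate and retain its result even when I assign that result to the same local variable, or even to a global variable. So I tried assigning nil to the variable after its last use and before reusing the same variable name. But the previous agent.get results still seem to stay in memory, and I really don't know how to free that RAM so that my script uses a roughly stable amount of memory after hours of running.

Here are two pieces of code (hold down the Enter key and watch the RAM allocated to Ruby grow):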

#!/usr/bin/env ruby

require 'mechanize'

agent = Mechanize.new
agent.user_agent_alias = 'Windows Mozilla'
GC.enable # the GC is already enabled by default; kept from the original test
#puts GC.malloc_allocations
while gets.chomp != "stop"
    page = agent.get 'http://www.nypost.com/'
    puts "agent.object_id  : " + agent.object_id.to_s
    puts "page.object_id  : " + page.object_id.to_s
    page = nil # drops only this local reference to the page
    puts "page.object_id  : " + page.object_id.to_s # now prints the object_id of nil
    page = agent.get 'http://www.nypost.com/'
    puts "page.object_id  : " + page.object_id.to_s
    page = nil
    puts "page.object_id  : " + page.object_id.to_s
    puts local_variables
    GC.start # request a full collection; memory still grows
    puts local_variables
    #puts GC.malloc_allocations
end

And using a global variable:

#!/usr/bin/env ruby

require 'mechanize'

agent = Mechanize.new
agent.user_agent_alias = 'Windows Mozilla'
while gets.chomp != "stop"
    $page = agent.get 'http://www.nypost.com/'
    puts "agent.object_id  : "+agent.object_id.to_s
    puts "$page.object_id  : "+$page.object_id.to_s
    $page = agent.get 'http://www.nypost.com/'
    puts "$page.object_id  : "+$page.object_id.to_s
    #puts local_variables
    #puts global_variables
end

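In other languages the variable is simply reassigned and the allocated memory stays stable. Why not in Ruby? How can I force these instances to be garbage collected?

Edit: Here is another example that uses an object, since Ruby is an object-oriented language, but the result is exactly the same: memory grows again and again...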

#!/usr/bin/env ruby

require 'mechanize'

$agent = Mechanize.new
$agent.user_agent_alias = 'Windows Mozilla'
class GetContent
    def initialize url
        while true
            @page = $agent.get url
            remove_instance_variable(:@page)
        end
    end
end
myPage = GetContent.new('http://www.nypost.com/')

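My answer (I don't have enough reputation to post it properly)

OK!

It seems that calling Mechanize::History#clear (via $agent.history.clear) largely solves the memory leak problem.

If you want to test before and after, here is the final modified Ruby code: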

#!/usr/bin/env ruby

require 'mechanize'

$agent = Mechanize.new
$agent.user_agent_alias = 'Windows Mozilla'
class GetContent
    def initialize url
        while true
            @page = $agent.get url
            $agent.history.clear
        end
    end
end
myPage = GetContent.new('http://www.nypost.com/')
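
To illustrate why assigning nil to the variable did not help while clearing the history did, here is a minimal pure-Ruby sketch. FakeAgent is a hypothetical stand-in written for this example, not Mechanize's actual implementation; it only mimics the idea of an agent that keeps every fetched page in a history list, and it performs no network requests.

```ruby
# Hypothetical stand-in for an agent that records every page it returns.
# This is NOT Mechanize's code, only an illustration of the retention pattern.
class FakeAgent
  attr_reader :history

  def initialize
    @history = []
  end

  # Pretend to fetch a page; the returned string is also stored in history.
  def get(url)
    page = "body of #{url} " * 1_000
    @history << page
    page
  end
end

agent = FakeAgent.new

3.times do
  page = agent.get('http://example.com/')
  page = nil # drops only the local reference; the history still holds the page
end

puts agent.history.size # pages still reachable through the agent
agent.history.clear
puts agent.history.size # nothing is retained once the history is emptied
```

As long as the agent itself is reachable, everything in its history is reachable too, so the garbage collector cannot reclaim those pages no matter what happens to the caller's variables.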

Answer 1

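My suggestion is to set agent.max_history = 0, as mentioned in the linked issues list.

This keeps history entries from being retained in the first place, so there is no need to call #clear at all.

Here is a modified version of the code from the other answer: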

#!/usr/bin/env ruby

require 'mechanize'

$agent = Mechanize.new
$agent.user_agent_alias = 'Windows Mozilla'
$agent.max_history = 0
class GetContent
    def initialize url
        while true
            @page = $agent.get url
        end
    end
end
myPage = GetContent.new('http://www.nypost.com/')
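
The effect of a history limit can be sketched in plain Ruby. BoundedHistory below is a hypothetical class made up for this illustration, not Mechanize's own history class, and the limit of 50 is an arbitrary value chosen for the example. It shows how a size cap bounds the number of retained pages, and how a cap of 0 retains none.

```ruby
# Hypothetical bounded history; an illustration only, not Mechanize's class.
class BoundedHistory
  attr_reader :entries

  def initialize(max_size)
    @max_size = max_size
    @entries = []
  end

  # Record a page, then discard the oldest entries beyond the limit.
  def push(page)
    @entries << page
    @entries.shift while @entries.size > @max_size
    self
  end
end

capped = BoundedHistory.new(50)    # arbitrary limit for the example
no_history = BoundedHistory.new(0) # analogous to a maximum history of 0

200.times do |i|
  capped.push("page #{i}")
  no_history.push("page #{i}")
end

puts capped.entries.size     # never exceeds the limit
puts no_history.entries.size # nothing is kept between requests
```

With a cap in place, memory use depends on the limit rather than on how long the loop has been running.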

Answer 2

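OK! (I now have enough reputation to answer my own question properly)

It seems that calling Mechanize::History#clear (via $agent.history.clear) largely solves the memory leak problem.

If you want to test before and after, here is the final modified Ruby code: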

#!/usr/bin/env ruby

require 'mechanize'

$agent = Mechanize.new
$agent.user_agent_alias = 'Windows Mozilla'
class GetContent
    def initialize url
        while true
            @page = $agent.get url
            $agent.history.clear
        end
    end
end
myPage = GetContent.new('http://www.nypost.com/')

Ruby / Mechanize "failed to allocate memory". Erasing what the 'agent.get' method instantiates?

2 answers: