Fastest and most efficient way to compare two arrays of hashes in different formats

Time: 2013-02-04 23:25:38

Tags: arrays ruby-on-rails-3 comparison hash

I have two arrays of hashes in the following formats:

HASH1

[{:root => root_value, :child1 => child1_value, :subchild1 => subchild1_value, :bases => [hit1, hit2, hit3]}...]

HASH2

[{:path => "root_value/child1_value/subchild1_value", :hit1_exist => 't', :hit2_exist => 't', :hit3_exist => 'f'}...]

If I do this:

def sample
  results = nil
  project = Project.find(params[:project_id])
  testrun_query = "SELECT root_name, suite_name, case_name, ic_name, executed_platforms FROM testrun_caches WHERE start_date >= '#{params[:start_date]}' AND start_date < '#{params[:end_date]}' AND project_id = #{params[:project_id]} AND result <> 'SKIP' AND result <> 'N/A'"
  if !params[:platform].nil? && params[:platform] != [""]
    #yell_and_log "platform not nil"
    platform_query = nil
    params[:platform].each do |platform|
      if platform_query.nil?
        platform_query = " AND (executed_platforms LIKE '%#{platform.to_s},%'"
      else
        platform_query += " OR executed_platforms LIKE '%#{platform.to_s},%'"
      end
    end
    testrun_query += platform_query + ")"
  end
  if !params[:location].nil? &&!params[:location].empty?
    #yell_and_log "location not nil"
    testrun_query += " AND location LIKE '#{params[:location].to_s}%'"
  end
  testrun_query += " GROUP BY root_name, suite_name, case_name, ic_name,   executed_platforms ORDER BY root_name, suite_name, case_name, ic_name"
  ic_query = "SELECT ics.path, memberships.pts8210, memberships.sv6, memberships.sv7,   memberships.pts14k, memberships.pts22k, memberships.pts24k, memberships.spb32, memberships.spb64, memberships.sde, projects.name FROM ics INNER JOIN memberships on memberships.ic_id = ics.id INNER JOIN test_groups ON test_groups.id = memberships.test_group_id INNER JOIN projects ON test_groups.project_id = projects.id WHERE deleted = 'false' AND (memberships.pts8210 = true OR memberships.sv6 = true OR memberships.sv7 = true OR memberships.pts14k = true OR memberships.pts22k = true OR memberships.pts24k = true OR memberships.spb32 = true OR memberships.spb64 = true OR memberships.sde = true) AND projects.name = '#{project.name}' GROUP BY path, memberships.pts8210, memberships.sv6, memberships.sv7, memberships.pts14k, memberships.pts22k, memberships.pts24k, memberships.spb32, memberships.spb64, memberships.sde, projects.name ORDER BY ics.path"
  if params[:ic_type] == "never_run"
    runtest = TestrunCache.connection.select_all(testrun_query)
    alltest = TrsIc.connection.select_all(ic_query) 
    (alltest.length).times do |i|
      #exec_pltfrm = test['executed_platforms'].split(",")
      unfinishedtest = comparison(runtest[i],alltest[i])
      yell_and_log("test = #{unfinishedtest}")
      yell_and_log("#{runtest[i]}")
      yell_and_log("#{alltest[i]}")
    end
  end
end

In my log I get:

test = true
array of hash 1 = {"root_name"=>"BSDPLATFORM", "suite_name"=>"cli",  "case_name"=>"functional", "ic_name"=>"cli_sanity_test", "executed_platforms"=>"pts22k,pts24k,sv7,"}
array of hash 2 = {"path"=>"BSDPLATFORM/cli/functional/cli_sanity_test", "pts8210"=>"f", "sv6"=>"f", "sv7"=>"t", "pts14k"=>nil, "pts22k"=>"t", "pts24k"=>"t", "spb32"=>nil, "spb64"=>nil, "sde"=>nil, "name"=>"pts_6_20"}
test = false
array of hash 1 = {"root_name"=>"BSDPLATFORM", "suite_name"=>"infrastructure", "case_name"=>"bypass_pts14k_copper", "ic_name"=>"ic_packet_9", "executed_platforms"=>"sv6,"}
array of hash 2 = {"path"=>"BSDPLATFORM/infrastructure/build/copyrights", "pts8210"=>"f", "sv6"=>"t", "sv7"=>"t", "pts14k"=>"f", "pts22k"=>"t", "pts24k"=>"t", "spb32"=>"f", "spb64"=>nil, "sde"=>nil, "name"=>"pts_6_20"}
test = false
array of hash 1 = {"root_name"=>"BSDPLATFORM", "suite_name"=>"infrastructure", "case_name"=>"bypass_pts14k_copper", "ic_name"=>"ic_status_1", "executed_platforms"=>"sv6,"}
array of hash 2 = {"path"=>"BSDPLATFORM/infrastructure/build/ic_1", "pts8210"=>"f", "sv6"=>"t", "sv7"=>"t", "pts14k"=>"f", "pts22k"=>"t", "pts24k"=>"t", "spb32"=>"f", "spb64"=>nil, "sde"=>nil, "name"=>"pts_6_20"}
test = false
array of hash 1 = {"root_name"=>"BSDPLATFORM", "suite_name"=>"infrastructure", "case_name"=>"bypass_pts14k_copper", "ic_name"=>"ic_status_2", "executed_platforms"=>"sv6,"}
array of hash 2 = {"path"=>"BSDPLATFORM/infrastructure/build/ic_files", "pts8210"=>"f", "sv6"=>"t", "sv7"=>"f", "pts14k"=>"f", "pts22k"=>"t", "pts24k"=>"t", "spb32"=>"f", "spb64"=>nil, "sde"=>nil, "name"=>"pts_6_20"}

So only the first pair matches; the rest come out different, and I get 1 result instead of 4230.

I want to match on path versus root/suite/case/ic, and then compare the executed platforms listed in array of hashes 1 with the platforms set to true in array of hashes 2.
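A minimal sketch of that per-row comparison, assuming rows shaped exactly like the log output above (string keys, platform flags stored as "t", "f" or nil). The helper names (`testrun_path`, `executed_platforms`, `expected_platforms`, `platforms_not_run`) and the `PLATFORM_COLUMNS` list are hypothetical, and reading "compare" as "which true platforms were not executed" is an assumption based on the `never_run` branch:

```ruby
require "set"

# Hypothetical list of platform columns, copied from the second query's SELECT.
PLATFORM_COLUMNS = %w[pts8210 sv6 sv7 pts14k pts22k pts24k spb32 spb64 sde].freeze

# Path a testrun row corresponds to: root/suite/case/ic.
def testrun_path(run)
  run.values_at("root_name", "suite_name", "case_name", "ic_name").join("/")
end

# "pts22k,pts24k,sv7," becomes the set pts22k, pts24k, sv7
# (split already drops the trailing empty field; reject guards against ",,").
def executed_platforms(run)
  run["executed_platforms"].to_s.split(",").reject(&:empty?).to_set
end

# Platforms flagged "t"; both "f" and nil (SQL NULL) count as not set.
def expected_platforms(ic)
  PLATFORM_COLUMNS.select { |col| ic[col] == "t" }.to_set
end

# nil when the two rows describe different tests; otherwise the platforms
# that are flagged "t" in the ic row but missing from executed_platforms.
def platforms_not_run(run, ic)
  return nil unless testrun_path(run) == ic["path"]
  expected_platforms(ic) - executed_platforms(run)
end
```

With the first pair from the log this returns an empty set, and with the second pair it returns nil because the paths differ, which is the positional mismatch described above.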

2 answers:

Answer 0 (score: 1)

Not sure whether this is the fastest. I wrote it based on the original question, which didn't include sample code, but:

def compare(h1, h2)
  (h2[:path] == "#{h1[:root]}/#{h1[:child1]}/#{h1[:subchild1]}") && \
  (h2[:hit1_exist] == ((h1[:bases][0] == nil) ? 'f' : 't')) && \
  (h2[:hit2_exist] == ((h1[:bases][1] == nil) ? 'f' : 't')) && \
  (h2[:hit3_exist] == ((h1[:bases][2] == nil) ? 'f' : 't'))
end

def compare_arr(h1a, h2a)
  (h1a.length).times do |i|
    compare(h1a[i],h2a[i])
  end
end

Test:

require "benchmark"

h1a = []
h2a = []

def rstr
  # from http://stackoverflow.com/a/88341/178651
  (0...2).map{65.+(rand(26)).chr}.join
end

def rnil
  rand(2) > 0 ? '' : nil
end

10000.times do
  h1a << {:root => rstr(), :child1 => rstr(), :subchild1 => rstr(), :bases => [rnil,rnil,rnil]}
  h2a << {:path => "#{rstr()}/#{rstr()}/#{rstr()}", :hit1_exist => 't', :hit2_exist => 't', :hit3_exist => 'f'}
end

Benchmark.measure do
  compare_arr(h1a,h2a)
end

Result:

=>   0.020000   0.000000   0.020000 (  0.024039)

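Now that I'm looking at your code, I think it could be optimized by removing the array creation: the split and join calls create arrays and strings that have to be garbage-collected, which also slows things down, though not to the extent you describe.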

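Your database queries may be slow. Run EXPLAIN/ANALYZE (or the equivalent) on each of them to see why it is slow, optimize or reduce your queries, add indexes where needed, and so on. Also check CPU and memory utilization; the problem may not be just the code.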

That said, there are some things that definitely need fixing. You also have several SQL injection risks, for example:

... start_date >= '#{params[:start_date]}' AND start_date < '#{params[:end_date]}' AND project_id = #{params[:project_id]} ...

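Anywhere you put params or variables directly into SQL is potentially dangerous. Make sure you use prepared statements, or at the very least SQL-escape the values. Read this in full: http://guides.rubyonrails.org/active_record_querying.html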
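As an illustration, here is a sketch of how the question's string-built WHERE clause could be turned into a placeholder string plus bind values. The method name `testrun_conditions` is hypothetical, and the params keys are assumed to be the ones used in the question:

```ruby
# Hypothetical helper: returns [sql_fragment, *bind_values] so that no request
# parameter is ever spliced into the SQL text itself.
def testrun_conditions(params)
  clauses = ["start_date >= ?", "start_date < ?", "project_id = ?",
             "result <> 'SKIP'", "result <> 'N/A'"]
  binds = [params[:start_date], params[:end_date], params[:project_id]]

  # Array() turns nil into [] and a single string into a one-element array;
  # blank entries (the question checks for [""]) are dropped.
  platforms = Array(params[:platform]).reject { |p| p.to_s.empty? }
  unless platforms.empty?
    # One LIKE per platform, OR-ed together inside a single pair of parentheses.
    clauses << "(" + platforms.map { "executed_platforms LIKE ?" }.join(" OR ") + ")"
    binds.concat(platforms.map { |p| "%#{p},%" })
  end

  location = params[:location].to_s
  unless location.empty?
    clauses << "location LIKE ?"
    binds << "#{location}%"
  end

  [clauses.join(" AND "), *binds]
end
```

In Rails the result could be passed to the array-condition form of `where`, e.g. `TestrunCache.where(*testrun_conditions(params))`, and ActiveRecord substitutes quoted values for each `?`. Note that `%` and `_` inside a user-supplied value are still treated as LIKE wildcards; this sketch does not escape them.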

Answer 1 (score: 0)

([element_being_tested].each do |el|
  [hash_array_1, hash_array_2].reject do |x, y|
    x[el] == y[el]
  end
end).each {|x, y| puts (x[bases] | y[bases])}

Enumerate the hash elements to be tested: [element_being_tested].each do |el|

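Then iterate over the arrays of hashes themselves, comparing each pair of hashes on the element given by the outer loop and rejecting those that are not appropriately equal. (The == may actually need to be !=, but you can work that much out.)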

  [hash_array_1, hash_array_2].reject do |x, y|
    x[el] == y[el]
  end

Finally, compare the hashes again, taking the union of their elements:

.each {|x, y| puts (x[bases] | y[bases])}

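You may need to test this code. It's less a demo than a sketch, since I'm not sure I read your code correctly. If the answer isn't satisfactory, please post a larger sample of your source, including the data structures in question.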
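One possible runnable reading of this answer, offered as an interpretation rather than the answerer's code: `[hash_array_1, hash_array_2].reject do |x, y|` iterates over the two arrays themselves (Ruby destructures each array, so `x` and `y` become its first two hashes), so pairs are built with `zip` instead. The method name, the `keys` parameter and the `:bases` key are assumptions, and matching pairs are kept with `select`; swap in `reject` to keep the mismatches, as the answer's `!=` note suggests:

```ruby
# Hypothetical helper: pairs the arrays position by position (zip), keeps pairs
# whose hashes agree on every key in `keys`, and returns the union of each
# kept pair's :bases arrays.
def union_of_matching_bases(hash_array_1, hash_array_2, keys)
  hash_array_1.zip(hash_array_2)
              .select { |x, y| y && keys.all? { |k| x[k] == y[k] } } # zip pads with nil
              .map { |x, y| Array(x[:bases]) | Array(y[:bases]) }
end
```

Like the question's loop, this still compares element i with element i, so it only helps when both arrays are already aligned.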

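Regarding speed: if you're iterating over a large dataset and comparing multiple datasets, there may not be much you can do. Perhaps you could invert the loops I proposed and make the arrays of hashes the outer loop. If the data structures are large, you won't get lightning speed in Ruby (or in any language, really).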
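A common alternative to nested or positional loops (a standard technique, not something either answer proposes) is to index one result set by path in a Hash and look the other rows up by key. This is a sketch assuming string-keyed rows like those in the question's log; `join_by_path` is a hypothetical name. Building the index is one pass over `ics`, and each lookup is constant time on average:

```ruby
# Hypothetical helper: index ic rows by "path" once, then look each testrun row
# up by the path it implies, instead of comparing row i with row i.
# Returns [matched_pairs, unmatched_testruns].
def join_by_path(testruns, ics)
  # If a path can occur more than once, the last row wins here; store an
  # array per path instead if all of them are needed.
  ics_by_path = ics.each_with_object({}) { |ic, index| index[ic["path"]] = ic }

  matched = []
  unmatched = []
  testruns.each do |run|
    path = run.values_at("root_name", "suite_name", "case_name", "ic_name").join("/")
    if (ic = ics_by_path[path])
      matched << [run, ic]
    else
      unmatched << run
    end
  end
  [matched, unmatched]
end
```

Because lookups are by key, the order in which the two queries return rows no longer matters.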