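Algorithm to generate a heat map in Ruby

Time: 2017-10-01 21:09:05

Tags: ruby algorithm

I want to build a heat map (a table like this) for a ticketing system. I receive all the ticket details from the database as JSON. An example follows; the real data has more than 1000 records.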

{"ticketCount": 6, 
 "tickets": 
  [
    {"creationTimeMs": 1506061704724, 
     "expirationTimeMs": 1506083304724, 
     "queue": "low"}, 
    {"creationTimeMs": 1506127874782, 
     "expirationTimeMs": 1506149474782, 
     "queue": "low"}, 
    {"creationTimeMs": 1506283760321, 
     "expirationTimeMs": 1506283760322, 
     "queue": "high"}, 
    {"creationTimeMs": 1506236363281, 
     "expirationTimeMs": 1506257963281,  
     "queue": "high"}, 
    {"creationTimeMs": 1506283655948, 
     "expirationTimeMs": 1506283667938,  
     "queue": "low"}, 
    {"creationTimeMs": 1506283781894, 
     "expirationTimeMs": 1506284781894,  
     "queue": "medium"}
  ]
}   

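I want a table with the queue names (which are not fixed) as rows and the remaining time (currentTime - expirationTime) as columns. I would like five columns, for tickets expiring in <10 minutes, 10-30 minutes, 30-60 minutes, 1-5 hours and >5 hours.

I know how to do this by brute force, looping over the JSON again and again. I would like to know whether there is a better algorithm, and whether Ruby offers a simple way to do it.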

2 answers:

Answer 0: (score: 1)

Code

require 'json'   

def cross_tab(json, range_mins)
  JSON.parse(json)["tickets"].each_with_object(Hash.new(0)) do |g,h|
    diff = g["etime"]-g["ctime"]
    h[[g["queue"], range_mins.rindex { |mn| mn <= diff }]] += 1
  end
end

Example

json = '{"ticketCount": 6, 
  "tickets": [
    {"ctime": 1506061704724, "etime": 1506083304724, "queue": "low"}, 
    {"ctime": 1506127874782, "etime": 1506149474782, "queue": "low"},
    {"ctime": 1506283760321, "etime": 1506283760322, "queue": "high"}, 
    {"ctime": 1506236363281, "etime": 1506257963281, "queue": "high"}, 
    {"ctime": 1506283655948, "etime": 1506283667938, "queue": "low"}, 
    {"ctime": 1506283781894, "etime": 1506284781894, "queue": "medium"}
  ]
}'

range_mins = [0, 10, 30, 60, 300].map { |n| 60000 * n }
  #=> [0, 600_000, 1_800_000, 3_600_000, 18_000_000]

h = cross_tab(json, range_mins)
  #=> {["low", 4]=>2, ["high", 0]=>1, ["high", 4]=>1, ["low", 0]=>1, ["medium", 1]=>1}

h[["high", 4]]
  #=> 1
h[["low", 3]]
  #=> 0

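The second result is obtained because h's default value is 0 and h has no key ["low", 3].

We can now construct the contents of the cross-tab (or contingency table) as follows.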

row_map = { 0=>"low", 1=>"medium", 2=>"high" }

tbl = Array.new(row_map.size) { |i|
        Array.new(range_mins.size) { |j| h[[row_map[i], j]] } }
  #=> [[1, 0, 0, 0, 2],
  #    [0, 1, 0, 0, 0],
  #    [1, 0, 0, 0, 1]]

The row (column) labels are obtained from row_map (range_mins).

We could also compute row_map from json:
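To show how the labels and the counts fit together, here is a minimal sketch that prints tbl as a labelled text table. The method name format_table and the column label strings are hypothetical; they are not part of the answer above, and the column labels simply mirror the five ranges asked for in the question.

```ruby
# Hypothetical helper: lays out the counts from tbl under column labels,
# with one row label per queue. Not part of the original answer.
def format_table(tbl, row_labels, col_labels)
  # One shared cell width, wide enough for the longest label plus a space.
  width = (col_labels + row_labels).map(&:size).max + 1
  header = " " * width + col_labels.map { |c| c.rjust(width) }.join
  rows = tbl.each_with_index.map do |counts, i|
    row_labels[i].ljust(width) + counts.map { |n| n.to_s.rjust(width) }.join
  end
  [header, *rows].join("\n")
end

row_labels = ["low", "medium", "high"]                   # row_map.values
col_labels = ["<10m", "10-30m", "30-60m", "1-5h", ">5h"] # assumed labels
tbl = [[1, 0, 0, 0, 2],
       [0, 1, 0, 0, 0],
       [1, 0, 0, 0, 1]]

puts format_table(tbl, row_labels, col_labels)
```

The row labels are passed in the same order as row_map, so the caller keeps control over which queues appear and in what order.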

JSON.parse(json)["tickets"].map { |h| h["queue"] }.uniq.
  map.with_index { |queue, i| [i, queue] }.to_h
    #=> {0=>"low", 1=>"high", 2=>"medium"}

But that would not allow us to specify the order of the table rows, or to produce a table containing only certain values of "queue".

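Explanation

The method uses the form of the class method Hash::new that takes an argument (here 0), which is the hash's default value. All this means is that if h = Hash.new(0) and h does not have a key k, h[k] returns the default value. (The hash is not changed.)

A hash defined this way is sometimes called a counting hash, and is commonly used (as it is here) with the calculation h[k] += 1. When Ruby sees this, the first thing she does is expand it to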

h[k] = h[k] + 1

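If h does not have a key k, h[k] on the right side of the equality (the method Hash#[]) returns the default value, 0. Each subsequent time this expression is executed for the same key k, h[k] on the right returns the current value of h[k] (that is, the default value does not apply). (Note that h[k] on the left of the equality is the method Hash#[]=, which has nothing to do with default values.)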
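The behaviour described above can be checked in isolation with a small sketch. The method name tally and the "urgent" key are made up for illustration; the list of queue names matches the six tickets in the example.

```ruby
# Tally items with a counting hash (a hash whose default value is 0).
def tally(items)
  counts = Hash.new(0)
  items.each { |item| counts[item] += 1 } # counts[item] = counts[item] + 1
  counts
end

counts = tally(["low", "low", "high", "high", "low", "medium"])

puts counts["low"]          # 3
puts counts["urgent"]       # 0: the default is returned for a missing key
puts counts.key?("urgent")  # false: reading the default did not store the key
```

Only Hash#[]= adds a key, so looking up a missing key never grows the hash.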

The steps are as follows.

h = JSON.parse(json)
  #=> {"ticketCount"=>6,
  #    "tickets"=>[
  #      {"ctime"=>1506061704724, "etime"=>1506083304724, "queue"=>"low"},
  #      {"ctime"=>1506127874782, "etime"=>1506149474782, "queue"=>"low"},
  #      {"ctime"=>1506283760321, "etime"=>1506283760322, "queue"=>"high"},
  #      {"ctime"=>1506236363281, "etime"=>1506257963281, "queue"=>"high"},
  #      {"ctime"=>1506283655948, "etime"=>1506283667938, "queue"=>"low"},
  #      {"ctime"=>1506283781894, "etime"=>1506284781894, "queue"=>"medium"}
  #    ]
  #   }
a = h["tickets"]
  #=> [{"ctime"=>1506061704724, "etime"=>1506083304724, "queue"=>"low"},
  #    {"ctime"=>1506127874782, "etime"=>1506149474782, "queue"=>"low"},
  #    {"ctime"=>1506283760321, "etime"=>1506283760322, "queue"=>"high"},
  #    {"ctime"=>1506236363281, "etime"=>1506257963281, "queue"=>"high"},
  #    {"ctime"=>1506283655948, "etime"=>1506283667938, "queue"=>"low"},
  #    {"ctime"=>1506283781894, "etime"=>1506284781894, "queue"=>"medium"}]
e = a.each_with_object(Hash.new(0))
  #=> #<Enumerator: [
  #     {"ctime"=>1506061704724, "etime"=>1506083304724, "queue"=>"low"},
  #     {"ctime"=>1506127874782, "etime"=>1506149474782, "queue"=>"low"},
  #     ...
  #     {"ctime"=>1506283781894, "etime"=>1506284781894, "queue"=>"medium"}
  #   ]:each_with_object({})>

The first element is generated by the enumerator and passed to the block, the block variables are set equal to that value, and the block calculation is performed.

g, h = e.next
  #=> [{"ctime"=>1506061704724, "etime"=>1506083304724, "queue"=>"low"}, {}]
g #=> {"ctime"=>1506061704724, "etime"=>1506083304724, "queue"=>"low"}
h #=> {}
f = g["queue"]
  #=> "low"
diff = g["etime"]-g["ctime"]
  #=> 1506083304724 - 1506061704724 => 21600000
j = range_mins.rindex { |mn| mn <= diff }
  #=> 4
range_mins[4]
  #=> 18_000_000

This shows that 18_000_000 is the largest value in range_mins that is less than or equal to diff (21_600_000). Next,

k = [f, j]
  #=> ["low", 4]
h[k] += 1
  #=> 1
h #=> {["low", 4]=>1}

The enumerator e then passes the next value to the block.

The remaining steps are similar.
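The rindex lookup at the heart of cross_tab can also be tried on its own. The sketch below wraps it in a hypothetical method, bucket_index, and feeds it a few durations taken from the example data, plus a boundary value and a negative one. The label strings are assumptions added for readability.

```ruby
# Lower bounds of the five columns, in milliseconds (as in the answer).
range_mins = [0, 10, 30, 60, 300].map { |n| 60_000 * n }
labels = ["<10 min", "10-30 min", "30-60 min", "1-5 h", ">5 h"] # assumed labels

# Hypothetical wrapper around the lookup used in cross_tab: the index of the
# last (largest) lower bound that does not exceed diff, or nil if none does.
def bucket_index(range_mins, diff)
  range_mins.rindex { |mn| mn <= diff }
end

[1, 11_990, 600_000, 1_000_000, 21_600_000].each do |diff|
  j = bucket_index(range_mins, diff)
  puts "#{diff} ms -> column #{j} (#{labels[j]})"
end

# A negative duration matches no lower bound, so rindex returns nil.
p bucket_index(range_mins, -5)
```

Because the bounds are compared with <=, a duration of exactly 600_000 ms falls in the second column. A ticket whose etime precedes its ctime would be counted under the key [queue, nil], so such records may need to be filtered out or clamped first.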

Answer 1: (score: 0)

Successive applications of group_by can do this. Something like

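# data is assumed to be the parsed JSON, e.g. data = JSON.parse(json)
# (Hash#transform_values requires Ruby 2.4 or later)
data['tickets'].group_by { |ticket| ticket['queue'] }.transform_values do |tickets|
  tickets.group_by do |ticket|
    # categorize ticket by time until expiry
  end
end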

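This results in a nested hash in which the first-level keys are the queue names and the second-level keys are whatever categories you choose for the time until expiry.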
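One way the placeholder comment might be filled in is sketched below. Everything beyond group_by and transform_values is an assumption: the names BUCKETS, bucket_label and heat_map, the label strings, the now_ms parameter, and the three sample tickets are all made up for illustration. The sketch measures remaining time from a caller-supplied "current time" in milliseconds, as the question describes, and reduces each inner group to a count.

```ruby
require 'json'

# Upper bounds in minutes paired with labels (hypothetical names and labels).
BUCKETS = [[10, "<10m"], [30, "10-30m"], [60, "30-60m"], [300, "1-5h"]].freeze

# Label for a remaining time in milliseconds; anything past the last bound
# is ">5h". Already-expired tickets (negative values) land in "<10m".
def bucket_label(remaining_ms)
  minutes = remaining_ms / 60_000.0
  match = BUCKETS.find { |limit, _label| minutes < limit }
  match ? match.last : ">5h"
end

# Nested hash: queue name => { bucket label => ticket count }.
def heat_map(data, now_ms)
  data["tickets"].group_by { |t| t["queue"] }.transform_values do |tickets|
    tickets.group_by { |t| bucket_label(t["expirationTimeMs"] - now_ms) }
           .transform_values(&:size)
  end
end

# Made-up sample data using the field names from the question.
data = JSON.parse('{"tickets": [
  {"creationTimeMs": 0, "expirationTimeMs": 300000,   "queue": "low"},
  {"creationTimeMs": 0, "expirationTimeMs": 1200000,  "queue": "low"},
  {"creationTimeMs": 0, "expirationTimeMs": 30000000, "queue": "high"}
]}')

p heat_map(data, 0)
```

Buckets with no tickets are simply absent from the inner hashes, so a renderer would need to treat a missing label as zero, for example with fetch(label, 0).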