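I want to build a heat map (a table like this) for a ticketing system. I'm receiving all the ticket details from the database in JSON format; an example is below. The actual data has more than 1,000 records.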
{"ticketCount": 6,
"tickets":
[
{"creationTimeMs": 1506061704724,
"expirationTimeMs": 1506083304724,
"queue": "low"},
{"creationTimeMs": 1506127874782,
"expirationTimeMs": 1506149474782,
"queue": "low"},
{"creationTimeMs": 1506283760321,
"expirationTimeMs": 1506283760322,
"queue": "high"},
{"creationTimeMs": 1506236363281,
"expirationTimeMs": 1506257963281,
"queue": "high"},
{"creationTimeMs": 1506283655948,
"expirationTimeMs": 1506283667938,
"queue": "low"},
{"creationTimeMs": 1506283781894,
"expirationTimeMs": 1506284781894,
"queue": "medium"}
]
}
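I want a table with the queue names (not fixed) as rows and the remaining time (expirationTime - currentTime) as columns. I'd like 5 columns for tickets expiring in < 10 minutes, 10-30 minutes, 30 minutes-1 hour, 1-5 hours, and > 5 hours.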
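I know how to do this by brute force, looping through the JSON again and again. I'd like to know whether there is a better algorithm, and a simple way of doing it that Ruby provides.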
Answer 0 (score: 1)
Code
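require 'json'

# Count tickets per [queue, column] pair. range_mins holds the lower bound
# (in ms) of each column, in ascending order; a ticket's column is the index
# of the last bound <= diff. Pairs never seen read as 0 thanks to Hash.new(0).
def cross_tab(json, range_mins)
  JSON.parse(json)["tickets"].each_with_object(Hash.new(0)) do |g,h|
    # Ticket lifetime in ms (etime - ctime); time remaining would be etime - now.
    diff = g["etime"]-g["ctime"]
    h[[g["queue"], range_mins.rindex { |mn| mn <= diff }]] += 1
  end
end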
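The cross_tab above buckets each ticket by its lifetime (etime - ctime), not by the remaining time the question asks about. Below is a minimal sketch of a variant that buckets by expirationTimeMs - now, using the question's original key names. The method name cross_tab_remaining, the now_ms parameter and the :expired column for tickets that have already expired are assumptions added for illustration. It is still a single pass over the tickets.

```ruby
require 'json'

# Hypothetical variant of cross_tab: bucket by time remaining until expiry
# (expirationTimeMs - now_ms) instead of by ticket lifetime.
# range_mins: ascending lower bounds in ms, starting at 0.
# Assumption: tickets with remaining < 0 are counted under an :expired column,
# because rindex would otherwise return nil for them.
def cross_tab_remaining(json, range_mins, now_ms = (Time.now.to_f * 1000).to_i)
  JSON.parse(json)["tickets"].each_with_object(Hash.new(0)) do |t, h|
    remaining = t["expirationTimeMs"] - now_ms
    col = range_mins.rindex { |mn| mn <= remaining } || :expired
    h[[t["queue"], col]] += 1
  end
end

json = '{"ticketCount": 6, "tickets": [
  {"creationTimeMs": 1506061704724, "expirationTimeMs": 1506083304724, "queue": "low"},
  {"creationTimeMs": 1506127874782, "expirationTimeMs": 1506149474782, "queue": "low"},
  {"creationTimeMs": 1506283760321, "expirationTimeMs": 1506283760322, "queue": "high"},
  {"creationTimeMs": 1506236363281, "expirationTimeMs": 1506257963281, "queue": "high"},
  {"creationTimeMs": 1506283655948, "expirationTimeMs": 1506283667938, "queue": "low"},
  {"creationTimeMs": 1506283781894, "expirationTimeMs": 1506284781894, "queue": "medium"}]}'

range_mins = [0, 10, 30, 60, 300].map { |n| 60_000 * n }
# A fixed "now" keeps the example reproducible; omit it to use the current time.
counts = cross_tab_remaining(json, range_mins, 1_506_240_000_000)
# counts == {["low", :expired]=>2, ["high", 4]=>1, ["high", 3]=>1,
#            ["low", 4]=>1, ["medium", 4]=>1}
```

Passing now_ms explicitly also makes the method easy to test, since the result otherwise depends on the wall clock.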
Example
json = '{"ticketCount": 6,
"tickets": [
{"ctime": 1506061704724, "etime": 1506083304724, "queue": "low"},
{"ctime": 1506127874782, "etime": 1506149474782, "queue": "low"},
{"ctime": 1506283760321, "etime": 1506283760322, "queue": "high"},
{"ctime": 1506236363281, "etime": 1506257963281, "queue": "high"},
{"ctime": 1506283655948, "etime": 1506283667938, "queue": "low"},
{"ctime": 1506283781894, "etime": 1506284781894, "queue": "medium"}
]
}'
range_mins = [0, 10, 30, 60, 300].map { |n| 60000 * n }
#=> [0, 600_000, 1_800_000, 3_600_000, 18_000_000]
h = cross_tab(json, range_mins)
#=> {["low", 4]=>2, ["high", 0]=>1, ["high", 4]=>1, ["low", 0]=>1, ["medium", 1]=>1}
h[["high", 4]]
#=> 1
h[["low", 3]]
#=> 0
The second result is 0 because h's default value is 0 and h has no key ["low", 3].
We can now build the contents of the cross table (also called a cross-tabulation or contingency table) as follows.
row_map = { 0=>"low", 1=>"medium", 2=>"high" }
tbl = Array.new(row_map.size) { |i|
Array.new(range_mins.size) { |j| h[[row_map[i], j]] } }
#=> [[1, 0, 0, 0, 2],
# [0, 1, 0, 0, 0],
# [1, 0, 0, 0, 1]]
The row (column) labels come from row_map (range_mins).
We could also compute row_map from json:
JSON.parse(json)["tickets"].map { |h| h["queue"] }.uniq.
map.with_index { |queue, i| [i, queue] }.to_h
#=> {0=>"low", 1=>"high", 2=>"medium"}
But that does not let us specify the order of the table's rows, or produce a table that contains only some of the values of "queue".
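To control the row order, or to show only some queues, one option is to pass the queue names explicitly instead of deriving them from json. The helper labeled_table and the column labels in COL_LABELS below are hypothetical; h stands for the counting hash returned by cross_tab, rebuilt here from the output shown in the example.

```ruby
# Column labels for the five ranges in the question (assumed wording).
COL_LABELS = ["<10m", "10-30m", "30m-1h", "1-5h", ">5h"]

# Hypothetical helper: one row per queue, in the order given by `queues`.
# Pairs missing from h read as 0 because h was created with Hash.new(0).
def labeled_table(h, queues, n_cols)
  queues.map { |q| [q, Array.new(n_cols) { |j| h[[q, j]] }] }.to_h
end

# The counting hash produced by cross_tab in the example.
h = Hash.new(0)
h.update({["low", 4]=>2, ["high", 0]=>1, ["high", 4]=>1, ["low", 0]=>1, ["medium", 1]=>1})

table = labeled_table(h, ["high", "low"], COL_LABELS.size)  # "medium" left out
puts(["queue", *COL_LABELS].join("\t"))
table.each { |q, row| puts([q, *row].join("\t")) }
```

Because Ruby hashes preserve insertion order, the rows of table come out in exactly the order of the queues array.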
Explanation
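The method uses the form of the class method Hash::new that takes an argument (here 0), which becomes the hash's default value. This simply means that if h = Hash.new(0) and h does not have a key k, then h[k] returns the default value. (The hash is not altered.)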
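A hash defined this way is sometimes called a counting hash. It is commonly used (as it is here) with the expression h[k] += 1. When Ruby sees this, the first thing she does is expand it to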
h[k] = h[k] + 1
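If h does not have a key k, the h[k] on the right of the equals sign (the method Hash#[]) returns the default value 0. Each subsequent time the expression is executed for the same key k, the h[k] on the right returns the current value of h[k] (that is, the default value no longer applies). (Note that the h[k] on the left of the equals sign is the method Hash#[]=, which has nothing to do with the default value.)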
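The counting-hash behavior described above can be checked directly in a few lines (the variable names are illustrative):

```ruby
h = Hash.new(0)

before = h[:a]        # Hash#[] on a missing key returns the default, 0
stored = h.key?(:a)   # false: reading the default does not add the key
h[:a] += 1            # h[:a] = h[:a] + 1, i.e. 0 + 1, stored by Hash#[]=
h[:a] += 1            # Hash#[] now returns the stored 1, so h[:a] becomes 2

# before == 0, stored == false, h == {a: 2}
```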
The steps are as follows.
h = JSON.parse(json)
#=> {"ticketCount"=>6,
# "tickets"=>[
# {"ctime"=>1506061704724, "etime"=>1506083304724, "queue"=>"low"},
# {"ctime"=>1506127874782, "etime"=>1506149474782, "queue"=>"low"},
# {"ctime"=>1506283760321, "etime"=>1506283760322, "queue"=>"high"},
# {"ctime"=>1506236363281, "etime"=>1506257963281, "queue"=>"high"},
# {"ctime"=>1506283655948, "etime"=>1506283667938, "queue"=>"low"},
# {"ctime"=>1506283781894, "etime"=>1506284781894, "queue"=>"medium"}
# ]
# }
a = h["tickets"]
#=> [{"ctime"=>1506061704724, "etime"=>1506083304724, "queue"=>"low"},
# {"ctime"=>1506127874782, "etime"=>1506149474782, "queue"=>"low"},
# {"ctime"=>1506283760321, "etime"=>1506283760322, "queue"=>"high"},
# {"ctime"=>1506236363281, "etime"=>1506257963281, "queue"=>"high"},
# {"ctime"=>1506283655948, "etime"=>1506283667938, "queue"=>"low"},
# {"ctime"=>1506283781894, "etime"=>1506284781894, "queue"=>"medium"}]
e = a.each_with_object(Hash.new(0))
#=> #<Enumerator: [
# {"ctime"=>1506061704724, "etime"=>1506083304724, "queue"=>"low"},
# {"ctime"=>1506127874782, "etime"=>1506149474782, "queue"=>"low"},
# ...
# {"ctime"=>1506283781894, "etime"=>1506284781894, "queue"=>"medium"}
# ]:each_with_object({})>
The first element is generated by the enumerator and passed to the block, the block variables are set equal to that value, and the block calculation is performed.
g, h = e.next
# => [{"ctime"=>1506061704724, "etime"=>1506083304724, "queue"=>"low"}, {}]
g #=> {"ctime"=>1506061704724, "etime"=>1506083304724, "queue"=>"low"}
h #=> {}
f = g["queue"]
#=> "low"
diff = g["etime"]-g["ctime"]
#=> 1506083304724 - 1506061704724 => 21600000
j = range_mins.rindex { |mn| mn <= diff }
#=> 4
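range_mins[4] #=> 18_000_000 is the largest element of range_mins that is less than or equal to diff (21_600_000). Next,
k = [f, j]
#=> ["low", 4]
h[k] += 1
#=> 1
h #=> {["low", 4]=>1}
The enumerator e then passes the next value to the block.
The remaining steps are similar.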
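The column lookup in the walkthrough relies on rindex returning the index of the last lower bound that is <= diff. A short check of the boundary cases is below. Note that a negative diff (possible if you measure time remaining and a ticket has already expired) gives nil. With many columns, Array#bsearch_index is an alternative lookup (a swapped-in technique, not part of the answer) that binary-searches the sorted bounds; col_for is a hypothetical name.

```ruby
range_mins = [0, 10, 30, 60, 300].map { |n| 60_000 * n }
diffs = [0, 599_999, 600_000, 21_600_000, -1]

# Linear scan from the right, as in cross_tab.
cols = diffs.map { |d| range_mins.rindex { |mn| mn <= d } }
# cols == [0, 0, 1, 4, nil]

# Hypothetical binary-search version: find the first bound > d, then step back one.
# Returns nil when d is below the first bound, matching rindex.
def col_for(d, bounds)
  i = bounds.bsearch_index { |mn| mn > d } || bounds.size
  i.zero? ? nil : i - 1
end

bcols = diffs.map { |d| col_for(d, range_mins) }
# bcols == [0, 0, 1, 4, nil]
```

For the five columns in the question the linear rindex is perfectly adequate; the binary search only pays off when there are many buckets.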
Answer 1 (score: 0)
Successive applications of group_by can do this. Something like:
data['tickets'].group_by { |ticket| ticket['queue'] }.transform_values do |tickets|
  tickets.group_by do |ticket|
    # categorize ticket by time until expiry
  end
end
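This results in a nested hash, where the first-level keys are the queue names and the second-level keys are whatever categories you chose for time until expiry.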
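A runnable sketch of this approach follows. The bucket labels, the expiry_bucket and nested_counts helpers, the now_ms parameter and the "expired" bucket are assumptions for illustration. The inner transform_values(&:size) turns each group of tickets into a count (Hash#transform_values requires Ruby 2.4 or later).

```ruby
require 'json'

# Hypothetical categorizer for the five ranges in the question.
def expiry_bucket(remaining_ms)
  return "expired" if remaining_ms < 0
  mins = remaining_ms / 60_000.0
  if mins < 10 then "<10m"
  elsif mins < 30 then "10-30m"
  elsif mins < 60 then "30m-1h"
  elsif mins < 300 then "1-5h"
  else ">5h"
  end
end

# Group by queue, then by expiry bucket, then count the tickets in each group.
def nested_counts(data, now_ms)
  data["tickets"].group_by { |t| t["queue"] }.transform_values do |tickets|
    tickets.group_by { |t| expiry_bucket(t["expirationTimeMs"] - now_ms) }
           .transform_values(&:size)
  end
end

data = JSON.parse('{"ticketCount": 6, "tickets": [
  {"creationTimeMs": 1506061704724, "expirationTimeMs": 1506083304724, "queue": "low"},
  {"creationTimeMs": 1506127874782, "expirationTimeMs": 1506149474782, "queue": "low"},
  {"creationTimeMs": 1506283760321, "expirationTimeMs": 1506283760322, "queue": "high"},
  {"creationTimeMs": 1506236363281, "expirationTimeMs": 1506257963281, "queue": "high"},
  {"creationTimeMs": 1506283655948, "expirationTimeMs": 1506283667938, "queue": "low"},
  {"creationTimeMs": 1506283781894, "expirationTimeMs": 1506284781894, "queue": "medium"}]}')

# Fixed "now" (ms since the epoch) so the result is reproducible.
counts = nested_counts(data, 1_506_240_000_000)
# counts == {"low"=>{"expired"=>2, ">5h"=>1},
#            "high"=>{">5h"=>1, "1-5h"=>1},
#            "medium"=>{">5h"=>1}}
```

Unlike the flat counting hash in the first answer, buckets with no tickets are simply absent from each inner hash, so a lookup such as counts["medium"]["<10m"] returns nil rather than 0.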