Refactoring: a loop that retrieves elements from nested arrays

Time: 2016-01-05 16:19:39

Tags: arrays ruby performance refactoring

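I need to iterate over an array of hashes, where each hash contains a label and an array of data. The end result is a concatenated string: first the label, then a piece of data belonging to that label.

The input array of hashes looks like this: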
[{label: "first", data: [1, 2]}, {label: "second", data: [3, 4, 5]}, {label: "third", data: []}, {label: "fourth", data: [6]}]
In this example, max_returns could be as high as 10.

def round_robin(arr, max_returns)
  result = ''
  i = 0 # number of grabbed elements
  j = 0 # inner array position
  k = 0 # outer array position
  l = 0 # number of times inner array length has been exceeded
  while i < max_returns do
    if k >= arr.length
      j += 1
      k = 0
    end
    element = arr[k]
    if element[:data].empty?
      k += 1
      next
    end

    if j >= element[:data].length
      l += 1
      k += 1

      if l > arr.length && i < max_returns
        break
      end
      next
    end
    result += element[:label] + ': ' + element[:data][j].to_s + ', '
    i += 1
    k += 1
  end
  result
end

Given the input above, the output should be:
"first: 1, second: 3, fourth: 6, first: 2, second: 4, second: 5"

Also: max_returns is the maximum total number of results to retrieve. So if my example had max_returns = 3, the output would be:
"first: 1, second: 3, fourth: 6"
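
To make the max_returns behavior concrete, here is a minimal sketch of the same round-robin walk written with an Enumerator. The method names each_round_robin and round_robin_limited are hypothetical and not part of the original code; the sketch assumes every hash has a :label string and a :data array.

```ruby
# Hypothetical helper (not from the original post): yields [label, value]
# pairs in round-robin order, one position of the data arrays at a time.
def each_round_robin(arr)
  return enum_for(:each_round_robin, arr) unless block_given?

  longest = arr.map { |h| h[:data].size }.max || 0
  longest.times do |pos|
    arr.each do |h|
      data = h[:data]
      # Skip labels whose data array is shorter than the current position.
      yield [h[:label], data[pos]] if pos < data.size
    end
  end
end

# Hypothetical wrapper: takes at most max_returns pairs and formats them.
def round_robin_limited(arr, max_returns)
  each_round_robin(arr)
    .first(max_returns) # stops iterating once the limit is reached
    .map { |label, value| "#{label}: #{value}" }
    .join(', ')
end

input = [{ label: "first",  data: [1, 2] },
         { label: "second", data: [3, 4, 5] },
         { label: "third",  data: [] },
         { label: "fourth", data: [6] }]

puts round_robin_limited(input, 10)
puts round_robin_limited(input, 3)
```

With a limit of 10 this prints all six pairs, and with a limit of 3 it prints only the first three. Building the string with join also avoids a trailing separator.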

Question: is there a better or more efficient way to pull data from multiple arrays in a round-robin fashion?

5 answers:

Answer 0 (score: 3)

▶ input = [{label: "first", data: [1, 2]},
           {label: "second", data: [3, 4, 5]},
           {label: "third", data: []},
           {label: "fourth", data: [6]}]

▶ max = input.max_by { |e| e[:data].size }[:data].size

▶ input.map do |h|
    [[h[:label]] * max].flatten.zip h[:data] # make it N×M (for transpose)
  end.transpose.map do |e|
    e.reject { |_, v| v.nil? }               # remove nils
  end.flatten(1).map { |e| e.join ': ' }.join ', '

#⇒  "first: 1, second: 3, fourth: 6, first: 2, second: 4, second: 5"

Without the last two joins, the result would be an array of arrays:

#⇒ [["first", 1], ["second", 3], ["fourth", 6], 
#   ["first", 2], ["second", 4], ["second", 5]]
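
Because that intermediate value is a flat array of pairs, a max_returns limit could be applied to it before joining. The following is a rough sketch along those lines, not part of the answer itself; max_returns = 3 is an assumed value taken from the question's second example, and the code assumes a non-empty input.

```ruby
input = [{ label: "first",  data: [1, 2] },
         { label: "second", data: [3, 4, 5] },
         { label: "third",  data: [] },
         { label: "fourth", data: [6] }]

max_returns = 3 # assumed limit, from the question's second example
longest = input.map { |h| h[:data].size }.max

pairs = input
  .map { |h| ([h[:label]] * longest).zip(h[:data]) } # pad each row to the same width
  .transpose                                         # group pairs by position
  .flatten(1)
  .reject { |_, v| v.nil? }                          # drop the padding

# Take at most max_returns pairs, then format them.
limited = pairs.first(max_returns).map { |pair| pair.join(': ') }.join(', ')
puts limited
```

Since first never returns more elements than exist, a limit larger than the number of pairs simply yields all of them.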

Answer 1 (score: 3)

This works:

arr = [{ label: "first",  data: [1, 3] },
       { label: "second", data: [3, 4, 5] },
       { label: "third",  data: [] },
       { label: "fourth", data: [6] }]

results = []
arr.each do |h|
  h[:data].each_with_index do |d, i|
    results[i] ||= []
    results[i] << "#{h[:label]}: #{d}"
  end
end

results.flatten.join(', ')
#=> "first: 1, second: 3, fourth: 6, first: 3, second: 4, second: 5"

Answer 2 (score: 2)

arr = [{ label: "first",  data: [1, 2] },
       { label: "second", data: [3, 4, 5] },
       { label: "third",  data: [] },
       { label: "fourth", data: [6] }]

labels, data = arr.map { |h| [h[:label], h[:data].dup] }.transpose
  #=> [["first", "second", "third", "fourth"], [[1, 2], [3, 4, 5], [], [6]]] 
data.map(&:size).max.times.with_object([]) do |_,arr|
  labels.each_index do |i|
    d = data[i].shift
    arr << "#{labels[i]}: #{d}" if d
  end
end.join(', ')
  #=> "first: 1, second: 3, fourth: 6, first: 2, second: 4, second: 5"  

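Answer 3 (score: 1)

I'm not sure what round robin is, but here is a solution that produces the output you need:

A version based on removing elements from the initial array: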

Thread

A version that does not mutate the initial array:

Task

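Answer 4 (score: 1)

The benchmarks were all run against the same data. I ran each answer against four different scenarios:
*_5 ran against the original data: 852, 0, 0, 0
*_500 ran against the same data, but with a maximum of 500 returns
*_2_5 ran against data in 4 arrays of sizes 656, 137, 0, 59, for a total of 852 records
*_2_500 ran against arr2 with a maximum of 500 returns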

                       user     system      total        real
OP_5:              0.000000   0.000000   0.000000 (  0.000120)
Mudasobwa_5:       0.000000   0.000000   0.000000 (  0.000108)
Cary_5:            0.010000   0.000000   0.010000 (  0.011316)
Rustam_5:          0.000000   0.000000   0.000000 (  0.000087)
Wand_5:            0.010000   0.000000   0.010000 (  0.003761)
Stefan_5:          0.000000   0.000000   0.000000 (  0.004007)
OP_500:            0.010000   0.010000   0.020000 (  0.017235)
Mudasobwa_500:     0.010000   0.000000   0.010000 (  0.006164)
Cary_500:          0.010000   0.000000   0.010000 (  0.011403)
Rustam_500:        0.010000   0.000000   0.010000 (  0.011884)
Wand_500:          0.010000   0.000000   0.010000 (  0.003743)
Stefan_500:        0.000000   0.000000   0.000000 (  0.002711)
OP_2_5:            0.000000   0.000000   0.000000 (  0.000052)
Mudasobwa_2_5:     0.000000   0.000000   0.000000 (  0.000140)
Cary_2_5:          0.010000   0.000000   0.010000 (  0.008196)
Rustam_2_5:        0.000000   0.000000   0.000000 (  0.000088)
Wand_2_5:          0.000000   0.000000   0.000000 (  0.003338)
Stefan_2_5:        0.010000   0.000000   0.010000 (  0.002597)
OP_2_500:          0.000000   0.000000   0.000000 (  0.002211)
Mudasobwa_2_500:   0.000000   0.000000   0.000000 (  0.006373)
Cary_2_500:        0.010000   0.000000   0.010000 (  0.008455)
Rustam_2_500:      0.020000   0.000000   0.020000 (  0.019453)
Wand_2_500:        0.010000   0.000000   0.010000 (  0.004846)
Stefan_2_500:      0.000000   0.000000   0.000000 (  0.003421)
OP_avg:            0.002500   0.002500   0.005000 (  0.004904)
Mudasobwa_avg:     0.002500   0.000000   0.002500 (  0.003196)
Cary_avg:          0.010000   0.000000   0.010000 (  0.009843)
Rustam_avg:        0.007500   0.000000   0.007500 (  0.007878)
Wand_avg:          0.007500   0.000000   0.007500 (  0.003922)
Stefan_avg:        0.002500   0.000000   0.002500 (  0.003184)
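
A table in this format is what Ruby's standard Benchmark module prints. As a minimal sketch of how such a run could be set up (this is not the gist's code): round_robin_strings is a hypothetical stand-in for any of the answers, and arr2 is synthetic data that only reproduces the array sizes quoted for the second data set.

```ruby
require 'benchmark'

# Hypothetical stand-in for one of the answers; not the gist's code.
def round_robin_strings(arr, max_returns)
  longest = arr.map { |h| h[:data].size }.max || 0
  rows = Array.new(longest) do |pos|
    arr.select { |h| pos < h[:data].size }
       .map { |h| "#{h[:label]}: #{h[:data][pos]}" }
  end
  rows.flatten.first(max_returns).join(', ')
end

# Synthetic data with the sizes 656, 137, 0, 59 (852 records in total).
sizes = [656, 137, 0, 59]
arr2 = sizes.each_with_index.map do |size, idx|
  { label: "label#{idx}", data: Array.new(size) { |n| n } }
end

# The argument to bm is the width reserved for the row labels.
Benchmark.bm(18) do |x|
  x.report('Sketch_2_5:')   { round_robin_strings(arr2, 5) }
  x.report('Sketch_2_500:') { round_robin_strings(arr2, 500) }
end
```

Timings this small vary from run to run, so repeating each report block many times would likely give steadier numbers than a single call.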

Contrary to my earlier benchmarks, the averages show that Stefan's answer is actually the fastest, beating Mudasobwa's answer by 0.000012 seconds!
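
The margin can be checked from the "real" column of the table. This is a small sketch with the values copied from the four Stefan_* and four Mudasobwa_* rows; the variable names are only illustrative.

```ruby
# "real" column for each of the four scenarios, copied from the table.
stefan    = [0.004007, 0.002711, 0.002597, 0.003421]
mudasobwa = [0.000108, 0.006164, 0.000140, 0.006373]

stefan_avg    = stefan.sum / stefan.size
mudasobwa_avg = mudasobwa.sum / mudasobwa.size
margin        = mudasobwa_avg - stefan_avg

puts format('Stefan_avg:    %.6f', stefan_avg)
puts format('Mudasobwa_avg: %.6f', mudasobwa_avg)
puts format('margin:        %.6f', margin)
```

The averages come out to 0.003184 and 0.003196, matching the *_avg rows, so the lead is about twelve microseconds.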

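Note: I had to edit some of the answers to mimic what the original solution was trying to do, so the benchmark code contains some extra pieces that were added on purpose.
In addition, some solutions did not use the max_returns limit (or did not stop at the limit), which made them take longer than the others (I blame myself for not explaining the problem clearly when I first asked). I did not take the max_returns limit into account when choosing an answer, because the only solutions that honored it were mine and Wand's (see the gist for details).

The code that runs these benchmarks, along with the sample data, can be found here: https://gist.github.com/scytherswings/65644610e20037bb948c

Thanks, everyone, for the answers!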