Question

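In Ruby, how can I sort an array so that its items (which are also arrays) are arranged by their length, but not simply in ascending or descending order of length?

I would like the array items to be distributed evenly, so that items containing many objects are mixed in with smaller arrays.

For example, I have an array whose items are arrays containing the number of objects shown in the comment. For clarity, I have broken them up into chunks and calculated their total sizes (see the explanation below).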

[
  # chunk 1, inner total length 5
  [{...}], # 2
  [{...}], # 1
  [{...}], # 1
  [{...}], # 1
  # chunk 2, inner total length 11
  [{...}], # 2
  [{...}], # 2
  [{...}], # 3
  [{...}], # 4
  # chunk 3, inner total length 9
  [{...}], # 3
  [{...}], # 3
  [{...}], # 1
  [{...}], # 2
  # chunk 4, inner total length 15
  [{...}], # 4
  [{...}], # 3
  [{...}], # 4
  [{...}], # 4
]
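The per-chunk totals in the comments above can be reproduced with a short sketch. The names sizes and data are hypothetical stand-ins: each inner array is filled with empty hashes, because the real objects are elided in the question.

```ruby
# Hypothetical stand-in data: one inner array per size listed in the comments above.
sizes = [2, 1, 1, 1,  2, 2, 3, 4,  3, 3, 1, 2,  4, 3, 4, 4]
data  = sizes.map { |n| Array.new(n) { {} } }

# Slice the outer array into consecutive chunks of 4 and total the inner lengths.
chunk_totals = data.each_slice(4).map { |chunk| chunk.sum(&:size) }
p chunk_totals  # prints [5, 11, 9, 15]
```

The spread between 5 and 15 is the imbalance that the question wants to remove.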

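I would like to arrange the array so that it looks more like the one below. Note: this example orders the items from smallest to largest (1..4), but that is not necessary. I just want them chunked so that the cumulative lengths of the inner arrays are comparable.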

[
  # chunk 1, inner total length 10
  [{...}], # 1
  [{...}], # 2
  [{...}], # 3
  [{...}], # 4
  # chunk 2, inner total length 10
  [{...}], # 1
  [{...}], # 2
  [{...}], # 3
  [{...}], # 4
  # chunk 3, inner total length 10
  [{...}], # 1
  [{...}], # 2
  [{...}], # 3
  [{...}], # 4
  # chunk 4, inner total length 10
  [{...}], # 1
  [{...}], # 2
  [{...}], # 3
  [{...}], # 4
]

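My motivation is to slice up the outer array so that I can process the inner arrays in parallel. I don't want one parallel process to get a slice of small chunks while another process gets a slice of really big chunks.

Note: I know that I will have 4 parallel processes, so that may help inform how to arrange the chunks in the array. Thanks!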

Answer 1

This is not a "perfect" solution, but here is an approach that is not computationally heavy:

Sum the lengths of all the inner arrays:

total_count = original_list.map(&:count).inject(:+)

Determine how many items to put in each parallel process (in your case, 4 processes):

chunk_size = total_count / 4

Now for the harder part: the algorithm. I will keep it very simple and just iterate over each item in the array, "chunking" until chunk_size is reached:

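current_chunk_size = 0

# chunk_while yields adjacent pairs; a one-parameter block sees only the first of each pair.
# Returning false starts a new chunk after that element.
chunks = original_list.chunk_while do |inner_array|
  current_chunk_size += inner_array.count
  current_chunk_size = 0 if current_chunk_size >= chunk_size
  current_chunk_size > 0
end.to_a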

If you prefer, you could implement similar logic with a method such as slice_after.
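A minimal sketch of the slice_after variant, assuming stand-in data in which each inner array is filled with empty hashes. The method name chunk_by_running_total and its groups parameter are hypothetical and are not part of the original answer.

```ruby
# Hypothetical helper: cut the list after the element that pushes the
# running total of inner lengths up to (or past) the per-group target.
def chunk_by_running_total(list, groups)
  target  = list.sum(&:size) / groups
  running = 0
  list.slice_after do |inner|
    running += inner.size
    if running >= target
      running = 0   # start counting afresh for the next chunk
      true          # cut after this element
    else
      false
    end
  end.to_a
end

# Stand-in for the question's data: inner arrays of the sizes in its comments.
sizes  = [2, 1, 1, 1,  2, 2, 3, 4,  3, 3, 1, 2,  4, 3, 4, 4]
list   = sizes.map { |n| Array.new(n) { {} } }
chunks = chunk_by_running_total(list, 4)
p chunks.map { |chunk| chunk.sum(&:size) }  # prints [12, 10, 10, 8]
```

Like the chunk_while version, this keeps the original order and does not guarantee exactly 4 chunks for arbitrary input; it only cuts whenever the running total reaches the target.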

Using this algorithm on the original example:

[
  # chunk 1, inner total length 5
  [{...}], # 2
  [{...}], # 1
  [{...}], # 1
  [{...}], # 1
  # chunk 2, inner total length 11
  [{...}], # 2
  [{...}], # 2
  [{...}], # 3
  [{...}], # 4
  # chunk 3, inner total length 9
  [{...}], # 3
  [{...}], # 3
  [{...}], # 1
  [{...}], # 2
  # chunk 4, inner total length 15
  [{...}], # 4
  [{...}], # 3
  [{...}], # 4
  [{...}], # 4
]

yields the result:

[
  # chunk 1, inner total length 12
  [{...}], # 2
  [{...}], # 1
  [{...}], # 1
  [{...}], # 1
  [{...}], # 2
  [{...}], # 2
  [{...}], # 3

  # chunk 2, inner total length 10
  [{...}], # 4
  [{...}], # 3
  [{...}], # 3

  # chunk 3, inner total length 10
  [{...}], # 1
  [{...}], # 2
  [{...}], # 4
  [{...}], # 3

  # chunk 4, inner total length 8
  [{...}], # 4
  [{...}], # 4
]

...which is pretty close.

Answer 2

Following my comment on the OP, I would use this algorithm to get a roughly even size distribution:

unchunked_data = [
  [{...}],
  [{...}],
  [{...}],
  [{...}],
  [{...}],
  [{...}],
  [{...}],
  [{...}]
]

sorted_data = unchunked_data.sort_by(&:size)
grouped_data = sorted_data.each_with_index.group_by { |_, index| index % 4 }

grouped_data.each do |process_index, data|
  # each_with_index would put data in an array with its index in sorted_data. Calling map(&:first) removes that index.
  data_without_index = data.map(&:first)
  send_data_to_process(process_index, data_without_index)
end

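If the data is as shown in the OP's example, this results in a perfect distribution.

Per the discussion in the comments, you can get all of the data back in a single array, in the original format but grouped by this method, by doing the following: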

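# The grouped values are [inner_array, index] pairs, so strip the indexes after flattening.
grouped_data.values.flatten(1).map(&:first)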
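To check the distribution on data shaped like the OP's example, here is a minimal sketch. The stand-in inner arrays are filled with empty hashes, and the names sizes, groups, group_totals and regrouped are hypothetical additions for illustration.

```ruby
# Stand-in for the OP's data: inner arrays of the sizes shown in the question.
sizes = [2, 1, 1, 1,  2, 2, 3, 4,  3, 3, 1, 2,  4, 3, 4, 4]
unchunked_data = sizes.map { |n| Array.new(n) { {} } }

sorted_data  = unchunked_data.sort_by(&:size)
grouped_data = sorted_data.each_with_index.group_by { |_, index| index % 4 }

# Strip the indexes that each_with_index attached, leaving only the inner arrays.
groups = grouped_data.transform_values { |pairs| pairs.map(&:first) }

# Total number of objects that each of the 4 processes would receive.
group_totals = groups.transform_values { |group| group.sum(&:size) }
p group_totals.values  # prints [10, 10, 10, 10]

# All inner arrays in one flat array, ordered group by group.
regrouped = groups.values.flatten(1)
```

Dealing the sorted items out round-robin works perfectly here because every size occurs exactly 4 times; with other inputs the last group tends to collect the largest items.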

Answer 3

Here is another heuristic.¹ I will explain the procedure afterward. We are given:

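arr = [[[0,1],         [2],        [3],           [4]],
       [[5,6],         [7,8],      [9,10,11],     [12,13,14,15]],
       [[16,17,18],    [19,20,21], [22],          [23,24]],
       [[25,26,27,28], [29,30,31], [32,33,34,35], [36,37,38,39]]
      ]

nbr_groups = 4

Let's first flatten by one level and sort the resulting array by size.

sorted = arr.flatten(1).sort_by(&:size)
  #=> [[2], [3], [4], [22], [0, 1], [5, 6], [7, 8], [23, 24], [9, 10, 11],
  #    [16, 17, 18], [19, 20, 21], [29, 30, 31], [12, 13, 14, 15],
  #    [25, 26, 27, 28], [32, 33, 34, 35], [36, 37, 38, 39]]

We need to group the elements of sorted into an array result that contains nbr_groups arrays. This can be done by "sweeping" the elements of sorted into result. One sweep consists of nbr_groups forward assignments followed by the same number of backward assignments.

Now create an enumerator.

a = nbr_groups.times.to_a
  #=> [0, 1, 2, 3]
idx = [*a, *a.reverse].cycle
  #=> #<Enumerator: [0, 1, 2, 3, 3, 2, 1, 0]:cycle>

The heuristic I suggest first assigns the first nbr_groups elements of sorted to result, so that the first element of sorted is assigned to the first element of result, the second element of sorted is assigned to the second element of result, and so on. The next nbr_groups elements of sorted are assigned to result in the same way, but this time in reverse order: element nbr_groups+1 of sorted is assigned to the last element of result, element nbr_groups+2 of sorted is assigned to the next-to-last element of result, and so on. These alternating assignments continue until every element of sorted has been assigned.

result = sorted.each_with_object(Array.new(nbr_groups) { [] }) do |a,arr|
  arr[idx.next] << a
end
  #=> [[[2], [23, 24], [9, 10, 11], [36, 37, 38, 39]],
  #    [[3], [7, 8], [16, 17, 18], [32, 33, 34, 35]],
  #    [[4], [5, 6], [19, 20, 21], [25, 26, 27, 28]],
  #    [[22], [0, 1], [29, 30, 31], [12, 13, 14, 15]]]

Now let's see how evenly the elements were distributed: the inner arrays in each element of result hold 10 objects in total.

This result brought a smile to my face. It is, of course, pure coincidence that all elements of result have the same total size.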
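The sweep described in this answer can be run end to end with the sketch below. It uses the answer's own arr and nbr_groups; the names forward, totals, inner and groups are hypothetical renamings chosen to avoid shadowing, and the final check is an addition for illustration.

```ruby
arr = [[[0,1],         [2],        [3],           [4]],
       [[5,6],         [7,8],      [9,10,11],     [12,13,14,15]],
       [[16,17,18],    [19,20,21], [22],          [23,24]],
       [[25,26,27,28], [29,30,31], [32,33,34,35], [36,37,38,39]]]
nbr_groups = 4

# Flatten one level and order the inner arrays from smallest to largest.
sorted = arr.flatten(1).sort_by(&:size)

# Group indexes for one full sweep (forward, then backward), repeated forever.
forward = nbr_groups.times.to_a
idx = (forward + forward.reverse).cycle

# Deal each inner array into the group named by the next index in the sweep.
result = sorted.each_with_object(Array.new(nbr_groups) { [] }) do |inner, groups|
  groups[idx.next] << inner
end

# Total number of objects in each group.
totals = result.map { |group| group.sum(&:size) }
p totals  # prints [10, 10, 10, 10]
```

Reversing direction on every other pass pairs the smaller items of one pass with the larger items of the next, which is why the totals tend to stay close even when the sizes are not as regular as they are here.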

¹ As @glyoko pointed out in a comment, this problem is NP-complete, so heuristics must be used for all but the smallest problems.

Sort Ruby array of array items evenly by length

3 answers: