Question

I have an array:

tokens = [["hello","world"],["hello","ruby"]]
all_tokens = tokens.flatten.uniq # all_tokens=["hello","world","ruby"]

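Now I need to create two arrays that correspond to all_tokens. The first array should contain the position of each word within the sub-arrays of tokens, i.e. the output: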

[[0,0],[1],[1]] # (w.r.t all_tokens)

To be clear: the index of "hello" is 0 and 0 in the two sub-arrays of tokens.

The second array contains the indices of each word with respect to tokens, i.e. the output:

[[0,1],[0],[1]]

To be clear, this reads: the indices of "hello" are 0 and 1, i.e. "hello" appears in the sub-arrays at index 0 and index 1 of tokens.

Cheers!

Answer 1

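Your approach sounds difficult to maintain. If you stay on your current path, you will end up with your tokens array of arrays, an array of unique tokens (all_tokens), and then two additional arrays of arrays that track the positions of the unique tokens within the original tokens structure.

An alternative is to start with the most natural way to store unique tokens: a hash. Within that hash, you can also store the positional information. That way, all of the information travels together.

There is probably a slicker way to do this, but here is a simple implementation: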

tokens = [["hello","world"],["hello","ruby"]]

token_info     = {}
ordered_tokens = []

tokens.each_with_index do |group, i|
    group.each_with_index do |t, j|
        unless token_info.has_key?(t)
            token_info[t] = {:i => [], :j => []}
            ordered_tokens.push(t)
        end
        token_info[t][:i].push(i)
        token_info[t][:j].push(j)
    end
end

ordered_tokens.each do |t|
    p t, token_info[t]
end
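If the two arrays from the question are still needed, they can probably be read straight out of the hash. The following is a minimal sketch, not part of the original answer: it rebuilds token_info as above, and the names positions_in_sublists and sublist_numbers are made up for illustration. Note that :j holds the position inside each sub-array (the question's first array) and :i holds the index of the sub-array itself (the question's second array).

```ruby
tokens = [["hello", "world"], ["hello", "ruby"]]

token_info     = {}
ordered_tokens = []

# Same bookkeeping as above: :i is the sub-array index, :j the position inside it.
tokens.each_with_index do |group, i|
  group.each_with_index do |t, j|
    unless token_info.key?(t)
      token_info[t] = { :i => [], :j => [] }
      ordered_tokens.push(t)
    end
    token_info[t][:i].push(i)
    token_info[t][:j].push(j)
  end
end

# Hypothetical names: the question's first and second arrays, in first-seen token order.
positions_in_sublists = ordered_tokens.map { |t| token_info[t][:j] }
sublist_numbers       = ordered_tokens.map { |t| token_info[t][:i] }

p positions_in_sublists  # => [[0, 0], [1], [1]]
p sublist_numbers        # => [[0, 1], [0], [1]]
```

Because ordered_tokens records tokens in the order they are first seen, it matches tokens.flatten.uniq, so both results line up with all_tokens.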

Answer 2

I agree with FM, but this will create your first array:

tokens = [["hello","world"],["hello","ruby"]]
all_tokens = tokens.flatten.uniq

sublist_indices = all_tokens.collect do |token|
  tokens.inject([]) do |indices, list|
    indices += list.each_with_index.select {|pair| pair[0] == token}.map {|pair| pair[1]}
  end
end  # => [[0, 0], [1], [1]]

The rest is left as an exercise.
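For reference, one possible way to build the second array is sketched below. This is not from the original answer, and the name list_indices is made up for illustration. It assumes a word should be listed once per sub-array that contains it, even if it occurs there more than once.

```ruby
tokens = [["hello", "world"], ["hello", "ruby"]]
all_tokens = tokens.flatten.uniq

# For each unique token, collect the indices of the sub-arrays that contain it.
# Assumption: a sub-array is counted once, however often the token repeats in it.
list_indices = all_tokens.map do |token|
  tokens.each_index.select { |i| tokens[i].include?(token) }
end

p list_indices  # => [[0, 1], [0], [1]]
```

This rescans every sub-array for every token, which is fine for small inputs; the hash-based approach in Answer 1 makes a single pass instead.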

Finding the position of each word within the sub-arrays of a multidimensional array
