Search two multidimensional arrays and group similar subarrays together

Time: 2015-03-03 02:12:25

Tags: ruby arrays multidimensional-array

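I am trying to search two multidimensional arrays for any elements that their subarrays have in common, and then put the results in a third array in which the entire subarrays that share an element are grouped together (not just the shared elements).

The data is imported from two CSVs: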

require 'csv'
array = CSV.read('primary_csv.csv')
  #=> [["account_num", "account_name", "primary_phone", "second_phone", "status"],
  #=>  ["11111",        "John Smith",   "8675309",      "            ", "active"], 
  #=>  ["11112",        "Tina F.",      "5551234",      "5555678"     , "disconnected"],
  #=>  ["11113",        "Troy P.",      "9874321",      "            ", "active"]] 
  # and so on...

second_array = CSV.read('customer_service.csv')
  #=> [["date",   "name",      "agent", "call_length", "phone",   "second_phone", "complaint"],
  #=>  ["3/1/15", "Mary ?",    "Bob X", "5:00",        "5551234", "          ",   "rude"],
  #=>  ["3/2/15", "Mrs. Smith", "Stew", "1:45",        "9995678", "8675309"   ,   "says shes not a customer"]] 
  # and so on...

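If any number that appears as an element of a subarray from the primary CSV also appears in a subarray from customer_service.csv, I want both entire subarrays (not just the common elements) put into a third array, results_array. The desired output for the sample above is:

results_array = [["11111", "John Smith", "8675309", " ", "active"],
                 ["3/2/15", "Mrs. Smith", "Stew", "1:45", "9995678", "8675309" , "says shes not a customer"]]
  # and so on...

I then want to export the array to a new CSV in which each subarray is its own CSV row. I plan to iterate over each subarray, joining its elements with "," so that it is comma-separated, and then put the result into a new CSV: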

# Build one comma-separated line per subarray, then write them all out.
lines = results_array.map { |j| j.join(",") }
File.open("results.csv", "w") { |f| f.puts lines }
  #=> 11111,John Smith,8675309, ,active
  #=> 3/2/15,Mrs. Smith,Stew,1:45,9995678,8675309,says shes not a customer 
  # and so on...

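How can I achieve the desired output? I know the final product will look messy, because similar data (phone numbers, for example) will end up in different columns. But I need to find a way to group the data together.

1 answer:

Answer 0 (score: 0)

Suppose a1 and a2 are the two arrays (excluding the header rows).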

Code

def combine(a1, a2)
  h2 = a2.each_with_index
         .with_object(Hash.new { |h,k| h[k] = [] }) { |(arr,i),h|
           arr.each { |e| es = e.strip; h[es] << i if number?(es) } }
  a1.each_with_object([]) do |arr, b|
    d = arr.each_with_object([]) do |str, d|
      s = str.strip    
      d.concat(a2.values_at(*h2[s])) if number?(s) && h2.key?(s)
    end
    b << d.uniq.unshift(arr) if d.any?
  end
end

def number?(str)
  str =~ /^\d+$/
end

Example

Here is your example, slightly modified:

a1 = [
  ["11111", "John Smith", "8675309", "",        "active"      ], 
  ["11112", "Tina F.",    "5551234", "5555678", "disconnected"],
  ["11113", "Troy P.",    "9874321", "",        "active"      ]
] 

a2 = [
  ["3/1/15", "Mary ?",     "Bob X", "5:00", "5551234", "",        "rude"],
  ["3/2/15", "Mrs. Smith", "Stew",  "1:45", "9995678", "8675309", "surly"],
  ["3/7/15", "Cher",       "Sonny", "7:45", "9874321", "8675309", "Hey Jude"]
]

combine(a1, a2)
  #=> [[["11111",   "John Smith", "8675309",       "",
  #      "active"],
  #     ["3/2/15",  "Mrs. Smith", "Stew",          "1:45",
  #      "9995678", "8675309",    "surly"],
  #     ["3/7/15",  "Cher",       "Sonny",         "7:45",
  #      "9874321", "8675309",    "Hey Jude"]
  #    ],
  #    [["11112",   "Tina F.",    "5551234",       "5555678",
  #      "disconnected"],
  #     ["3/1/15",  "Mary ?",     "Bob X",         "5:00",
  #      "5551234", "",           "rude"]
  #    ],
  #    [["11113",   "Troy P.",    "9874321",       "",
  #      "active"],
  #     ["3/7/15",  "Cher",       "Sonny",         "7:45",
  #      "9874321", "8675309",    "Hey Jude"]
  #    ]
  #  ]

Explanation

First, we define a helper:

def number?(str)
  str =~ /^\d+$/
end

For example:

number?("8675309") #=> 0 ("truthy")
number?("3/1/15")  #=> nil
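One caveat may be worth a sketch here: in Ruby, ^ and $ anchor at line boundaries rather than at the ends of the string, so a field with an embedded newline could pass the check above. The variant below uses \A and \z instead. The names strict_number? and line_number? are made up for this illustration and are not part of the answer.

```ruby
# Hypothetical stricter variant of number? (not from the original answer).
# \A and \z anchor to the start and end of the whole string.
def strict_number?(str)
  str.match?(/\A\d+\z/)
end

# The line-anchored pattern used in the answer, wrapped to return a boolean.
def line_number?(str)
  !!(str =~ /^\d+$/)
end

strict_number?("8675309")        #=> true
strict_number?("3/1/15")         #=> false
line_number?("note\n8675309")    #=> true  (the second line is all digits)
strict_number?("note\n8675309")  #=> false
```

For single-line CSV fields such as the ones in the sample data, both versions should behave the same way.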

Now index a2 on the values that represent numbers:

h2 = a2.each_with_index
       .with_object(Hash.new { |h,k| h[k] = [] }) { |(arr,i),h|
         arr.each { |e| es = e.strip; h[es] << i if number?(es) } }
  #=> {"5551234"=>[0], "9995678"=>[1], "8675309"=>[1, 2], "9874321"=>[2]} 

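This says, for example, that the "numeric" field "8675309" is contained in the elements of a2 at offsets 1 and 2 (that is, for Mrs. Smith and Cher).

We can now simply step through the elements of a1 looking for matches.

The code: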
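As a small sketch of how this index gets used, the lookup below turns a phone number back into the matching rows of a2. For brevity the hash is written out literally rather than rebuilt, which is an assumption of this example. In the answer itself h2 has a default block, so reading a missing key with h2[s] would add an empty array to the hash; that appears to be why the answer checks h2.key?(s) first.

```ruby
a2 = [
  ["3/1/15", "Mary ?",     "Bob X", "5:00", "5551234", "",        "rude"],
  ["3/2/15", "Mrs. Smith", "Stew",  "1:45", "9995678", "8675309", "surly"],
  ["3/7/15", "Cher",       "Sonny", "7:45", "9874321", "8675309", "Hey Jude"]
]

# The index from the answer, written as a plain literal for this sketch.
h2 = { "5551234" => [0], "9995678" => [1], "8675309" => [1, 2], "9874321" => [2] }

# Splat the stored offsets into values_at to fetch the matching rows.
rows  = a2.values_at(*h2["8675309"])
names = rows.map { |row| row[1] }   #=> ["Mrs. Smith", "Cher"]

# fetch with a default leaves the hash untouched for absent numbers.
missing = h2.fetch("0000000", [])   #=> []
```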

arr.each_with_object([]) do |str, d|
  s = str.strip    
  d.concat(a2.values_at(*h2[s])) if number?(s) && h2.key?(s)
end

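This steps through the elements of arr, assigning each to the block variable str. For example, if arr holds the first element of a1, str will in turn equal "11111", "John Smith", and so on. After s = str.strip, it says that if s has a numeric representation and there is a matching key in h2, the (initially empty) array d is concatenated with the elements of a2 at the offsets given by h2[s].

After this loop completes, we check whether d contains any elements of a2: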
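To make the loop concrete, here is a trace for the first row of a1, written as a plain each loop rather than each_with_object. The index h2 is again written out literally, which is a simplification made for this sketch.

```ruby
def number?(str)
  str =~ /^\d+$/
end

a2 = [
  ["3/1/15", "Mary ?",     "Bob X", "5:00", "5551234", "",        "rude"],
  ["3/2/15", "Mrs. Smith", "Stew",  "1:45", "9995678", "8675309", "surly"],
  ["3/7/15", "Cher",       "Sonny", "7:45", "9874321", "8675309", "Hey Jude"]
]
h2 = { "5551234" => [0], "9995678" => [1], "8675309" => [1, 2], "9874321" => [2] }

arr = ["11111", "John Smith", "8675309", "", "active"]

d = []
arr.each do |str|
  s = str.strip
  # "11111" is numeric but not a key of h2; only "8675309" passes both checks.
  next unless number?(s) && h2.key?(s)
  d.concat(a2.values_at(*h2[s]))
end

matched_names = d.map { |row| row[1] }  #=> ["Mrs. Smith", "Cher"]
```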

b << d.uniq.unshift(arr) if d.any?

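If it does, we remove duplicates, prepend arr to the array, and save the result to b.

Note that this allows one element of a2 to match multiple elements of a1.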
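The uniq step matters when two fields of the same a1 row point at the same a2 row. The row below is hypothetical (it is not in the sample data) and was chosen so that both of its phone numbers match Mrs. Smith's record; h2 is written out literally for the sketch.

```ruby
def number?(str)
  str =~ /^\d+$/
end

a2 = [
  ["3/1/15", "Mary ?",     "Bob X", "5:00", "5551234", "",        "rude"],
  ["3/2/15", "Mrs. Smith", "Stew",  "1:45", "9995678", "8675309", "surly"],
  ["3/7/15", "Cher",       "Sonny", "7:45", "9874321", "8675309", "Hey Jude"]
]
h2 = { "5551234" => [0], "9995678" => [1], "8675309" => [1, 2], "9874321" => [2] }

# Hypothetical a1 row: both phone numbers appear in a2[1].
arr = ["11114", "Pat Q.", "9995678", "8675309", "active"]

d = []
arr.each do |str|
  s = str.strip
  d.concat(a2.values_at(*h2[s])) if number?(s) && h2.key?(s)
end

raw_count = d.size               #=> 3 (Mrs. Smith twice, then Cher)
group     = d.uniq.unshift(arr)  # arr first, then each matching row once
```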
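For the export step asked about in the question, one possible sketch uses Ruby's standard CSV library instead of joining fields by hand, so that any field containing a comma would be quoted. The groups array below holds a single group copied from the output of combine above. Flattening by one level is an assumption about the desired layout (one CSV row per subarray).

```ruby
require 'csv'

# One group as returned by combine: the a1 row first, then its a2 matches.
groups = [
  [["11112", "Tina F.", "5551234", "5555678", "disconnected"],
   ["3/1/15", "Mary ?", "Bob X", "5:00", "5551234", "", "rude"]]
]

# Flatten one level so that every subarray becomes its own CSV row.
rows = groups.flatten(1)

csv_text = CSV.generate do |csv|
  rows.each { |row| csv << row }
end

# To write a file instead, CSV.open("results.csv", "w") takes the same block.
```

Because a row of a2 can match several rows of a1, the same a2 row may appear more than once in the exported file.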