Question

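I am trying to search two multidimensional arrays for any elements they have in common within a given subarray, and then put the results in a third array in which the entire subarrays that share elements are grouped together (not just the shared elements).

The data is imported from two CSVs: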

require 'csv'
array = CSV.read('primary_csv.csv')
  #=> [["account_num", "account_name", "primary_phone", "second_phone", "status"],
  #=>  ["11111",        "John Smith",   "8675309",      "            ", "active"], 
  #=>  ["11112",        "Tina F.",      "5551234",      "5555678"     , "disconnected"],
  #=>  ["11113",        "Troy P.",      "9874321",      "            ", "active"]] 
  # and so on...

second_array = CSV.read('customer_service.csv')
  #=> [["date",   "name",      "agent", "call_length", "phone",   "second_phone", "complaint"],
  #=>  ["3/1/15", "Mary ?",    "Bob X", "5:00",        "5551234", "          ",   "rude"],
  #=>  ["3/2/15", "Mrs. Smith", "Stew", "1:45",        "9995678", "8675309"   ,   "says shes not a customer"]] 
  # and so on...

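If any number appears as an element of a subarray in both primary_csv.csv and customer_service.csv, I want the entire subarray (not just the common element) put into a third array, results_array. The desired output based on the sample above is:

results_array = [["11111", "John Smith", "8675309", " ", "active"],
                 ["3/2/15", "Mrs. Smith", "Stew", "1:45", "9995678", "8675309", "says shes not a customer"]]
  # and so on...

I then want to export the array to a new CSV in which each subarray is its own CSV row. I plan to iterate over each subarray, joining it with , to make it comma-delimited, and then put the results into a new CSV: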

rows = results_array.map { |j| j.join(",") }
File.open("results.csv", "w") { |f| f.puts rows }
  #=> 11111,John Smith,8675309, ,active
  #=> 3/2/15,Mrs. Smith,Stew,1:45,9995678,8675309,says shes not a customer 
  # and so on...

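How can I achieve the desired output? I know the final product will look messy, because similar data (for example, phone numbers) will end up in different columns. But I need a way to group the data together.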

Answer 1

Assume a1 and a2 are the two arrays (excluding the header rows).

Code

def combine(a1, a2)
  h2 = a2.each_with_index
         .with_object(Hash.new { |h,k| h[k] = [] }) { |(arr,i),h|
           arr.each { |e| es = e.strip; h[es] << i if number?(es) } }
  a1.each_with_object([]) do |arr, b|
    d = arr.each_with_object([]) do |str, d|
      s = str.strip    
      d.concat(a2.values_at(*h2[s])) if number?(s) && h2.key?(s)
    end
    b << d.uniq.unshift(arr) if d.any?
  end
end

def number?(str)
  str =~ /^\d+$/
end

Example

Here is your example, slightly modified:

a1 = [
  ["11111", "John Smith", "8675309", "",        "active"      ],
  ["11112", "Tina F.",    "5551234", "5555678", "disconnected"],
  ["11113", "Troy P.",    "9874321", "",        "active"      ]
]

a2 = [
  ["3/1/15", "Mary ?",     "Bob X", "5:00", "5551234", "",        "rude"],
  ["3/2/15", "Mrs. Smith", "Stew",  "1:45", "9995678", "8675309", "surly"],
  ["3/7/15", "Cher",       "Sonny", "7:45", "9874321", "8675309", "Hey Jude"]
]

combine(a1, a2)
  #=> [[["11111", "John Smith", "8675309", "", "active"],
  #     ["3/2/15", "Mrs. Smith", "Stew", "1:45", "9995678", "8675309", "surly"],
  #     ["3/7/15", "Cher", "Sonny", "7:45", "9874321", "8675309", "Hey Jude"]
  #    ],
  #    [["11112", "Tina F.", "5551234", "5555678", "disconnected"],
  #     ["3/1/15", "Mary ?", "Bob X", "5:00", "5551234", "", "rude"]
  #    ],
  #    [["11113", "Troy P.", "9874321", "", "active"],
  #     ["3/7/15", "Cher", "Sonny", "7:45", "9874321", "8675309", "Hey Jude"]
  #    ]
  #   ]

Explanation

First, we define a helper:

def number?(str)
  str =~ /^\d+$/
end

For example:

number?("8675309") #=> 0 (truthy)
number?("3/1/15")  #=> nil
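As an aside, `^` and `$` in a Ruby regular expression anchor to each line rather than to the whole string, so a multi-line field whose first line is all digits would pass `number?`. A minimal sketch of a stricter variant follows, assuming you want a true/false result and whole-string anchoring. The name `strict_number?` is hypothetical and not part of the answer.

```ruby
# Hypothetical stricter variant of number? (not from the original answer).
# \A and \z anchor to the start and end of the whole string,
# whereas ^ and $ anchor to the start and end of each line.
def strict_number?(str)
  str.match?(/\A\d+\z/)
end

strict_number?("8675309")   # true
strict_number?("3/1/15")    # false
strict_number?("123\nabc")  # false, although "123\nabc" =~ /^\d+$/ returns 0
```

For single-line CSV fields like the ones in the example, both versions agree.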

Now index a2 by the values that represent numbers:

h2 = a2.each_with_index
       .with_object(Hash.new { |h,k| h[k] = [] }) { |(arr,i),h|
         arr.each { |e| es = e.strip; h[es] << i if number?(es) } }
  #=> {"5551234"=>[0], "9995678"=>[1], "8675309"=>[1, 2], "9874321"=>[2]}

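This shows, for example, that the "numeric" field "8675309" is contained in the elements of a2 at offsets 1 and 2 (that is, for Mrs. Smith and Cher).

We can now simply step through the elements of a1 looking for matches.

The code: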
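To make the index concrete, here is a small sketch that rebuilds the same hash with a plain loop and then uses it to look up rows of a2. The variable names `index`, `offsets` and `names` are chosen for illustration only, and the digit check is inlined so the block runs on its own.

```ruby
a2 = [
  ["3/1/15", "Mary ?",     "Bob X", "5:00", "5551234", "",        "rude"],
  ["3/2/15", "Mrs. Smith", "Stew",  "1:45", "9995678", "8675309", "surly"],
  ["3/7/15", "Cher",       "Sonny", "7:45", "9874321", "8675309", "Hey Jude"]
]

# Map each all-digit field to the offsets of the a2 rows that contain it.
index = Hash.new { |h, k| h[k] = [] }
a2.each_with_index do |row, i|
  row.each do |field|
    key = field.strip
    index[key] << i if key.match?(/\A\d+\z/)
  end
end

# Look up the rows that mention a given number.
offsets = index["8675309"]                           # [1, 2]
names   = a2.values_at(*offsets).map { |row| row[1] } # ["Mrs. Smith", "Cher"]
```

Because the hash has a default block, reading a missing key would insert an empty array for it, which is one reason to guard lookups with `key?` as the answer's code does.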

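arr.each_with_object([]) do |str, d|
  s = str.strip
  d.concat(a2.values_at(*h2[s])) if number?(s) && h2.key?(s)
end

steps through the elements of arr, assigning each one to the block variable str. For example, if arr holds the first element of a1, str will in turn equal "11111", "John Smith", and so on. After s = str.strip, this says that if s has a numeric representation and there is a matching key in h2, the (initially empty) array d is concatenated with the elements of a2 given by the value of h2[s].

After this loop completes, we check whether d contains any elements of a2: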

b << d.uniq.unshift(arr) if d.any?

If it does, we remove duplicates, prepend arr to the array, and save it to b.

Note that this allows one element of a2 to match multiple elements of a1.
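The need for uniq shows up when one row of a2 matches the same row of a1 on more than one number. The sketch below traces that case for a single row. The a2 row and the hand-written h2 are hypothetical data made up for this illustration, and the digit check is inlined so the block runs on its own.

```ruby
arr = ["11112", "Tina F.", "5551234", "5555678", "disconnected"]

# Hypothetical a2 row that contains both of Tina's phone numbers.
a2 = [
  ["3/9/15", "Tina", "Bob X", "2:10", "5551234", "5555678", "billing"]
]
# The index for this a2, written out by hand.
h2 = { "5551234" => [0], "5555678" => [0] }

d = []
arr.each do |str|
  s = str.strip
  d.concat(a2.values_at(*h2[s])) if s.match?(/\A\d+\z/) && h2.key?(s)
end
# d now holds the same a2 row twice, once per matching phone number.

# uniq drops the repeated row; unshift puts the a1 row at the front.
group = d.uniq.unshift(arr)
# group is [arr, a2[0]]
```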
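For the export step in the question, one possible approach (a sketch, not part of the original answer) is to remove the grouping level with flatten(1) and let Ruby's standard CSV library do the writing, since it quotes fields that contain commas, which join(",") does not. The sample group below has the shape returned by combine; the complaint text "rude, hung up" is hypothetical and is there only to show the quoting.

```ruby
require 'csv'

# Same shape as the return value of combine: each group is [a1_row, a2_row, ...].
groups = [
  [["11112", "Tina F.", "5551234", "5555678", "disconnected"],
   ["3/1/15", "Mary ?", "Bob X", "5:00", "5551234", "", "rude, hung up"]]
]

# flatten(1) removes only the grouping level, leaving one array per CSV row.
rows = groups.flatten(1)

# CSV.generate builds the CSV text in memory, quoting fields where needed.
csv_text = CSV.generate do |csv|
  rows.each { |row| csv << row }
end

puts csv_text
```

To write straight to a file instead, CSV.open("results.csv", "w") accepts the same kind of block.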

Search two multidimensional arrays and group similar subarrays together
