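I am parsing a large CSV file in a Ruby script and need to find the closest title match for some search keys. There may be one or more search keys, and the values may not match exactly, as shown below (they should be close):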
search_keys = ["big", "bear"]
This is the large array containing the data I need to search; I only want to search the title column:
array = [
["id", "title", "code", "description"],
["1", "once upon a time", "3241", "a classic story"],
["2", "a big bad wolf", "4235", "a little scary"],
["3", "three big bears", "2626", "a heart warmer"]
]
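In this case, I want it to return the row ["3", "three big bears", "2626", "a heart warmer"], because that is the closest match to my search keys.
I want it to return the closest match for the given search keys.
Are there any helpers/libraries/gems I can use? Has anyone done this before?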
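Answer 0 (score: 2)
I'm afraid this task should be handled at the database level or by some kind of search engine; there is no point in fetching the data into the application and searching across columns and rows there, which is likely to be expensive. But for now, here is a simple approach :)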
array = [
["id", "title", "code", "description"],
["1", "once upon a time", "3241", "a classic story"],
["2", "a big bad wolf", "4235", "a little scary"],
["3", "three big bears", "2626", "a heart warmer"]
]
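h = Hash.new(0) # row index => number of search keys found in the title
search_keys = ["big", "bear"]
array.each_with_index do |rec, idx|
  next if idx.zero? # skip the header row
  search_keys.each do |key|
    h[idx] += 1 if rec[1].include?(key)
  end
end
# The counts are keyed by row index rather than by id, so the final lookup
# does not depend on the ids happening to match the row positions.
closest = h.keys.first
h.each do |idx, count|
  closest = idx if count > h[closest]
end
closest && array[closest] # => desired output :) (nil if nothing matched)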
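Answer 1 (score: 1)
I think you can do this yourself, without using any gem! This may be close to what you need: search the array for the keys and assign a rank to each matching row.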
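result = []
array.each do |ar|
  rank = 0 # number of search keys found in this row's title
  search_keys.each do |key|
    if ar[1].include?(key)
      rank += 1
    end
  end
  # keep only rows that matched at least one key, together with their rank
  if rank > 0
    result << [rank, ar]
  end
end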
This code could be written more compactly, but I wanted to show you the details.
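As a minimal sketch of how the ranked pairs could be reduced to a single best match: the helper name `rank_rows` and the variable `best` are hypothetical, and the sample `array` and `search_keys` from the question are repeated only so that the snippet runs on its own.

```ruby
array = [
  ["id", "title", "code", "description"],
  ["1", "once upon a time", "3241", "a classic story"],
  ["2", "a big bad wolf", "4235", "a little scary"],
  ["3", "three big bears", "2626", "a heart warmer"]
]
search_keys = ["big", "bear"]

# Hypothetical helper: pairs each data row with the number of search keys
# found in its title (column 1) and drops rows that match nothing.
def rank_rows(rows, keys)
  data_rows = rows.drop(1) # skip the header row
  ranked = data_rows.map do |row|
    [keys.count { |key| row[1].include?(key) }, row]
  end
  ranked.reject { |rank, _row| rank.zero? }
end

ranked = rank_rows(array, search_keys)
# max_by returns nil for an empty list, hence the safe navigation operator
best = ranked.max_by { |rank, _row| rank }&.last
p best # => ["3", "three big bears", "2626", "a heart warmer"]
```

If several rows share the highest rank, `max_by` returns only one of them, so a tie-breaking rule would have to be added if that matters.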
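Answer 2 (score: 1)
This works. It finds and returns an array of matching* rows as result.
*Matching rows = rows whose id, title, code, or description matches any of the provided search_keys, including partial matches such as 'bear' in 'bears'.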
result = []
array.each do |a|
  a.each do |i|
    search_keys.each do |k|
      result << a if i.include?(k)
    end
  end
end
result.uniq!
Answer 3 (score: 1)
You could probably write this more concisely...
array = [
["id", "title", "code", "description"],
["1", "once upon a time", "3241", "a classic story"],
["2", "a big bad wolf", "4235", "a little scary"],
["3", "three big bears", "2626", "a heart warmer"]
]
search_keys = ["big", "bear"]
def sift(records, target_field, search_keys)
  # find target_field index
  target_field_index = nil
  records.first.each_with_index do |e, i|
    if e == target_field
      target_field_index = i
      break
    end
  end
  if target_field_index.nil?
    raise "Target field was not found"
  end
  # sums up which records have a match and how many keys they match
  # key => val = record => number of keys matched
  counter = Hash.new(0) # each new hash key is init'd with value of 0
  records.each do |record| # look at all our given records
    search_keys.each do |key| # check each search key on the field
      if record[target_field_index].include?(key)
        counter[record] += 1 # found a key, init to 0 if required and increment count
      end
    end
  end
  # find the result with the most search key matches
  top_result = counter.to_a.reduce do |top, record|
    if record[1] > top[1] # [0] = record, [1] = key hit count
      top = record # set to new top
    end
    top # continue with reduce
  end.first # only care about the record (not the key hit count)
end
puts "Top result: #{sift array, 'title', search_keys}"
# => Top result: ["3", "three big bears", "2626", "a heart warmer"]
Answer 4 (score: 1)
Here is my one-liner attempt:
p array.find_all {|a|a.join.scan(/#{search_keys.join("|")}/).length==search_keys.length}
=>[["3", "three big bears", "2626", "a heart warmer"]]
To get all rows ordered by the number of matches:
p array.drop(1).sort_by {|a|a.join.scan(/#{search_keys.join("|")}/).length}.reverse
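Does anyone know how to adapt this last solution so that rows containing none of the keys are removed, while keeping it as concise as it is?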
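One possible way to do this, offered only as a sketch: count the hits once per row, reject the rows with zero hits, then sort by descending count. The variable names `pattern`, `counts`, and `matches` are hypothetical, and two details differ from the one-liners above by assumption: `Regexp.union` is used so that keys containing regex metacharacters are matched literally, and the fields are joined with a space so that a key cannot match across a field boundary.

```ruby
array = [
  ["id", "title", "code", "description"],
  ["1", "once upon a time", "3241", "a classic story"],
  ["2", "a big bad wolf", "4235", "a little scary"],
  ["3", "three big bears", "2626", "a heart warmer"]
]
search_keys = ["big", "bear"]

# Regexp.union builds an alternation and escapes each string key
pattern = Regexp.union(search_keys)

# Pair every data row (header dropped) with its number of key hits
counts = array.drop(1).map { |row| [row, row.join(" ").scan(pattern).length] }

# Remove rows without any hit, then order by hit count, highest first
matches = counts.reject { |_row, hits| hits.zero? }
                .sort_by { |_row, hits| -hits }
                .map(&:first)

p matches
# => [["3", "three big bears", "2626", "a heart warmer"], ["2", "a big bad wolf", "4235", "a little scary"]]
```

Like the original one-liners, this searches every column rather than only the title; replacing `row.join(" ")` with `row[1]` would restrict it to the title column.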