Active Record text matching

Time: 2014-09-01 04:01:41

Tags: ruby activerecord

I am trying to do some basic text matching in Ruby using Active Record.

Here is my code so far:

require 'active_record'
require 'yaml'
require 'pg'
require 'pry'
require 'fileutils'

$config = '
adapter: postgresql
database: edgar
username: YYYYY
password:
host: 127.0.0.1'

ActiveRecord::Base.establish_connection(YAML::load($config))
class Doc    < ActiveRecord::Base; end
class Eightk < ActiveRecord::Base; end



directory = "disease"           # Name of the output directory
FileUtils.mkpath(directory)     # Creates the directory if it doesn't exist

cancer = Eightk.where("text ilike '%cancer%'")
death = Eightk.where("text ilike '%death%'")


cancer.each do |filing|     # filing is one matching Eightk record
    filename = "#{directory}/#{filing.doc_id}.html"
    File.open(filename, "w") { |f| f.puts filing.text }   # block form closes the file
    puts "Storing #{filing.doc_id}..."
end

death.each do |filing|      # filing is one matching Eightk record
    filename = "#{directory}/#{filing.doc_id}.html"
    File.open(filename, "w") { |f| f.puts filing.text }   # block form closes the file
    puts "Storing #{filing.doc_id}..."
end

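I have a long list of terms I want to search for:

  1. Is there a way to combine the searches into one list? I tried 'cancer' | 'death' but without any luck.
  2. I want to match whole words exactly rather than use ilike, but I don't know the command.

Thanks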

1 answer:

Answer 0 (score: 0)

Maybe something like

keywords = %w(cancer death anotherone)
records = Eightk.where keywords.map{|w| "(text ILIKE '%#{w}%')"}.join(' OR ')

records.each do |filing|
  filename = "#{directory}/#{filing.doc_id}.html"
  File.open(filename, "w") { |f| f.puts filing.text }   # block form closes the file
end
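
If the keyword list ever comes from somewhere less trusted than a literal in the script, the same OR query can be built with bound values instead of string interpolation. The following is only a sketch: `ilike_any_condition` is a hypothetical helper name (not part of ActiveRecord), and it assumes the keywords contain no LIKE wildcards (`%` or `_`), which it does not escape.

```ruby
# Hypothetical helper (not an ActiveRecord method): builds an array-style
# condition, i.e. an SQL fragment with one "?" placeholder per keyword,
# followed by the values to bind to those placeholders.
def ilike_any_condition(column, keywords)
  sql   = keywords.map { "#{column} ILIKE ?" }.join(' OR ')
  binds = keywords.map { |w| "%#{w}%" }   # wrap each keyword for substring matching
  [sql, *binds]
end

keywords  = %w(cancer death anotherone)
condition = ilike_any_condition('text', keywords)

# Only runs where the Eightk model from the question has been defined.
records = Eightk.where(condition) if defined?(Eightk)
```

`where` accepts an array whose first element is the SQL fragment and whose remaining elements are the bound values, so the keywords are quoted by the adapter rather than pasted into the SQL.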

Otherwise you could use 'SIMILAR TO' or POSIX regular expressions (http://www.postgresql.org/docs/8.1/static/functions-matching.html#FUNCTIONS-SIMILARTO-REGEXP), and then you can use regular expressions.

For example

Eightk.where "text SIMILAR TO '%(#{keywords.join '|' })%'"

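POSIX regular expressions let you check for the beginning and end of a word, so you can match only whole words (e.g. match "death" or "death." but not "deathbed", etc.).

I'll leave the regex stuff to someone with more regex-fu :)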
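
For what it's worth, a minimal sketch of the whole-word idea: PostgreSQL's `~*` operator does a case-insensitive POSIX regex match, and `\y` in its regex syntax matches at a word boundary. `whole_word_pattern` is a hypothetical helper name, and it assumes the keywords are plain alphanumeric words with no regex metacharacters that would need escaping.

```ruby
# Hypothetical helper: builds a PostgreSQL regex that matches any of the
# keywords only as whole words. \y is PostgreSQL's word-boundary escape.
# Assumes keywords are plain words (nothing here escapes metacharacters).
def whole_word_pattern(keywords)
  "\\y(#{keywords.join('|')})\\y"
end

keywords = %w(cancer death)
pattern  = whole_word_pattern(keywords)

# ~* is the case-insensitive POSIX regex match operator in PostgreSQL.
# Only runs where the Eightk model from the question has been defined.
records = Eightk.where('text ~* ?', pattern) if defined?(Eightk)
```

With this pattern "death" and "death." should match while "deathbed" should not, since there is no word boundary between "death" and "bed". If case-sensitive matching is wanted, `~` can be used in place of `~*`.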