I'm trying to do some basic text matching in Ruby using ActiveRecord.
Here is my code so far:
require 'active_record'
require 'yaml'
require 'pg'
require 'pry'
require 'fileutils'
$config = '
adapter: postgresql
database: edgar
username: YYYYY
password:
host: 127.0.0.1'
ActiveRecord::Base.establish_connection(YAML::load($config))
class Doc < ActiveRecord::Base; end
class Eightk < ActiveRecord::Base; end
directory = "disease" # Name of the output directory
FileUtils.mkpath(directory) # Creates the directory if it doesn't exist
cancer = Eightk.where("text ilike '%cancer%'")
death = Eightk.where("text ilike '%death%'")
cancer.each do |filing| # each filing is one Eightk record
  filename = "#{directory}/#{filing.doc_id}.html"
  File.open(filename, "w") { |f| f.puts filing.text } # block form closes the file
  puts "Storing #{filing.doc_id}..."
end
death.each do |filing| # each filing is one Eightk record
  filename = "#{directory}/#{filing.doc_id}.html"
  File.open(filename, "w") { |f| f.puts filing.text } # block form closes the file
  puts "Storing #{filing.doc_id}..."
end
I have a long list of terms I want to search for.
Thanks
Answer 0 (score: 0)
Maybe something like:
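keywords = %w(cancer death anotherone)
# Build one "(text ILIKE '%word%')" clause per keyword and OR them together
records = Eightk.where(keywords.map { |w| "(text ILIKE '%#{w}%')" }.join(' OR '))
records.each do |filing|
  filename = "#{directory}/#{filing.doc_id}.html"
  File.open(filename, "w") { |f| f.puts filing.text } # block form closes the file
end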
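If the keyword list ever comes from outside the script, interpolating it straight into the SQL string is risky. A minimal sketch of the same OR query using `?` placeholders instead; `ilike_any_conditions` is a hypothetical helper name, not part of ActiveRecord, and the commented-out line assumes the `Eightk` model and connection from the question:

```ruby
# Hypothetical helper: returns [sql_fragment, bind1, bind2, ...] so the
# keywords are passed as bind values rather than interpolated into the SQL.
def ilike_any_conditions(column, keywords)
  clause = keywords.map { "#{column} ILIKE ?" }.join(' OR ')
  binds  = keywords.map { |w| "%#{w}%" }
  [clause, *binds]
end

conditions = ilike_any_conditions('text', %w(cancer death anotherone))

# With the database connection from the question in place:
# records = Eightk.where(*conditions)
```

Note that `%` and `_` inside a keyword would still act as LIKE wildcards here; this sketch only addresses quoting.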
Otherwise you could use 'SIMILAR TO' or POSIX regular expressions (http://www.postgresql.org/docs/8.1/static/functions-matching.html#FUNCTIONS-SIMILARTO-REGEXP), which let you match with a regular expression.
For example:
Eightk.where "text SIMILAR TO '%(#{keywords.join '|' })%'"
POSIX lets you check for the start and end of words, so you can match whole words only (e.g. match "death", ",death" or "death." but not "deathbed", etc.).
I'll leave the regex stuff to someone with more regex-foo :)
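For what it's worth, here is a rough sketch of what the whole-word version might look like. It relies on PostgreSQL's `\m` and `\M` escapes (start and end of a word) and the case-insensitive regex operator `~*`. `whole_word_pattern` is a hypothetical helper name, and using Ruby's `Regexp.escape` to neutralise metacharacters for PostgreSQL is an assumption that should hold for plain-word keywords but has not been checked for every character:

```ruby
# Hypothetical helper: builds a PostgreSQL regex that matches any of the
# keywords as a whole word. \m = start of word, \M = end of word.
def whole_word_pattern(keywords)
  # Assumption: Ruby's escaping is close enough to PostgreSQL's for simple words.
  alternatives = keywords.map { |w| Regexp.escape(w) }.join('|')
  "\\m(#{alternatives})\\M"
end

pattern = whole_word_pattern(%w(cancer death))

# With the database connection from the question in place:
# records = Eightk.where("text ~* ?", pattern)
```

Passing the pattern as a bind value leaves the quoting to ActiveRecord, so the backslashes do not need doubling again by hand.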