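I've indexed some PDF attachments in Elasticsearch using the Tire gem. It all works great, but I'm going to have many GB of PDFs, and we will probably store them in S3 for access. Right now the base64-encoded PDFs are being stored in the Elasticsearch _source, which will make the index huge. I want the attachments to be indexed but not stored, and I haven't yet found the right incantation to put in Tire's "mapping" block to prevent it. The block is currently like this: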
mapping do
  indexes :id, :type => 'integer'
  indexes :title
  indexes :last_update, :type => 'date'
  indexes :attachment, :type => 'attachment'
end
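To put a rough number on "huge": base64 turns every 3 input bytes into 4 output characters, so each PDF grows by about a third before any JSON overhead is added. The sketch below uses only Ruby's standard Base64 module; the random bytes standing in for a PDF and the 3,000,000-byte size are arbitrary assumptions chosen for illustration.

```ruby
require 'base64'

# Compare the raw size of a binary blob with its base64-encoded sizes.
# Random bytes stand in for a PDF; the size passed in is arbitrary.
def base64_sizes(n)
  data = Random.new(42).bytes(n)
  {
    raw: data.bytesize,
    # 4 output characters per 3 input bytes, padded with "="
    strict: Base64.strict_encode64(data).bytesize,
    # same, plus a "\n" after every 60 output characters
    with_newlines: Base64.encode64(data).bytesize
  }
end

sizes = base64_sizes(3_000_000)
p sizes
puts format('growth: %.1f%%', (sizes[:with_newlines] - sizes[:raw]) * 100.0 / sizes[:raw])
```

Base64.encode64 (used by the attachment_original method in the second answer) is slightly larger than strict_encode64 because of the line breaks it inserts.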
I've tried some variations, such as:
indexes :attachment, :type => 'attachment', :_source => { :enabled => false }
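It looks fine when I run the tire:import rake task, but it doesn't seem to make any difference. Does anyone know A) whether this is possible, and B) how to do it?
Thanks in advance.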
Answer 0 (score: 4)
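The _source field settings accept a list of fields that should be excluded from the stored source. I would guess that with Tire, something like this should do it: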
mapping :_source => { :excludes => ['attachment'] } do
  indexes :id, :type => 'integer'
  indexes :title
  indexes :last_update, :type => 'date'
  indexes :attachment, :type => 'attachment'
end
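For reference, here is a hypothetical hash showing roughly what mapping body the Tire block above corresponds to, with `_source` as a type-level setting next to `properties` rather than inside a single field's options (where the question's `indexes ... :_source => ...` variation put it). The type name "document" is a placeholder, and stored_source is a simplified, made-up model of what excluding a top-level field means; real Elasticsearch also supports wildcards and nested paths in excludes.

```ruby
require 'json'

# Hypothetical mapping body corresponding to the Tire block above.
# "document" is a placeholder type name; Tire derives the real one from the model.
mapping = {
  document: {
    # Type-level setting, next to "properties" (not inside a single field)
    _source: { excludes: ['attachment'] },
    properties: {
      id:          { type: 'integer' },
      title:       {},                    # no explicit type in the Tire block
      last_update: { type: 'date' },
      attachment:  { type: 'attachment' }
    }
  }
}

puts JSON.pretty_generate(mapping)

# Simplified model of "excludes" for a top-level field: the field is still sent
# and indexed, but it is left out of the source that gets stored.
def stored_source(doc, excludes)
  doc.reject { |field, _| excludes.include?(field.to_s) }
end

doc = { 'id' => 1, 'title' => 'hw4', 'attachment' => 'JVBERi0x' }
p stored_source(doc, ['attachment'])
```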
Answer 1 (score: 0)
@imotov's solution doesn't work for me. When I run the curl command
curl -X GET "http://localhost:9200/user_files/user_file/_search?pretty=true" -d '{"query":{"query_string":{"query":"rspec"}}}'
I can still see the contents of the attachment file included in the search results:
"_source" : {"user_file":{"id":5,"folder_id":1,"updated_at":"2012-08-16T11:32:41Z","attachment_file_size":179895,"attachment_updated_at":"2012-08-16T11:32:41Z","attachment_file_name":"hw4.pdf","attachment_content_type":"application/pdf","created_at":"2012-08-16T11:32:41Z","attachment_original":"JVBERi0xL .....
Here is my implementation:
include Tire::Model::Search
include Tire::Model::Callbacks

def self.search(folder, params)
  tire.search() do
    query { string params[:query], default_operator: "AND" } if params[:query].present?
    filter :term, folder_id: folder.id
    highlight :attachment_original, :options => { :tag => "<em>" }
  end
end

mapping :_source => { :excludes => ['attachment_original'] } do
  indexes :id, :type => 'integer'
  indexes :folder_id, :type => 'integer'
  indexes :attachment_file_name
  indexes :attachment_updated_at, :type => 'date'
  indexes :attachment_original, :type => 'attachment'
end

def to_indexed_json
  to_json(:methods => [:attachment_original])
end

# Read the stored file as raw bytes and base64-encode it for the attachment field
def attachment_original
  if attachment_file_name.present?
    path_to_original = attachment.path
    Base64.encode64(File.binread(path_to_original))
  end
end
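To see what actually reaches Elasticsearch, here is a hypothetical stand-alone approximation of the model's to_indexed_json and attachment_original that runs without Rails, Paperclip or Tire. The names encode_attachment and indexed_json, and the temporary file standing in for the PDF, are assumptions for illustration only. The last line decodes the first characters of attachment_original from the search hit shown above.

```ruby
require 'base64'
require 'json'
require 'tempfile'

# Hypothetical stand-alone version of attachment_original: read the file as raw
# bytes and base64-encode it. Returns nil when there is no file, mirroring the
# `if attachment_file_name.present?` guard in the model.
def encode_attachment(path)
  return nil if path.nil? || !File.exist?(path)
  Base64.encode64(File.binread(path))
end

# Hypothetical approximation of the JSON that to_indexed_json sends for one record.
# Every field in it, including attachment_original, is stored in _source unless excluded.
def indexed_json(id:, folder_id:, file_name:, path:)
  {
    id: id,
    folder_id: folder_id,
    attachment_file_name: file_name,
    attachment_original: encode_attachment(path)
  }.to_json
end

Tempfile.create(['hw4', '.pdf']) do |f|
  f.binmode
  f.write("%PDF-1.4\n%placeholder body\n") # fake bytes, not a real PDF
  f.flush
  puts indexed_json(id: 5, folder_id: 1, file_name: 'hw4.pdf', path: f.path)
end

# The search hit's attachment_original begins with "JVBERi0x", which decodes
# to the PDF header, so the encoded file is still present in _source.
p Base64.decode64('JVBERi0x') # => "%PDF-1"
```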