libstemmer in Sphinx is not working

Asked: 2015-09-04 13:58:42

Tags: sphinx stemming snowball

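I have Sphinx installed on my Vagrant machine running CentOS 6, and I am trying to install the Dutch libstemmer from Snowball. The installation completed successfully, but testing it goes wrong.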

I created 2 indexes with exactly the same data. My indexes are:

index shop_products1 {
  type = rt
  dict = keywords
  min_prefix_len = 3
  rt_mem_limit = 2046M

  path = /var/lib/sphinxsearch/data/shop_products2

  morphology = libstemmer_nl, stem_en
  
  html_strip = 1
  html_index_attrs = img=alt,title; a=title;

  preopen = 1
  inplace_enable = 1
  index_exact_words = 1

  
  rt_field = name
  rt_field = brand
  rt_field = description
  rt_field = specifications
  rt_field = tags
  rt_field = ourtags
  rt_field = searchfield
  rt_field = shop
  rt_field = category
  
  rt_field = color
  rt_field = ourcolor
  rt_field = gender
  rt_field = material

  rt_field = ean
  rt_field = sku

  rt_attr_string = ean
  rt_attr_string = sku
  rt_attr_float = price
  rt_attr_float = discount
  rt_attr_uint = shopid
  rt_attr_uint = itemid
  rt_attr_uint = deleted
  rt_attr_uint = duplicate
  rt_attr_uint = brandid
  rt_attr_uint = duplicates
  rt_attr_timestamp = updated_at
}

index shop_products2 {
  type = rt
  dict = keywords
  min_prefix_len = 3
  rt_mem_limit = 2046M

  path = /var/lib/sphinxsearch/data/shop_products20

  html_strip = 1
  html_index_attrs = img=alt,title; a=title;

  preopen = 1
  inplace_enable = 1
  index_exact_words = 1

  
  rt_field = name
  rt_field = brand
  rt_field = description
  rt_field = specifications
  rt_field = tags
  rt_field = ourtags
  rt_field = searchfield
  rt_field = shop
  rt_field = category
  
  rt_field = color
  rt_field = ourcolor
  rt_field = gender
  rt_field = material

  rt_field = ean
  rt_field = sku

  rt_attr_string = ean
  rt_attr_string = sku
  rt_attr_float = price
  rt_attr_float = discount
  rt_attr_uint = shopid
  rt_attr_uint = itemid
  rt_attr_uint = deleted
  rt_attr_uint = duplicate
  rt_attr_uint = brandid
  rt_attr_uint = duplicates
  rt_attr_timestamp = updated_at
}




searchd {
  listen = 127.0.0.1:9306:mysql41
  log = /var/log/sphinxsearch/searchd.log
  workers = threads
  binlog_path = /var/lib/sphinxsearch/rt-binlog

  read_timeout = 5
  client_timeout = 200
  max_children = 0
  	
  # 2 hours
  rt_flush_period = 7200
  pid_file = /var/run/searchd.pid
  
}

When I search for the Dutch word "afzuigkappen", it should return exactly the same results as "afzuigkap".

Can someone give me some information on how to get this working? PS. Sorry for my bad English.

2 Answers:

Answer 0: (score: 0)

The Dutch Snowball stemmer stems afzuigkappen differently from:

afzuigkap

So you would need to adapt the stemmer algorithm to get the result you are after; see the documentation on the algorithm here
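To make the point above concrete, here is a minimal Python sketch of the two rules that appear to matter for this word pair, based on my reading of the published Snowball Dutch algorithm description: a trailing "en" is removed after a consonant, and the ending is then undoubled only for kk, dd and tt. The function name `simplified_dutch_stem` is made up for this illustration, and the sketch deliberately ignores the R1 region check and every other step, so it is not the real stemmer. What Sphinx actually produces for a given index can be checked with the SphinxQL statement CALL KEYWORDS.

```python
# Simplified illustration only: NOT the full Snowball Dutch stemmer.
# Assumption: only the "en" suffix rule and the undoubling rule matter here;
# the R1 region check and all other steps are left out.

VOWELS = set("aeiouyè")
UNDOUBLE_ENDINGS = ("kk", "dd", "tt")  # doubled endings that get shortened


def simplified_dutch_stem(word: str) -> str:
    """Strip a trailing 'en' after a consonant, then undouble kk/dd/tt."""
    word = word.lower()
    if word.endswith("en") and len(word) > 3:
        base = word[:-2]
        # Valid en-ending: preceded by a non-vowel and not by "gem".
        if base[-1] not in VOWELS and not base.endswith("gem"):
            word = base
            # "pp" is not in the undoubling list, so it is left alone.
            if word.endswith(UNDOUBLE_ENDINGS):
                word = word[:-1]
    return word


for w in ("afzuigkappen", "afzuigkap", "bedden"):
    print(w, "->", simplified_dutch_stem(w))
```

Under these simplified rules the plural keeps its doubled "pp" while the singular is left untouched, so the two words end up as different index terms. That would explain why the two searches do not return identical results.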

Answer 1: (score: 0)

Okay, I have created some specific tests. The indexes I created:

index test1 {
  type = rt
  dict = keywords
  min_prefix_len = 3
  rt_mem_limit = 2046M

  morphology = libstemmer_nl, stem_en

  path = /var/lib/sphinxsearch/data/test1

  preopen = 1
  inplace_enable = 1
  index_exact_words = 1

  rt_field = name
  rt_attr_uint = shopid
  rt_attr_uint = itemid
    
}

index test2 {
  type = rt
  dict = keywords
  min_prefix_len = 3
  rt_mem_limit = 2046M

  path = /var/lib/sphinxsearch/data/test2

  preopen = 1
  inplace_enable = 1
  index_exact_words = 1

  rt_field = name
  rt_attr_uint = shopid
  rt_attr_uint = itemid
    
}

I indexed a smaller database containing football products; the Sphinx search results are here: http://imgur.com/n95Ue8v

As you can see, both give the same output of 53 records. If I just search in MySQL with: select * from tests1 WHERE name LIKE '%keeper%' I get 360 results.
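One possible contributor to the gap between 53 and 360 is that the two queries do not ask the same question: LIKE '%keeper%' matches the substring anywhere inside a name, while a plain full-text keyword query matches whole indexed tokens. The Python sketch below only illustrates that difference. The product names are invented, the helper names `like_substring` and `whole_word` are hypothetical, and it models neither stemming nor Sphinx wildcard expansion.

```python
import re

# Hypothetical product names, invented for illustration only.
names = [
    "Keeper gloves pro",
    "Goalkeeper shirt",
    "Keepershandschoenen junior",
    "Football keeper pants",
]


def like_substring(rows, needle):
    """Mimic SQL LIKE '%needle%' (assuming a case-insensitive collation)."""
    return [r for r in rows if needle.lower() in r.lower()]


def whole_word(rows, needle):
    """Mimic a plain keyword match: some token must equal the needle."""
    return [r for r in rows if needle.lower() in re.findall(r"\w+", r.lower())]


print(len(like_substring(names, "keeper")))  # 4
print(len(whole_word(names, "keeper")))      # 2
```

If this is what happens with the real data, compound words such as "goalkeeper" would be counted by the LIKE query but not by the keyword search, whether or not the stemmer is enabled.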