How to remove stop words in a meaningful way using Lucene

Asked: 2015-10-21 17:15:07

Tags: lucene stop-words text-analysis

I am using a Lucene analyzer to remove stop words, and I also build multi-word phrases (shingles). However, what happens when I have a phrase like "heart disease and heart disease"? The analyzer removes "and" from the middle of the phrase and turns it into a meaningless phrase (the output is "heart disease"). How can I handle this and remove stop words only where necessary?
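To make the problem concrete, here is a minimal sketch in plain Java. It does not use the Lucene API at all. It only contrasts removing every stop word with stripping stop words from the edges of a phrase, which is one possible reading of "only where necessary". The class name StopWordTrim, the stop list, and the sample phrase "the heart attack and heart disease" are all hypothetical and chosen for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.Set;

public class StopWordTrim {
    // Hypothetical stop list, for illustration only (not Lucene's default set).
    static final Set<String> STOP_WORDS = Set.of("a", "an", "and", "of", "or", "the");

    static boolean isStopWord(String token) {
        return STOP_WORDS.contains(token.toLowerCase(Locale.ROOT));
    }

    // Removes every stop word, including those in the middle of the phrase.
    // This mirrors the behaviour described in the question.
    static List<String> removeAll(List<String> tokens) {
        List<String> out = new ArrayList<>();
        for (String token : tokens) {
            if (!isStopWord(token)) {
                out.add(token);
            }
        }
        return out;
    }

    // Strips stop words only from the start and end of the phrase,
    // so a connecting word in the middle is kept.
    static List<String> trimEdges(List<String> tokens) {
        int start = 0;
        int end = tokens.size();
        while (start < end && isStopWord(tokens.get(start))) {
            start++;
        }
        while (end > start && isStopWord(tokens.get(end - 1))) {
            end--;
        }
        return new ArrayList<>(tokens.subList(start, end));
    }

    public static void main(String[] args) {
        List<String> phrase =
            Arrays.asList("the", "heart", "attack", "and", "heart", "disease");
        System.out.println(String.join(" ", removeAll(phrase)));
        System.out.println(String.join(" ", trimEdges(phrase)));
    }
}
```

With the sample phrase, removeAll drops the inner "and", while trimEdges drops only the leading "the". Whether edge-only trimming suits a given index is a design choice that depends on how the phrases are queried.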

0 answers:

No answers