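In Lucene, thanks to the standard analyzer, searches are case-insensitive by default. This is what users expect, and it works fine.
However, a few words, such as "TO" in range queries or "AND"/"OR", are keywords, and these keywords are case-sensitive. This is not what users expect.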
Answer 0 (score: 2)
Is there a reason for this?
The real question here might not be "why does Lucene do this?", but rather "why does Google do this?", as I believe Google's use of this pattern predates Lucene's. Regardless, the reasoning isn't too hard to deduce: there needs to be a way of differentiating the word "and" from the query operator "AND".
Say my query is: Jack and Jill went up the hill
I'm just searching a phrase that happens to contain the word "and". The end result I want is (eliminating stop words, and such):
field:jack field:jill field:went field:up field:hill
Rather than:
+field:jack +field:jill field:went field:up field:hill
If the word is uppercased, it's a decent indicator the user intended the word as an operator.
If every "and" became an operator, users might be confused why a search for "bread and butter pickles" (which becomes +bread +butter pickles) turns up hits about toast, but not about other types of pickles.
The same goes for lists of things, like "Abby, Ben, Chris, Dave and Elmer" (which becomes abby ben chris +dave +elmer): every hit would be required to contain Dave and Elmer, while the rest of the names would be optional.
How to make them case insensitive?
Uppercasing the whole thing, or every instance of AND, OR or TO, could be a bit problematic. Take these, for example:
[to TO tz] works, [TO TO TZ] throws an exception
and another thing works, AND ANOTHER THING throws an exception
You could check for a ParseException after uppercasing, and try parsing the original query in that case. Might create a bit of an inconsistency, but it beats just failing entirely.
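A rough sketch of that fallback, in case it helps. To keep it self-contained, the Parser interface below is a hypothetical stand-in for whatever parser you actually use, and the names OperatorCaseFallback, uppercaseOperators and parseLenient are made up for illustration. The uppercasing is deliberately naive: it touches every standalone and/or/to, including ones inside quoted phrases.

```java
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class OperatorCaseFallback {

    /** Hypothetical stand-in for a real query parser; not a Lucene type. */
    @FunctionalInterface
    interface Parser<T> {
        T parse(String query) throws Exception;
    }

    // Standalone words and/or/to in any letter case ("toast" does not match).
    private static final Pattern OPERATOR_WORDS =
            Pattern.compile("\\b(and|or|to)\\b", Pattern.CASE_INSENSITIVE);

    /** Uppercases every standalone and/or/to; everything else is left as typed. */
    static String uppercaseOperators(String query) {
        Matcher m = OPERATOR_WORDS.matcher(query);
        StringBuffer sb = new StringBuffer();
        while (m.find()) {
            m.appendReplacement(sb, m.group(1).toUpperCase(Locale.ROOT));
        }
        m.appendTail(sb);
        return sb.toString();
    }

    /**
     * Tries the query with uppercased operators first. If the parser rejects
     * that version, parses the original text instead. A failure on the
     * original text is propagated to the caller.
     */
    static <T> T parseLenient(String query, Parser<T> parser) throws Exception {
        try {
            return parser.parse(uppercaseOperators(query));
        } catch (Exception rejectedUppercased) {
            return parser.parse(query);
        }
    }
}
```

With Lucene's classic QueryParser, I'd expect something like parseLenient(userInput, queryParser::parse) to fit, since parse(String) throws the checked ParseException, though I haven't run it against a particular Lucene version. In real code you would probably catch ParseException specifically rather than Exception. Keep in mind the inconsistency mentioned above: "jack and jill" will now require both names, while "and another thing" quietly falls back to a plain term search.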