I can think of complicated and ugly ways to do this in MySQL, but I'm looking for a nice way. Say I have a bunch of school names, like
Meopham County Infant School
Speldhurst Nursery School
Rainbow Pre-School
The Annex School House
Fleet Learning Zone
Dartford Grammar School
Kiddliwinks
Hextable Kindergarten
The Rocking Horse Montessori Kinder
Little Angels Day Nursery
I have a list of stop words:
["school", "primary", "nursery", "college", "junior", "church", "cofe", "community", "infant"]
I have a Ruby function, "short_name", which returns the school name up to, but not including, the first instance of any of the stop words, so that we get
"Bower Grove School" => "Bower Grove"
"Fulston Manor School" => "Fulston Manor"
"St Johns Church Hall Play" => "St Johns"
"St Botolph's Church of England Voluntary Aided Primary School" => "St Botolph's"
"Fawkham House School" => "Fawkham House"
"Silverdale Day Nursery" => "Silverdale Day"
"Vigo Village School" => "Vigo Village"
"Sevenoaks Primary School" => "Sevenoaks"
"High Weald Academy" => "High Weald Academy"
"The Ebbsfleet Academy" => "The Ebbsfleet Academy"
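The question does not show the function itself, so the following is only a minimal sketch of what such a short_name might look like. It assumes stop words are matched case-insensitively against whole whitespace-separated words; the real implementation may differ.

```ruby
# Stop words as listed in the question.
STOP_WORDS = %w[school primary nursery college junior church cofe community infant].freeze

# Hypothetical reconstruction: keep the words that come before the first
# stop word; return the name unchanged if it contains no stop word.
def short_name(name)
  words = name.split
  cut = words.index { |word| STOP_WORDS.include?(word.downcase) }
  cut.nil? ? name : words.first(cut).join(" ")
end

puts short_name("St Johns Church Hall Play")
puts short_name("High Weald Academy")
```

Under this whole-word assumption a name such as "Rainbow Pre-School" is left intact, whereas a bare alternation regex would also match the "School" inside "Pre-School". A name that begins with a stop word comes back as an empty string.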
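That all works fine. My question is: what's the simplest way to do the above string processing in MySQL?
For example, if I wanted to search by this short_name, I'd want to do something like
"select * from schools where <function(name)> = 'Bower Grove'"
What's the simplest way to write <function>? I think some combination of substring() and locate() with a regex would be the way to go, but it looks like I can't use a regex with locate().
I guess the regex would be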
"school|primary|nursery|college|junior|church|cofe|community|infant"
Thanks, Max
Answer 0 (score: 2)
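MySQL does support regular expressions. Unfortunately, only for matching (this holds for versions before 8.0; MySQL 8.0 added REGEXP_REPLACE() and REGEXP_INSTR()).
Here is one approach that does not need them: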
select least(substring_index(schoolname, ' School', 1),
substring_index(schoolname, ' Primary', 1),
. . .
)
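This uses substring_index() to extract the part of the string before the delimiter. If the delimiter is not present, you get the whole string back. The least() function then picks the shortest of these: it returns the smallest value in string comparison order, and because every candidate is a prefix of the same name, the smallest is also the shortest. Note that substring_index() matches the delimiter case-sensitively, so ' School' will not match ' school'.
This assumes that the keyword is preceded by a space. After all, you probably don't want to wipe out a name like "School for Little Angels" entirely.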
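Since the stop-word list already lives in Ruby, the full least(...) expression could be generated instead of typed out by hand. This is only a sketch: short_name_sql is a made-up helper name, the column defaults to schoolname as in the query above, and each stop word is assumed to appear capitalized in the data (so "cofe" becomes ' Cofe', which would miss a spelling such as "CofE").

```ruby
# Stop words as listed in the question.
STOP_WORDS = %w[school primary nursery college junior church cofe community infant].freeze

# Hypothetical helper: builds one substring_index(...) call per stop word
# and wraps them all in least(...). The column name is interpolated as-is,
# so it must come from trusted code, never from user input.
def short_name_sql(column = "schoolname")
  parts = STOP_WORDS.map do |word|
    "substring_index(#{column}, ' #{word.capitalize}', 1)"
  end
  "least(#{parts.join(', ')})"
end

# Prints a complete query with the generated expression in the WHERE clause.
puts "select * from schools where #{short_name_sql} = 'Bower Grove'"
```

Generating the expression keeps the SQL in step with the Ruby list, at the cost of a long WHERE clause that MySQL has to evaluate for every row.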