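I have a show named "oferson of interest".
In my code I am trying to split it into individual words, capitalize the first letter of each word, and then join them back together with a space between each word, which gives "Oferson Of Interest". After that, I want to find the word "Of" and replace it with the lowercase version.
The problem I can't figure out is that at the end of the program I get "oferson of Interest", which is not what I want. I only want the word "of" to be lowercase, not the first letter of "Oferson". Simply put, I want the output to be "Oferson of Interest" instead of "oferson of Interest".
How can I search for the word 'of' rather than every occurrence of the letters 'o' and 'f' in the sentence?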
mine = 'oferson of interest'.split(' ').map {|w| w.capitalize }.join(' ')
if mine.include? "Of"
mine.gsub!(/Of/, 'of')
else
puts 'noting;'
end
puts mine
Answer 0 (score: 1)
The simplest answer is to use word boundaries in the regular expression:
str = "oferson of interest".split.collect(&:capitalize).join(" ")
str.gsub!(/\bOf\b/i, 'of')
# => Oferson of Interest
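To make the effect of the word boundary visible, here is a minimal sketch that runs both patterns on the title from the question. The variable names are only illustrative and are not part of the original answer.

```ruby
# Capitalize each word of the title from the question.
title = 'oferson of interest'.split.map(&:capitalize).join(' ')

# Without boundaries, the pattern also matches the "Of" at the start of "Oferson".
without_boundary = title.gsub(/Of/, 'of')

# With \b on both sides, only the standalone word "Of" matches.
with_boundary = title.gsub(/\bOf\b/, 'of')

puts without_boundary # oferson of Interest
puts with_boundary    # Oferson of Interest
```

The \b anchor matches the position between a word character and a non-word character, so "Of" followed by "erson" no longer qualifies.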
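Answer 1 (score: 0)
You are dealing with "stop words": words that, for some reason, you don't want to process. Build a list of the stop words you want to ignore, and compare each word against that list to decide whether it needs further processing: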
require 'set'
STOPWORDS = %w[a for is of the to].to_set
TEXT = [
'A stitch in time saves nine',
'The quick brown fox jumped over the lazy dog',
'Now is the time for all good men to come to the aid of their country'
]
TEXT.each do |text|
puts text.split.map{ |w|
STOPWORDS.include?(w.downcase) ? w.downcase : w.capitalize
}.join(' ')
end
# >> a Stitch In Time Saves Nine
# >> the Quick Brown Fox Jumped Over the Lazy Dog
# >> Now is the Time for All Good Men to Come to the Aid of Their Country
This is a simple example, but it shows the basics. In real life you will also want to handle punctuation, such as hyphens.
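One possible way to cope with hyphens and other punctuation is to replace each run of letters in place instead of splitting on whitespace. This is only a sketch under assumptions: the names small_words and titleize are hypothetical, and the pattern treats letters and apostrophes as the only word characters, which will not suit every title.

```ruby
require 'set'

# Hypothetical stop-word list, following the same idea as the example above.
small_words = %w[a for is of the to].to_set

# Rewrite each run of letters (and apostrophes) in place, so hyphens, commas
# and other punctuation between the words are left untouched.
titleize = lambda do |text|
  text.gsub(/[[:alpha:]']+/) do |word|
    small_words.include?(word.downcase) ? word.downcase : word.capitalize
  end
end

result = titleize.call('the well-known state-of-the-art show, explained')
puts result # the Well-Known State-of-the-Art Show, Explained
```

Like the simple example, this leaves a leading stop word in lowercase.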
I used a Set because it stays very fast as the stop-word list grows; it is similar to a Hash, so the lookup is faster than calling include? on an Array:
require 'set'
require 'fruity'
LETTER_ARRAY = ('a' .. 'z').to_a
LETTER_SET = LETTER_ARRAY.to_set
compare do
array {LETTER_ARRAY.include?('0') }
set { LETTER_SET.include?('0') }
end
# >> Running each test 16384 times. Test will take about 2 seconds.
# >> set is faster than array by 10x ± 0.1
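fruity is a third-party gem. If it is not installed, a rough comparison can be sketched with Ruby's standard Benchmark module instead. The iteration count below is an arbitrary assumption, and the timings will vary by machine, so no output is shown.

```ruby
require 'benchmark'
require 'set'

letter_array = ('a'..'z').to_a
letter_set = letter_array.to_set
iterations = 200_000 # arbitrary count, chosen only for illustration

# '0' is in neither collection, so the array has to scan all 26 entries
# on every call, while the set does a hash lookup.
array_time = Benchmark.realtime { iterations.times { letter_array.include?('0') } }
set_time = Benchmark.realtime { iterations.times { letter_set.include?('0') } }

puts format('array: %.4fs, set: %.4fs', array_time, set_time)
```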
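It gets more interesting when you want to protect the first letter of the resulting string, but the simple trick is to force that letter back to uppercase if it matters: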
require 'set'
STOPWORDS = %w[a for is of the to].to_set
TEXT = [
'A stitch in time saves nine',
'The quick brown fox jumped over the lazy dog',
'Now is the time for all good men to come to the aid of their country'
]
TEXT.each do |text|
str = text.split.map{ |w|
STOPWORDS.include?(w.downcase) ? w.downcase : w.capitalize
}.join(' ')
str[0] = str[0].upcase
puts str
end
# >> A Stitch In Time Saves Nine
# >> The Quick Brown Fox Jumped Over the Lazy Dog
# >> Now is the Time for All Good Men to Come to the Aid of Their Country
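Unless you are dealing with very consistent text patterns, this is not a good task for a regular expression. Since you are working with the names of TV shows, you probably won't find much consistency, and your pattern would quickly become complicated.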