So I'm trying to define "#titleize", a method that capitalizes the first letter of every word in a string, except for filler words such as 'the', 'and', and 'if'.
My code so far:
def titleize(string)
  words = []
  stopwords = %w{the a by on for of are with just but and to the my had some in}
  string.scan(/\w+/) do |word|
    if !stopwords.include?(word)
      words << word.capitalize
    else
      words << word
    end
    words.join(' ')
end
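The trouble is with the if/else part: when I run the method on a string, I get "syntax error, unexpected $end, expecting keyword_end".
I thought the code would work if I used the shorthand version of if/else, the one that usually goes inside a {curly-brace} block. I know the shorthand syntax looks something like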
string.scan(/\w+/) { |word| !stopwords.include?(word) words << word.capitalize : words << word }
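...with
words << word.capitalize
happening if !stopwords.include?(word) returns true, and
words << word
happening if !stopwords.include?(word) returns false. But that doesn't work either!
It might also look like this (a different, more efficient approach, with no separate array instantiated):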
string.scan(/\w+/) do |word|
  !stopwords.include?(word) word.capitalize : word
end.join(' ')
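(from Calling methods within methods to Titleize in Ruby) ...but when I run this code, I get a "syntax error" message.
So! Does anyone know the syntax I'm referring to? Can you help me remember it? Or can you point out another reason this code doesn't work?
Answer 0 (score: 3)
I think you are missing an end: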
string.scan(/\w+/) do |word|
  if !stopwords.include?(word)
    words << word.capitalize
  else
    words << word
  end
end # <<<< add this
For the shorthand version, do this:
string.scan(/\w+/).map{|w| stopwords.include?(w) ? w : w.capitalize}.join(' ')
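To spell the shorthand out: the ternary form is condition ? value_if_true : value_if_false, and the ? after the condition is the piece missing from the snippets in the question. Note also that scan called with a block returns the original string, so chaining .join onto the block's end cannot work, whereas scan without a block returns an array that map can transform. A small sketch of both behaviours (the shortened stopword list is only for illustration):

```ruby
stopwords = %w{the a of in} # shortened list, for illustration only
string = 'the rain in spain'

# With a block, scan yields each match and returns the original string,
# so the block's return values are thrown away.
result = string.scan(/\w+/) { |word| word.capitalize }
p result # prints "the rain in spain"

# Without a block, scan returns an array of matches that map can transform.
words = string.scan(/\w+/).map { |word|
  # ternary: condition ? value_if_true : value_if_false
  stopwords.include?(word) ? word : word.capitalize
}
p words # prints ["the", "Rain", "in", "Spain"]
```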
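Answer 1 (score: 1)
Active Support has a titleize method, which can be used as a starting point because it capitalizes the words in a string, but it's not entirely smart: it capitalizes the stopwords too. Restoring those with a little post-processing works, though.
Here's how I'd do it: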
require 'active_support/core_ext/string/inflections'
STOPWORDS = Hash[
  %w{the a by on for of are with just but and to the my had some in}.map{ |w|
    [w.capitalize, w]
  }
]
def my_titlize(str)
  str.titleize.gsub(
    /(?!^)\b(?:#{ STOPWORDS.keys.join('|') })\b/,
    STOPWORDS
  )
end
# => /(?!^)\b(?:The|A|By|On|For|Of|Are|With|Just|But|And|To|My|Had|Some|In)\b/
my_titlize('Jackdaws love my giant sphinx of quartz.')
# => "Jackdaws Love my Giant Sphinx of Quartz."
my_titlize('the rain in spain stays mainly in the plain.')
# => "The Rain in Spain Stays Mainly in the Plain."
my_titlize('Negative lookahead is indispensable')
# => "Negative Lookahead Is Indispensable"
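The reason I'd do it this way is that it's very easy to build a YAML file or a database table to supply the list of stopwords. From that array of words, it's easy to build a hash and a regular expression, which are fed to gsub, which then uses the regex engine to find the stopwords.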
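As a rough sketch of that idea, assuming the stopwords arrive as a plain YAML list: the inline YAML string and the variable names below are hypothetical stand-ins for a real file or table, and the input is already title-cased so that Active Support is not needed to run it.

```ruby
require 'yaml'

# Hypothetical YAML source; in practice this could be read from a file.
yaml_source = <<~YAML
  - the
  - of
  - in
  - my
YAML

stopword_list = YAML.safe_load(yaml_source) # an array of strings

# Map each capitalized form back to its lowercase original.
lookup = stopword_list.to_h { |w| [w.capitalize, w] }

# Regexp.union joins (and escapes) the keys; (?!^) skips a match at the
# start of a line so the first word keeps its capital.
pattern = /(?!^)\b(?:#{Regexp.union(lookup.keys).source})\b/

puts 'The Rain In Spain'.gsub(pattern, lookup) # prints The Rain in Spain
```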
The hash that gets created is:
{
"The"=>"the",
"A"=>"a",
"By"=>"by",
"On"=>"on",
"For"=>"for",
"Of"=>"of",
"Are"=>"are",
"With"=>"with",
"Just"=>"just",
"But"=>"but",
"And"=>"and",
"To"=>"to",
"My"=>"my",
"Had"=>"had",
"Some"=>"some",
"In"=>"in"
}
The regular expression that gets created is:
/(?!^)\b(?:The|A|By|On|For|Of|Are|With|Just|But|And|To|My|Had|Some|In)\b/
When gsub gets a hit on one of the words in the regex pattern, it looks the word up in the hash and substitutes the value back into the string.
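The code could use downcase, or some other computed way of reversing the capitalized words, but that adds overhead. gsub and the regex engine are very fast. Part of the reason is that the hash and the regex avoid looping over the stopword list, so the list can be large without slowing the code down much. Of course, the engine has changed between Ruby versions, so older versions won't do as well; run benchmarks if you are on Ruby < 2.0.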
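One way to check the overhead claim on a given Ruby is a quick comparison with the standard Benchmark module. This is only a sketch: the sample text, the repetition count, and the report labels are arbitrary choices, and the timings will differ between machines and Ruby versions.

```ruby
require 'benchmark'

words = %w{the a by on for of are with just but and to my had some in}
lookup = words.to_h { |w| [w.capitalize, w] }
pattern = /(?!^)\b(?:#{Regexp.union(lookup.keys).source})\b/

# Arbitrary sample input, repeated to give the engine some work to do.
text = 'The Rain In Spain Stays Mainly In The Plain. ' * 1_000
n = 50

Benchmark.bm(10) do |bm|
  # Replacement values come straight from the hash.
  bm.report('hash') { n.times { text.gsub(pattern, lookup) } }
  # Replacement values are computed by a block on every match.
  bm.report('downcase') { n.times { text.gsub(pattern) { |m| m.downcase } } }
end
```

Both variants produce the same string, so only the timings differ.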
Answer 2 (score: 0)
It is hard to catch errors in suboptimal code. Do it the canonical way, and errors become easy to spot.
class String
  SQUELCH_WORDS = %w{the a by on for of are with just but and to the my had some in}
  def titleize
    gsub(/\w+/) do |s|
      SQUELCH_WORDS.include?( s ) ? s : s.capitalize
    end
  end
end
"20,000 miles under the sea".titleize #=> "20,000 Miles Under the Sea"
Answer 3 (score: 0)
Not only are you missing an end (to close the method), but words.join(' ') is also inside the scan block, which means words is joined on every iteration of the scan.
I think you want this:
def titleize(string)
  words = []
  stopwords = %w{the a by on for of are with just but and to the my had some in}
  string.scan(/\w+/) do |word|
    if !stopwords.include?(word)
      words << word.capitalize
    else
      words << word
    end
  end
  words.join(' ')
end
Your code could be cleaned up, but at this point the basic flow is correct.
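As a sketch of what such a cleanup might look like, here are two compact variants side by side. The names STOP, titleize_scan, and titleize_gsub are made up for this illustration. The practical difference is that rebuilding the string from scanned words drops whatever sat between them, while gsub with a block replaces each word in place.

```ruby
STOP = %w{the a by on for of are with just but and to my had some in}.freeze

# Rebuilds the string from the matched words only.
def titleize_scan(str)
  str.scan(/\w+/).map { |w| STOP.include?(w) ? w : w.capitalize }.join(' ')
end

# Replaces each word in place, leaving punctuation and spacing untouched.
def titleize_gsub(str)
  str.gsub(/\w+/) { |w| STOP.include?(w) ? w : w.capitalize }
end

input = '20,000 miles under the sea'
puts titleize_scan(input) # prints 20 000 Miles Under the Sea
puts titleize_gsub(input) # prints 20,000 Miles Under the Sea
```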