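I want to count the capital letters in a string so I can work out what percentage of the string is capitals. I tried to do it with a regular expression:
string.match(/[A-Z]*/)
but that only matches the first run of capital letters.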
Answer 0 (score: 5)
string.scan() returns every match in the whole string, not just the first, so it should work for your use case. The following should work:
your_string = "Hello World"
capital_count = your_string.scan(/[A-Z]/).length
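Since the goal is a percentage, here is a minimal sketch that turns the scan count into one. It assumes "capital" means ASCII A-Z and that the percentage is taken over every character in the string (spaces and punctuation included); the method name capital_percentage is hypothetical, used only for illustration.

```ruby
# Hypothetical helper: percentage of characters in s that are ASCII capitals (A-Z).
# Assumption: the denominator is the total character count, including spaces.
def capital_percentage(s)
  return 0.0 if s.empty?            # avoid dividing by zero on an empty string
  capitals = s.scan(/[A-Z]/).length # every match, not just the first
  100.0 * capitals / s.length
end

puts capital_percentage("Hello World").round(2) # prints 18.18
```

To measure against letters only, divide by s.scan(/[[:alpha:]]/).length instead; if non-ASCII capitals such as "É" should count, /\p{Lu}/ can replace /[A-Z]/.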
Answer 1 (score: 2)
Here are some ways that don't involve converting the string to an array of characters.
CAPS = ('A'..'Z')
ALL_CAPS = CAPS.to_a.join
#=> "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
CHAR_TO_BIN = 128.times.with_object({}) do |i,h|
c = i.chr
h[c] = (CAPS.cover?(c) ? 1 : 0)
end
#=> {"\x00"=>0, "\x01"=>0, "\x02"=>0,...," "=>0, "!"=>0,...,
#    "0"=>0, "1"=>0,..."9"=>0, ":"=>0, ";"=>0, "<"=>0, "="=>0,
#    ">"=>0, "?"=>0, "@"=>0, "A"=>1, "B"=>1,..."Z"=>1, "["=>0,...,
#    "a"=>0, "b"=>0,...,"z"=>0, "{"=>0,...,"\x7F"=>0}
str = "The quick brown dog, 'Lightning', jumped over 'Bubba', the lazy fox"
1: Looks inefficient, but is by far the fastest and reads well
str.count(ALL_CAPS)
#=> 3
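String#count reads its argument with the same character-selector rules as String#delete and String#tr, so ranges such as "A-Z" are accepted and a leading ^ negates the set. A sketch, assuming ASCII capitals are what you want to count, that skips building ALL_CAPS entirely:

```ruby
str = "The quick brown dog, 'Lightning', jumped over 'Bubba', the lazy fox"

# "A-Z" is read as a character range, so this counts every character from A to Z
caps = str.count("A-Z")
puts caps # => 3

# A leading ^ negates the set, so this counts everything that is not a capital
others = str.count("^A-Z")
puts caps + others == str.length # => true
```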
2: Efficient
str.each_char.reduce(0) { |t,c| t + (CAPS.cover?(c) ? 1 : 0) }
#=> 3
3: If you need to do this many times (possibly faster than #2)
str.each_char.reduce(0) { |t,c| t + CHAR_TO_BIN[c] }
#=> 3
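One caveat with #3: CHAR_TO_BIN only has keys for the 128 ASCII characters, so CHAR_TO_BIN[c] returns nil for anything else (for example "é"), and t + nil raises a TypeError. A sketch of one way around that, assuming non-ASCII characters should simply count as zero, is to give the hash a default value; the names CAP_TABLE and count_caps are hypothetical.

```ruby
# Hash with default value 0: any character without an entry counts as 0
CAP_TABLE = Hash.new(0)
('A'..'Z').each { |c| CAP_TABLE[c] = 1 } # only the capitals need entries

# Hypothetical helper: sum the table values over the string's characters
def count_caps(str)
  str.each_char.sum { |c| CAP_TABLE[c] }
end

puts count_caps("Héllo Wörld") # => 2
```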
4: Remove all non-capitals and count what remains
str.gsub(/[^A-Z]/,'').size
#=> 3
Or remove all capitals and subtract the remaining size from the original size:
str.size - str.gsub(/[A-Z]/,'').size
#=> 3
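Before comparing speed it is worth checking that the approaches agree. A minimal sanity check, re-declaring the constants from above so it runs on its own:

```ruby
CAPS = ('A'..'Z')
ALL_CAPS = CAPS.to_a.join

str = "The quick brown dog, 'Lightning', jumped over 'Bubba', the lazy fox"

# Each entry computes the capital count with one of the methods above
results = {
  scan:   str.scan(/[A-Z]/).length,
  count:  str.count(ALL_CAPS),
  reduce: str.each_char.reduce(0) { |t, c| t + (CAPS.cover?(c) ? 1 : 0) },
  gsubA:  str.gsub(/[^A-Z]/, '').size,
  gsubB:  str.size - str.gsub(/[A-Z]/, '').size
}

p results
raise "methods disagree: #{results}" unless results.values.uniq.size == 1
```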
Answer 2 (score: 1)
I thought it would be interesting to compare the efficiency of the various methods proposed.
require 'fruity'
CAPS = ('A'..'Z')
ALL_CAPS = CAPS.to_a.join
#=> "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
CHAR_TO_BIN = 128.times.with_object({}) do |i,h|
c = i.chr
h[c] = (CAPS.cover?(c) ? 1 : 0)
end
lower = ('a'..'z').to_a
upper = ('A'..'Z').to_a
L = 50_000
U = 10_000
The test string contains L randomly drawn lowercase letters and U randomly drawn uppercase letters, shuffled together.
str = L.times.map {lower.sample}.concat(U.times.map {upper.sample}).shuffle.join
compare do
scan { str.scan(/[A-Z]/).length }
count { str.count(ALL_CAPS) }
reduce { str.each_char.reduce(0) { |t,c| t + (CAPS.cover?(c) ? 1 : 0) } }
hsh { str.each_char.reduce(0) { |t,c| t + CHAR_TO_BIN[c] } }
gsubA { str.gsub(/[^A-Z]/,'').size }
gsubB { str.size - str.gsub(/[A-Z]/,'').size }
end
Running each test 32 times. Test will take about 33 seconds.
count is faster than gsubB by 39x ± 10.0
gsubB is similar to scan
scan is faster than gsubA by 3x ± 1.0
gsubA is similar to hsh
hsh is similar to reduce
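If installing the fruity gem isn't an option, a rough timing can be done with the core Process.clock_gettime alone. This is only a sketch: it reports wall-clock time for a fixed number of repetitions (the count of 20 is arbitrary) and does none of fruity's statistical comparison, so expect noisier numbers.

```ruby
lower = ('a'..'z').to_a
upper = ('A'..'Z').to_a
n_lower = 50_000
n_upper = 10_000

# Same construction as above: n_lower lowercase and n_upper uppercase letters, shuffled
str = (Array.new(n_lower) { lower.sample } + Array.new(n_upper) { upper.sample }).shuffle.join

candidates = {
  'scan'  => -> { str.scan(/[A-Z]/).length },
  'count' => -> { str.count('A-Z') }, # 'A-Z' selects the same characters as ALL_CAPS
  'gsubA' => -> { str.gsub(/[^A-Z]/, '').size },
  'gsubB' => -> { str.size - str.gsub(/[A-Z]/, '').size }
}

REPS = 20
candidates.each do |name, fn|
  start = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  REPS.times { fn.call }
  elapsed = Process.clock_gettime(Process::CLOCK_MONOTONIC) - start
  printf("%-6s %8.4f s\n", name, elapsed)
end
```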
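I was surprised by how fast String#count is. I had assumed Ruby would perform an include? check for each character in the string. I was wrong. Looking at the source, there is a C function, tr_setup_table, which suggests Ruby builds a lookup table (a hash or something similar) before doing the counting.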