Question

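I want to count the capital letters in a string so that I can work out what percentage of it is uppercase. I tried a regular expression, string.match(/[A-Z]*/), but that only matches the first run of capital letters.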

Answer 1

String#scan works across the whole string and should fit your use case. The following should work:

your_string = "Hello World"
capital_count = your_string.scan(/[A-Z]/).length
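
Since the question is ultimately about a percentage, here is a minimal sketch of that last step. The method name capital_percentage is made up for illustration, and it assumes the percentage is taken over every character in the string (spaces and punctuation included); adjust the denominator if only letters should count.

```ruby
# Hypothetical helper: percentage of characters that are ASCII capitals.
# Assumption: the denominator is the full string length, not just letters.
def capital_percentage(string)
  return 0.0 if string.empty?            # avoid dividing by zero
  capitals = string.scan(/[A-Z]/).length # same counting approach as above
  100.0 * capitals / string.length
end

capital_percentage("Hello World") # 2 capitals out of 11 characters
```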

Answer 2

Here are a few ways that do not involve converting the string to an array of characters.

CAPS = ('A'..'Z')
ALL_CAPS = CAPS.to_a.join
  #=> "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
CHAR_TO_BIN = 128.times.with_object({}) do |i,h|
  c = i.chr
  h[c] = (CAPS.cover?(c) ? 1 : 0)
end
  #=> {"\x00"=>0, "\x01"=>0, "\x02"=>0,...," "=>0, "!"=>0,...,
  #    "0"=>0, "1"=>0,..."9"=>0, ":"=>0, ";"=>0, "<"=>0, "="=>0,
  #    ">"=>0, "?"=>0, "@"=>0, "A"=>1, "B"=>1,..."Z"=>1, "["=>0,...,
  #    "a"=>0, "b"=>0,...,"z"=>0, "{"=>0,...,"\x7F"=>0}

str = "The quick brown dog, 'Lightning', jumped over 'Bubba', the lazy fox"

1: The fastest by far, and it reads well

str.count(ALL_CAPS)
  #=> 3
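
As a side note, String#count accepts the same character-set syntax as String#tr, so the set of capitals can also be written as a range instead of spelling out all 26 letters. A short sketch, reusing the test string from above:

```ruby
str = "The quick brown dog, 'Lightning', jumped over 'Bubba', the lazy fox"

# "A-Z" is read as a range of characters, so no helper constant is needed.
range_count = str.count("A-Z")

# A leading ^ negates the set: this counts everything that is not A-Z.
non_caps = str.count("^A-Z")
```

The two counts add up to str.size, which is a quick sanity check.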

2: Efficient

str.each_char.reduce(0) { |t,c| t + (CAPS.cover?(c) ? 1 : 0) }
  #=> 3

3: If you need to do this many times (possibly faster than #2)

str.each_char.reduce(0) { |t,c| t + CHAR_TO_BIN[c] }
  #=> 3

4: Remove everything that is not a capital and count what is left

str.gsub(/[^A-Z]/,'').size
  #=> 3

Or remove all the capitals and take the difference in length:

str.size - str.gsub(/[A-Z]/,'').size
  #=> 3

Answer 3

I thought it would be interesting to compare the efficiency of the various methods that have been proposed.

require 'fruity'

CAPS = ('A'..'Z')
ALL_CAPS = CAPS.to_a.join
  #=> "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
CHAR_TO_BIN = 128.times.with_object({}) do |i,h|
  c = i.chr
  h[c] = (CAPS.cover?(c) ? 1 : 0)
end

lower = ('a'..'z').to_a
upper = ('A'..'Z').to_a

L = 50_000
U = 10_000

The test string contains L randomly drawn lowercase letters and U randomly drawn uppercase letters, shuffled.

str = L.times.map {lower.sample}.concat(U.times.map {upper.sample}).shuffle.join

compare do 
  scan   { str.scan(/[A-Z]/).length }
  count  { str.count(ALL_CAPS) }
  reduce { str.each_char.reduce(0) { |t,c| t + (CAPS.cover?(c) ? 1 : 0) } }
  hsh    { str.each_char.reduce(0) { |t,c| t + CHAR_TO_BIN[c] } }
  gsubA  { str.gsub(/[^A-Z]/,'').size }  
  gsubB  { str.size - str.gsub(/[A-Z]/,'').size }
end

Running each test 32 times. Test will take about 33 seconds.

count is faster than gsubB by 39x ± 10.0
gsubB is similar to scan
scan  is faster than gsubA by 3x ± 1.0
gsubA is similar to hsh
hsh   is similar to reduce
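
fruity is a third-party gem. For anyone who would rather stay within the standard library, a rough comparison can be sketched with Benchmark. This is not a replacement for the results above: the string is smaller, only three of the methods are timed, the names text, candidates and timings are made up for this sketch, and the numbers will vary by machine and Ruby version.

```ruby
require 'benchmark'

lower = ('a'..'z').to_a
upper = ('A'..'Z').to_a

# Smaller test string than above: 5,000 lowercase and 1,000 uppercase letters.
text = (Array.new(5_000) { lower.sample } + Array.new(1_000) { upper.sample }).shuffle.join

candidates = {
  scan:  -> { text.scan(/[A-Z]/).length },
  count: -> { text.count('A-Z') },
  gsub:  -> { text.size - text.gsub(/[A-Z]/, '').size }
}

# Time 100 calls of each candidate; realtime returns elapsed seconds.
timings = candidates.transform_values do |fn|
  Benchmark.realtime { 100.times { fn.call } }
end

# Print fastest first.
timings.sort_by { |_, secs| secs }.each do |name, secs|
  puts format('%-6s %.4fs', name, secs)
end
```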

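I was surprised by how fast String#count is. I had assumed Ruby would perform an include? for every character in the string. I was wrong. Looking at the source, there is a C function, tr_setup_table, which suggests that Ruby builds a hash or something similar before it does the counting.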
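
The general idea of building a lookup structure once and then making a single pass over the string can be sketched in plain Ruby. This is only an illustration of the technique, not a transcription of tr_setup_table; UPPER_TABLE and count_with_table are made-up names, and the sketch only recognises the ASCII capitals A-Z.

```ruby
# Illustration only: a byte-indexed lookup table, not Ruby's actual C code.
# One slot per possible byte value; true marks the bytes for 'A'..'Z'.
UPPER_TABLE = Array.new(256, false)
('A'.ord..'Z'.ord).each { |byte| UPPER_TABLE[byte] = true }

# Single pass over the bytes, one array lookup per byte.
def count_with_table(string)
  string.each_byte.count { |byte| UPPER_TABLE[byte] }
end

count_with_table("Hello World") # counts the H and the W
```

In pure Ruby this will not come close to the speed of String#count, since each lookup still goes through the interpreter; the point is only to show why a precomputed table avoids a per-character search.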

Counting capital letters in Ruby
