Question

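I'm trying to write a script that counts the number of words in a text, with a few exceptions described by regular expressions.

The script looks like this: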

number_of_words = 0
standalone_number = /\A[-+]?[0-9]*\.?[0-9]+\Z/
standalone_letter = /\A([\w+\-].?)+@[a-z0-9\-]+(\.[a-z]+)*\.[a-z]+\Z/
email_address = /\A([\w+\-].?)+@[a-z0-9\-]+(\.[a-z]+)*\.[a-z]+\Z/
text.each_line(){ |line| number_of_words = number_of_words + line.split.size {|word| word !~ standalone_number and word !~ standalone_letter and word !~ email_address  } }
puts number_of_words

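As you can see, I don't want standalone numbers, standalone letters, or email addresses to be included in the word count.

When I read a text file containing this: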

1 2 ruby email@email.com

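I get a word count of 4, whereas I expected 1 (only "ruby" should be counted).

What am I missing here?

Thanks.

Edit

I fixed the "standalone_letter" regex, because it had been written incorrectly as a copy of the "email_address" regex.

I have solved the problem using the solution I added to the answers.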

Answer 1

Array#size doesn't take a block like that; any block passed to it is silently ignored. You're looking for Array#count.

line.split.count { ... }
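
To make the difference concrete, here is a minimal sketch. The sample input is the line from the question; the helper name count_non_numeric and the simplified digit-only pattern are made up for illustration and are not part of the original script.

```ruby
# Hypothetical helper for illustration: counts the words that are
# not made up purely of digits.
def count_non_numeric(words)
  words.count { |word| word !~ /\A\d+\z/ }
end

words = "1 2 ruby email@email.com".split

# Array#size never looks at the block, so this is just the array length.
puts words.size { |word| word !~ /\A\d+\z/ }   # prints 4

# Array#count with a block counts only the elements for which
# the block returns a truthy value.
puts count_non_numeric(words)                   # prints 2
```

This is why the script in the question always reports the total number of words, no matter what the block says.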

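Also, just a thought: rather than looping over the lines of the text and incrementing a counter, it looks like you could split the original text directly, newlines and all, and get the same result.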
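
A rough sketch of that idea, assuming the whole text fits in memory. The helper name count_tokens, the sample string, and the lowercase-only pattern are invented for this example. String#split with no arguments splits on any run of whitespace, including newlines, so a per-line loop is not needed.

```ruby
# Hypothetical helper: counts the whitespace-separated tokens for which
# the given block returns a truthy value.
def count_tokens(text, &keep)
  text.split.count(&keep)
end

sample = "1 2 ruby\nemail@email.com rails\n"

# Splitting the whole text at once...
whole = count_tokens(sample) { |word| word =~ /\A[a-z]+\z/ }

# ...gives the same total as summing the per-line counts.
per_line = sample.each_line.sum do |line|
  line.split.count { |word| word =~ /\A[a-z]+\z/ }
end

puts whole      # prints 2
puts per_line   # prints 2
```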

Answer 2

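The problem is that you are using size to count the elements of the array, and it does not accept a block. Use count instead and everything will work as expected.

A much cleaner solution looks like this.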

standalone_number = /\A[-+]?[0-9]*\.?[0-9]+\Z/
standalone_letter = /\A([\w+\-].?)+@[a-z0-9\-]+(\.[a-z]+)*\.[a-z]+\Z/
email_address = /\A([\w+\-].?)+@[a-z0-9\-]+(\.[a-z]+)*\.[a-z]+\Z/

text = file.read
num_of_words = text.split.count{ |word| [standalone_number, standalone_letter, email_address].none?{ |regexp| word =~ regexp } }

puts num_of_words

Answer 3

You can also remove the matching words from the array, as follows:

# A word is dropped if it matches ANY of the exclusion patterns, hence =~ combined with ||.
text.each_line { |line| number_of_words += line.split.delete_if { |word| word =~ standalone_number || word =~ standalone_letter || word =~ email_address }.size }
puts number_of_words

This deletes the matching elements and then takes the size of the remaining array.
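
As a variation on the same idea, here is a sketch that uses reject, which returns a new array rather than modifying the receiver in place as delete_if does. The method name countable_words is hypothetical, and the single-letter pattern is an assumption, since the question does not show the corrected standalone_letter regex at this point; the other two patterns are copied from the question.

```ruby
STANDALONE_NUMBER = /\A[-+]?[0-9]*\.?[0-9]+\Z/
STANDALONE_LETTER = /\A[a-zA-Z]\Z/ # assumed pattern for a single letter
EMAIL_ADDRESS     = /\A([\w+\-].?)+@[a-z0-9\-]+(\.[a-z]+)*\.[a-z]+\Z/

# Hypothetical helper: returns the words of a line that should be counted,
# i.e. those that match none of the exclusion patterns.
def countable_words(line)
  line.split.reject do |word|
    word =~ STANDALONE_NUMBER || word =~ STANDALONE_LETTER || word =~ EMAIL_ADDRESS
  end
end

puts countable_words("1 2 ruby a email@email.com").size   # prints 1
```

Keeping the surviving words around, rather than only their count, can be handy when debugging which tokens slip past the patterns.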

Answer 4

This works!

text = File.open('xyz.txt', 'r')
number_of_words = 0
standalone_number = /\A[-+]?[0-9]*\.?[0-9]+\Z/
standalone_letter = /^[a-zA-Z]$/
email_address = /\A([\w+\-].?)+@[a-z0-9\-]+(\.[a-z]+)*\.[a-z]+\Z/
text.each_line(){ |line| number_of_words = number_of_words + line.split.count {|word|  word !~ standalone_number && word !~ standalone_letter && word !~  email_address }}
puts number_of_words

Not getting the expected output from the script
