How do I split text in Ruby without creating empty strings?

Asked: 2012-03-15 03:33:57

Tags: ruby-on-rails ruby string parsing split

I am splitting on whitespace, periods, commas, or double quotes, but not on single quotes:

str = %Q{this is the.string    to's split,real "ok" nice-like.}
str.split(/\s|\.|,|"/)
=> ["this", "is", "the", "string", "", "", "", "to's", "split", "real", "", "ok", "", "nice-like"]

How can I elegantly remove the empty strings?

How can I elegantly remove strings shorter than MIN_LENGTH?

6 answers:

Answer 0 (score: 8)

I'm not entirely clear on the problem domain, but if you just want to avoid empty strings, why not split on one or more of the delimiters:

str.split(/[\s.,"]+/)
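
For reference, here is a minimal runnable sketch of this approach applied to the string from the question. The variable names `tokens` and `leading` and the second input string are only for illustration. It also shows one caveat: when the input starts with a delimiter, split with a regex still returns one empty string at the front.

```ruby
str = %Q{this is the.string    to's split,real "ok" nice-like.}

# Treat any run of whitespace, periods, commas, or double quotes as one delimiter.
tokens = str.split(/[\s.,"]+/)
p tokens
# => ["this", "is", "the", "string", "to's", "split", "real", "ok", "nice-like"]

# A delimiter at the very start still yields one leading empty string.
leading = %Q{"quoted" first}.split(/[\s.,"]+/)
p leading
# => ["", "quoted", "first"]
```

If leading delimiters can occur in your input, appending `.reject(&:empty?)` covers that case.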

Answer 1 (score: 7)

In this case, split is not the right tool. You should use scan:

str = %Q{this is the.string    to's split,real "ok" nice-like.}
str.scan(/[\w'-]+/)
# => ["this", "is", "the", "string", "to's", "split", "real", "ok", "nice-like"]

To match only strings of MIN_LENGTH or longer, do the following:

MIN_LENGTH = 3
str.scan(/[\w'-]{#{MIN_LENGTH},}/)
# => ["this", "the", "string", "to's", "split", "real", "nice-like"]

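When to use split and when to use scan

  • When the delimiters are messy and hard to match with a regex, use scan.
  • When the substrings to extract are messy and hard to match with a regex, use split.
  • If you want to impose conditions on the form of the substrings to extract, use scan.
  • If you want to impose conditions on the form of the delimiters, use split.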
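
The rules of thumb above can be sketched with two small cases. The input strings and the variable names `noisy`, `numbers`, and `fields` are made up for illustration and do not come from the question.

```ruby
# Messy delimiters, easy-to-describe tokens: describe the tokens and use scan.
noisy = "id: 12; qty=3 / price 450"
numbers = noisy.scan(/\d+/)
p numbers
# => ["12", "3", "450"]

# Easy-to-describe delimiter, messy tokens: describe the delimiter and use split.
fields = "O'Neil & Co.|n/a|42 (approx.)".split("|")
p fields
# => ["O'Neil & Co.", "n/a", "42 (approx.)"]
```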

Answer 2 (score: 6)

I think a simple way is the following:

str.split(/\s|\.|,|"/).select{|s| s.length >= MIN_LENGTH}

Answer 3 (score: 2)

Try the following:

str.split(/\s*[.,"\s]\s*/)
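
One caveat worth checking before relying on this pattern: it absorbs whitespace around a single delimiter character, so two adjacent punctuation delimiters (for example a comma directly followed by a double quote) still leave an empty string between them. A small sketch, where the second input string and the variable names are made up for illustration:

```ruby
pattern = /\s*[.,"\s]\s*/

# Works for the question's string: no two punctuation delimiters are adjacent.
str = %Q{this is the.string    to's split,real "ok" nice-like.}
from_question = str.split(pattern)
p from_question
# => ["this", "is", "the", "string", "to's", "split", "real", "ok", "nice-like"]

# A comma immediately followed by a double quote counts as two delimiters.
adjacent = %Q{a,"b"}.split(pattern)
p adjacent
# => ["a", "", "b"]
```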

Answer 4 (score: 2)

We can achieve the same result in several ways:

 > str.split(/[\s\.,"]/) - [""]
=> ["this", "is", "the", "string", "to's", "split", "real", "ok", "nice-like"]

 > str.split(/[\s\.,"]/).select{|sub_string| sub_string.present?}
=> ["this", "is", "the", "string", "to's", "split", "real", "ok", "nice-like"]

 > str.scan /\w+'?\w+/
=> ["this", "is", "the", "string", "to's", "split", "real", "ok", "nice", "like"]
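
Two notes on the variants above. `present?` comes from ActiveSupport, so the second variant only works in Rails or with ActiveSupport loaded. The third variant needs at least two word characters per match, so it skips single-letter words in addition to splitting on the hyphen. A plain-Ruby sketch of both points, with variable names and the short test string chosen for illustration:

```ruby
str = %Q{this is the.string    to's split,real "ok" nice-like.}

# reject(&:empty?) is core Ruby and drops the "" entries that split leaves behind.
words = str.split(/[\s.,"]/).reject(&:empty?)
p words
# => ["this", "is", "the", "string", "to's", "split", "real", "ok", "nice-like"]

# \w+'?\w+ requires two or more word characters, so "a" and "b" are not matched.
short = "a b cd".scan(/\w+'?\w+/)
p short
# => ["cd"]
```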

Answer 5 (score: 1)

MIN_LENGTH = 2

new_strings = str.split(/\s|\.|,|"/).reject{ |s| s.length < MIN_LENGTH }
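
Because an empty string has length 0, the same reject call removes the empty strings as well as the short ones. A self-contained sketch follows; the helper name `split_words` and its default argument are hypothetical and not part of the answer above.

```ruby
# Hypothetical helper: split on the question's delimiters and drop
# anything shorter than min_length (empty strings have length 0).
def split_words(text, min_length = 1)
  text.split(/[\s.,"]/).reject { |s| s.length < min_length }
end

str = %Q{this is the.string    to's split,real "ok" nice-like.}

p split_words(str, 2)
# => ["this", "is", "the", "string", "to's", "split", "real", "ok", "nice-like"]

p split_words(str, 3)
# => ["this", "the", "string", "to's", "split", "real", "nice-like"]
```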