Regex to strip non-alphabetic and non-numeric characters

Date: 2014-01-21 18:48:49

Tags: ruby-on-rails ruby regex

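Total programming newbie here. In Ruby, how would I strip the following string of its non-alphabetic and non-numeric characters, and then turn it into an array by splitting it on spaces?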

Example

string = "Honey - a sweet, sticky, yellow fluid made by bees and other insects from nectar collected from flowers."

into this:

tokenized_string = ["Honey", "a", "sweet", "sticky", "yellow", "fluid", "made", "by", "bees", "and", "other", "insects", "from", "nectar", "collected", "from", "flowers"]

Any help is greatly appreciated!

4 answers:

Answer 0 (score: 2)

I would use:

string = "Honey - a sweet, sticky, yellow fluid made by bees and other insects from nectar collected from flowers."
string.delete('^A-Za-z0-9 ').split 
# => ["Honey",
#     "a",
#     "sweet",
#     "sticky",
#     "yellow",
#     "fluid",
#     "made",
#     "by",
#     "bees",
#     "and",
#     "other",
#     "insects",
#     "from",
#     "nectar",
#     "collected",
#     "from",
#     "flowers"]
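To show what happens between delete and split, here is a minimal sketch reusing the question's string (the variable names are only illustrative). The leading ^ negates the character set, so delete removes everything except letters, digits and spaces. Deleting the hyphen in " - " leaves two spaces behind, which is why the answer calls split with no argument: that form treats any run of whitespace as a single separator, whereas splitting on a one-space regex keeps an empty token.

```ruby
string = "Honey - a sweet, sticky, yellow fluid made by bees and other insects from nectar collected from flowers."

# '^A-Za-z0-9 ' means "every character NOT in A-Z, a-z, 0-9 or space"
stripped = string.delete('^A-Za-z0-9 ')
# => "Honey  a sweet sticky yellow fluid made by bees and other insects from nectar collected from flowers"
#    (note the double space where " - " used to be)

# No argument: awk-style splitting, runs of whitespace count as one separator
tokens = stripped.split

# A single-space regex splits at every space, so the double space yields ""
naive = stripped.split(/ /)
# naive starts with ["Honey", "", "a", ...]
```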

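If you're trying to remove everything except alphanumerics, you can't use the \w character class, because it is defined as [A-Za-z0-9_], which lets _ slip through. Here's an example: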

'foo_BAR12'[/\w+/] # => "foo_BAR12"

This matches the whole string, including the _.

'foo_BAR12'[/[A-Za-z0-9]+/] # => "foo"

This stops at the _, because the class [A-Za-z0-9] doesn't include it.

Think of \w as a pattern for matching variable names, not alphanumerics. If you want an alphanumeric character set, look at the POSIX [[:alnum:]] class:

'foo_BAR12'[/[[:alnum:]]+/] # => "foo"
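One further difference between the last two classes, sketched with a made-up string: in Ruby, POSIX bracket expressions are Unicode-aware, so for a UTF-8 string [[:alnum:]] also matches non-ASCII letters such as é, while the literal class [A-Za-z0-9] stops at them. Neither one lets _ through. Which behaviour you want depends on whether your input can contain accented words.

```ruby
# Hypothetical input: an accented letter plus an underscore
sample = "café_BAR12"

# Literal ASCII class: stops at é and at _
ascii_only = sample.scan(/[A-Za-z0-9]+/)
# => ["caf", "BAR12"]

# POSIX class: é counts as alphanumeric, _ still does not
posix = sample.scan(/[[:alnum:]]+/)
# => ["café", "BAR12"]
```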

Answer 1 (score: 1)

Do it with String#scan:

string = "Honey - a sweet, sticky, yellow fluid made by bees and other insects from nectar collected from flowers."
string.scan(/[a-zA-Z0-9]+/)
# => ["Honey",
#     "a",
#     "sweet",
#     "sticky",
#     "yellow",
#     "fluid",
#     "made",
#     "by",
#     "bees",
#     "and",
#     "other",
#     "insects",
#     "from",
#     "nectar",
#     "collected",
#     "from",
#     "flowers"]
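For comparison, a sketch of the inverse approach (not part of the original answer): split on runs of non-alphanumerics instead of scanning for runs of alphanumerics. On the question's string both give the same array, because split drops the trailing empty string left by the final period. When the input begins with punctuation, though, split returns a leading empty string and scan does not, which is one reason scan is the tidier fit here. The second input string is made up.

```ruby
string = "Honey - a sweet, sticky, yellow fluid made by bees and other insects from nectar collected from flowers."

# Collect the runs we want to keep
scanned = string.scan(/[a-zA-Z0-9]+/)

# Cut at the runs we want to throw away; the trailing "" after "." is dropped
split_tokens = string.split(/[^a-zA-Z0-9]+/)

# Hypothetical input that starts with punctuation
edge = "-- Honey, bees"
edge_scan  = edge.scan(/[a-zA-Z0-9]+/)    # => ["Honey", "bees"]
edge_split = edge.split(/[^a-zA-Z0-9]+/)  # => ["", "Honey", "bees"]
```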

Answer 2 (score: 1)

There are many possibilities, for example:

string.gsub(/\W/) { |m| m if m == ' ' }.split

Or, even more clearly:

string.gsub(/\W/) { |m| m if m.strip.empty? }.split
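A short sketch of how the two blocks differ, using a hypothetical input. When the block returns nil, gsub substitutes an empty string, so both versions delete punctuation. The first keeps only a literal space, so a tab between two words is deleted and the words run together; m.strip.empty? keeps any whitespace character, leaving split something to break on.

```ruby
# Hypothetical input: a tab, a hyphen and a space
text = "one\ttwo-three four"

# Keeps only ' '; the tab is removed like punctuation
space_only = text.gsub(/\W/) { |m| m if m == ' ' }.split
# => ["onetwothree", "four"]

# Keeps every whitespace character (strip reduces it to "")
any_space = text.gsub(/\W/) { |m| m if m.strip.empty? }.split
# => ["one", "twothree", "four"]
```

Note that both versions, like the delete approach in Answer 0, glue a hyphenated word such as two-three into one token, whereas the scan-based answers split it into two.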

Answer 3 (score: 1)

Simple. The following gives you the array you want without needing split:

string.scan(/\w+/)

Play with it on Rubular.com.
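A quick check of this one-liner, assuming the question's string: it returns the same 17 tokens. As Answer 0 points out, however, \w also matches _, so an underscore does not act as a word boundary. The second string below is made up to show that.

```ruby
string = "Honey - a sweet, sticky, yellow fluid made by bees and other insects from nectar collected from flowers."
tokens = string.scan(/\w+/)

# \w is [a-zA-Z0-9_], so snake_case stays in one piece
underscored = "snake_case words".scan(/\w+/)
# => ["snake_case", "words"]
```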