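Total programming newbie here. In Ruby, how would I strip every character that is not a letter or a digit from the following string, and then split the string on spaces into an array?
Example: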
string = "Honey - a sweet, sticky, yellow fluid made by bees and other insects from nectar collected from flowers."
into this:
tokenized_string = ["Honey", "a", "sweet", "sticky", "yellow", "fluid", "made", "by", "bees", "and", "other", "insects", "from", "nectar", "collected", "from", "flowers"]
Any help is much appreciated!
Answer 0 (score: 2)
I would use:
string = "Honey - a sweet, sticky, yellow fluid made by bees and other insects from nectar collected from flowers."
string.delete('^A-Za-z0-9 ').split
# => ["Honey",
# "a",
# "sweet",
# "sticky",
# "yellow",
# "fluid",
# "made",
# "by",
# "bees",
# "and",
# "other",
# "insects",
# "from",
# "nectar",
# "collected",
# "from",
# "flowers"]
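If you are trying to remove everything except alphanumerics, you cannot use the \w character class, because it is defined as [A-Za-z0-9_], which lets _ slip through. Here is an example: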
'foo_BAR12'[/\w+/] # => "foo_BAR12"
This matches the entire string, including the _.
'foo_BAR12'[/[A-Za-z0-9]+/] # => "foo"
This stops at the _, because the class [A-Za-z0-9] does not include it.
\w should be thought of as a pattern for matching variable names, not alphanumerics. If you want an alphanumeric character set, look at the POSIX [[:alnum:]] class:
'foo_BAR12'[/[[:alnum:]]+/] # => "foo"
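Applied to the original question, the POSIX class can go inside a negated bracket expression. This is only a minimal sketch, assuming UTF-8 input; the names tokens, accented, alnum_match and ascii_match are illustrative and not from the answer above. It also shows one difference worth knowing: on UTF-8 strings, [[:alnum:]] matches non-ASCII letters, while A-Za-z0-9 does not.

```ruby
string = "Honey - a sweet, sticky, yellow fluid made by bees and other insects from nectar collected from flowers."

# Remove everything that is neither alphanumeric nor a space, then split on whitespace.
tokens = string.gsub(/[^[:alnum:] ]/, '').split

# "caf\u00E9" is "cafe" with an accented final e, written as an escape
# so the snippet does not depend on the source file encoding.
accented = "caf\u00E9 au lait"

alnum_match = accented[/[[:alnum:]]+/]   # the whole first word, accent included
ascii_match = accented[/[A-Za-z0-9]+/]   # => "caf", stops before the accented letter
```

If the input is known to be plain ASCII, the two classes behave the same and either can be used.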
Answer 1 (score: 1)
string = "Honey - a sweet, sticky, yellow fluid made by bees and other insects from nectar collected from flowers."
string.scan(/[a-zA-Z0-9]+/)
# => ["Honey",
# "a",
# "sweet",
# "sticky",
# "yellow",
# "fluid",
# "made",
# "by",
# "bees",
# "and",
# "other",
# "insects",
# "from",
# "nectar",
# "collected",
# "from",
# "flowers"]
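The scan and delete approaches agree on the example sentence, but they can differ when punctuation sits inside a word. A small sketch with a made-up input (sample, deleted and scanned are hypothetical names used only for this comparison):

```ruby
sample = "don't over-think it"

# delete drops the punctuation and joins whatever was on either side of it.
deleted = sample.delete('^A-Za-z0-9 ').split
# => ["dont", "overthink", "it"]

# scan treats every non-alphanumeric character as a boundary between tokens.
scanned = sample.scan(/[a-zA-Z0-9]+/)
# => ["don", "t", "over", "think", "it"]
```

Which behavior is preferable depends on how contractions and hyphenated words should be tokenized.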
Answer 2 (score: 1)
There are many possibilities, for example:
string.gsub(/\W/) { |m| m if m == ' ' }.split
Or, even more clearly:
string.gsub(/\W/) { |m| m if m.strip.empty? }.split
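Because \W is the complement of \w, both gsub variants leave underscores in place, which matters if underscores should count as punctuation. A quick check with a hypothetical input (sample, kept and stripped are illustrative names):

```ruby
sample = "snake_case - two words"

# The block returns nil for the hyphen, and gsub substitutes nil as an empty string.
# The underscore is never matched by \W, so it is left untouched.
kept = sample.gsub(/\W/) { |m| m if m == ' ' }.split
# => ["snake_case", "two", "words"]

# An explicit character list removes the underscore as well.
stripped = sample.delete('^A-Za-z0-9 ').split
# => ["snakecase", "two", "words"]
```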
Answer 3 (score: 1)