Question

I have a string, say "Hello_World I am Learning,Ruby". I want to split this string into its individual words. What is the best way to do that?

Thanks!

Answer 1

You can use \W, which matches any non-word character (the underscore is added explicitly because \w includes it):

"Hello_World I am Learning,Ruby".split /[\W_]/
=> ["Hello", "World", "I", "am", "Learning", "Ruby"]

"Hello_World I am Learning,   Ruby".split /[\W_]+/
=> ["Hello", "World", "I", "am", "Learning", "Ruby"]
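To see what the explicit underscore changes, here is a minimal sketch. The helper names are hypothetical and exist only for this comparison; the regular expressions are the ones from the answer.

```ruby
# Hypothetical helper names, used only for this illustration.
def split_on_non_word(str)
  str.split(/\W/)      # "_" is a word character, so it is not a delimiter here
end

def split_on_non_word_or_underscore(str)
  str.split(/[\W_]/)   # "_" added to the character class explicitly
end

p split_on_non_word("Hello_World I am Learning,Ruby")
# => ["Hello_World", "I", "am", "Learning", "Ruby"]
p split_on_non_word_or_underscore("Hello_World I am Learning,Ruby")
# => ["Hello", "World", "I", "am", "Learning", "Ruby"]
```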

Answer 2

You can use String#split with a regular expression as the argument, like this:

"Hello_World I am Learning,Ruby".split /[ _,.!?]/
=> ["Hello", "World", "I", "am", "Learning", "Ruby"]
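One thing to watch with a single-character class: two delimiters in a row, such as a comma followed by a space, leave an empty string in the result. A possible variation (the helper name is hypothetical) adds a + quantifier so that a run of delimiters counts as one separator.

```ruby
# Hypothetical helper: same character class as above, with "+" so that a run
# of delimiters such as ", " is treated as a single separator.
def split_on_punctuation(str)
  str.split(/[ _,.!?]+/)
end

p "Hello, World".split(/[ _,.!?]/)       # => ["Hello", "", "World"]
p split_on_punctuation("Hello, World")   # => ["Hello", "World"]
```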

Answer 3

ruby-1.9.2-p290 :022 > str =  "Hello_World I am Learning,Ruby"
ruby-1.9.2-p290 :023 > str.split(/\s|,|_/)
=> ["Hello", "World", "I", "am", "Learning", "Ruby"]

Answer 4

String#scan seems to be an appropriate method for this task:

irb(main):018:0> "Hello_World    I am Learning,Ruby".scan(/[a-z]+/i)
=> ["Hello", "World", "I", "am", "Learning", "Ruby"]

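Or you can use the built-in matcher \w, although it treats the underscore as a word character, so Hello_World stays in one piece: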

irb(main):020:0> "Hello_World    I am Learning,Ruby".scan(/\w+/)
=> ["Hello_World", "I", "am", "Learning", "Ruby"]
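A practical difference between scan and split shows up when the string begins with a delimiter: split with a regular expression returns a leading empty string, while scan only returns the matches. The sketch below also uses the class [^\W_] (word characters except the underscore) as one possible way to keep \w semantics and still break on "_". The helper name is hypothetical.

```ruby
# Hypothetical helper: collect runs of word characters other than "_".
def words_by_scan(str)
  str.scan(/[^\W_]+/)
end

p " Hello_World 42".split(/[\W_]+/)   # => ["", "Hello", "World", "42"]
p words_by_scan(" Hello_World 42")    # => ["Hello", "World", "42"]
```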

Answer 5

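While the examples above work, I think that when splitting a string into words it may be better to split on the characters that are not considered part of any word. To do that, I did this: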

str =  "Hello_World I am Learning,Ruby"
str.split(/[^a-zA-Z]/).reject(&:empty?).compact

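This statement does the following:

Splits the string on every character that is not a letter
Then rejects any empty strings
And removes any nil values from the array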

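It will then handle most combinations of words. The examples above require you to list every character you want to split on. It is far easier to describe the characters that are not part of a word, here with the negated class [^a-zA-Z].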
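The steps can be traced one at a time in a small sketch. The helper name is hypothetical, and the input has a space after the comma so that the empty-string case actually occurs.

```ruby
# Hypothetical helper that mirrors the three steps described above.
def letters_only_words(str)
  pieces = str.split(/[^a-zA-Z]/)      # 1. split on every non-letter character
  non_empty = pieces.reject(&:empty?)  # 2. drop the empty strings left by adjacent delimiters
  non_empty.compact                    # 3. drop nils (split returns only strings, so this changes nothing)
end

p "Learning, Ruby".split(/[^a-zA-Z]/)
# => ["Learning", "", "Ruby"]
p letters_only_words("Hello_World I am Learning, Ruby")
# => ["Hello", "World", "I", "am", "Learning", "Ruby"]
```

Since String#split never produces nil elements, the compact call is harmless but not required for this input.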

Answer 6

Just for fun, a Unicode-aware version for 1.9 (or 1.8 with Oniguruma):

>> "This_µstring has words.and thing's".split(/[^\p{Word}']|\p{Connector_Punctuation}/)
=> ["This", "µstring", "has", "words", "and", "thing's"]

Or perhaps:

>> "This_µstring has words.and thing's".split(/[^\p{Word}']|_/)
=> ["This", "µstring", "has", "words", "and", "thing's"]

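The real problem is determining which sequences of characters constitute a "word" in this context. You might want to look at the Oniguruma docs for the supported character properties; Wikipedia also has some notes on the properties.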
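As one illustration of how much the definition of "word" matters, the following sketch uses scan instead of split and assumes that a word is a run of Unicode letters and apostrophes. This is a variation for comparison, not the approach from the answer above, and the helper name is hypothetical.

```ruby
# Hypothetical helper: a "word" is assumed to be a run of Unicode letters
# and apostrophes. Other definitions are equally defensible.
def unicode_words(str)
  str.scan(/[\p{L}']+/)
end

p unicode_words("This_µstring has words.and thing's")
# => ["This", "µstring", "has", "words", "and", "thing's"]

# An ASCII-only class drops the "µ", which is a letter but not in a-z.
p "This_µstring has words.and thing's".scan(/[a-zA-Z']+/)
# => ["This", "string", "has", "words", "and", "thing's"]
```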

Ruby string split on multiple characters
