Question

I'm trying to split a string like Presentation about "Test Driven Development" into an array like this:

[ 'Presentation',
  'about',
  '"Test Driven Development"' ]

I tried CSV::parse_line(string, col_sep: ' '), but the result was

[ 'Presentation',
  'about',
  'Test Driven Development' ] # I'm missing the quotes here

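I also tried some regex magic, but I'm still a beginner and couldn't get it to work. I suspect this is simple for a pro, so maybe someone can point me in the right direction? Thanks.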

Answer 1

You can split with the following regular expression:

str = 'Presentation about "Test Driven Development"'
p str.split(/\s(?=(?:[^"]|"[^"]*")*$)/)
# => ["Presentation", "about", "\"Test Driven Development\""]

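It splits on a whitespace character, but only if the text from that point to the end of the string contains an even number of " characters. Note that this version only works if every quoted string is properly closed.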
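To illustrate the caveat about unclosed quotes, here is a small sketch. The input string is a made-up example, not taken from the question. With an odd number of quotes, the lookahead fails for every space that still has the stray quote to its right, so the split lands in an unexpected place:

```ruby
# Same regex as above: split on whitespace only when the rest of the
# string contains an even number of double quotes.
SPLIT_RE = /\s(?=(?:[^"]|"[^"]*")*$)/

balanced   = 'a "b c" d'.split(SPLIT_RE)
unbalanced = 'a "b c'.split(SPLIT_RE)

p balanced    # => ["a", "\"b c\"", "d"]
p unbalanced  # => ["a \"b", "c"]
```

In the unbalanced case the first space is not a split point (one quote follows it), while the space "inside" the unterminated quote is (zero quotes follow it).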

Another solution is to use scan to read the parts of the string (everything except the whitespace):

p str.scan(/(?:\w|"[^"]*")+/)
# => ["Presentation", "about", "\"Test Driven Development\""]
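One thing to be aware of with the scan version: \w matches only letters, digits, and underscores, so punctuation outside quotes is dropped or splits a token. A possible variation, offered as an assumption about what you may want rather than as part of the original answer, is to accept any character that is neither whitespace nor a quote:

```ruby
# Hypothetical input containing punctuation outside the quotes.
text = 'Tests for "Test Driven Development", v2.0'

# Original pattern: word characters or a quoted run.
word_tokens = text.scan(/(?:\w|"[^"]*")+/)

# Variation: any non-space, non-quote character or a quoted run.
loose_tokens = text.scan(/(?:[^\s"]|"[^"]*")+/)

p word_tokens   # => ["Tests", "for", "\"Test Driven Development\"", "v2", "0"]
p loose_tokens  # => ["Tests", "for", "\"Test Driven Development\",", "v2.0"]
```

Which behavior is preferable depends on whether punctuation should belong to the tokens.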

Answer 2

To extend Howard's earlier answer, you can add this method:

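class String
  # Split on whitespace outside single or double quotes, drop the empty
  # strings produced by runs of spaces, then strip surrounding spaces and quotes.
  def tokenize
    split(/\s(?=(?:[^'"]|'[^']*'|"[^"]*")*$)/).
      reject(&:empty?).
      map { |s| s.gsub(/(^ +)|( +$)|(^["']+)|(["']+$)/, '') }
  end
end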

Result:

> 'Presentation      about "Test Driven Development"  '.tokenize
=> ["Presentation", "about", "Test Driven Development"]
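Note that the method above removes the quotes, whereas the question asked to keep them. A minimal sketch of a variant that avoids reopening String and makes quote stripping optional might look like this. The standalone method and its keep_quotes parameter are hypothetical names introduced here for illustration:

```ruby
# Hypothetical helper: same splitting regex as the answer above, but as a
# plain method with an option to preserve the surrounding quotes.
def tokenize(str, keep_quotes: false)
  tokens = str.split(/\s(?=(?:[^'"]|'[^']*'|"[^"]*")*$)/).reject(&:empty?)
  return tokens if keep_quotes

  # Strip leading and trailing quote characters from each token.
  tokens.map { |t| t.gsub(/\A["']+|["']+\z/, '') }
end

input = 'Presentation      about "Test Driven Development"  '

kept     = tokenize(input, keep_quotes: true)
stripped = tokenize(input)

p kept      # => ["Presentation", "about", "\"Test Driven Development\""]
p stripped  # => ["Presentation", "about", "Test Driven Development"]
```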

Answer 3

The following:

"Presentation about \"Test Driven Development\"".scan(/\s?\w+\s?|"[\w\s]*"/).map {|s| s.strip}

scans the string for either single words or "quoted words", keeping the quotes in the resulting array.
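A caveat worth sketching: because the quoted branch only allows [\w\s] between the quotes, a quoted phrase containing punctuation such as a hyphen no longer matches as a unit. The input below is a made-up example, and the relaxed pattern is a suggestion rather than part of the original answer:

```ruby
# Hypothetical input with a hyphen inside the quotes.
phrase = 'Talk about "Test-Driven Development"'

# Pattern from the answer: the quoted branch fails at the hyphen,
# so the words inside the quotes are picked up one by one.
strict = phrase.scan(/\s?\w+\s?|"[\w\s]*"/).map(&:strip)

# Relaxed pattern: anything except a double quote may appear inside quotes.
relaxed = phrase.scan(/\w+|"[^"]*"/)

p strict   # => ["Talk", "about", "Test", "Driven", "Development"]
p relaxed  # => ["Talk", "about", "\"Test-Driven Development\""]
```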
