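Rails regex: exclude symbols and trailing whitespace between words

Time: 2019-06-21 23:27:16

Tags: regex ruby

I am trying to mark every line between two #BLOCK lines.

In addition, I want to exclude all of the symbols [""," ",{},(),\n]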

#BLOCK
#NAME {PC8}
#TYPE GHD3
#PROGRAM "FooBar" (2.0)
#DATE 20190501
#BASE 3740 "TXGH3789"
#BLOCK

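Right now I have two solutions, but I would like to combine them into one.

I am using Rubular; the links are here:

Example 1: https://rubular.com/r/bd2AxaHB2QLGpt

Example 2: https://rubular.com/r/vmxm2kugNhnDCS

I tried the following two solutions: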

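  1. (?<=#BLOCK\n)(.*)(?=#BLOCK) works and marks everything between the two #BLOCK lines.

  2. [^,{},(),""," ",\n] excludes the symbols, but it does not restrict the match to the content between the two #BLOCK lines.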
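To make the difference between the two attempts concrete, here is a minimal Ruby sketch that runs both patterns on the sample input. The variable names (text, between, kept) are only illustrative, and the m flag on the first pattern is an assumption so that . also matches newlines.

```ruby
# Sample input from the question (single-quoted heredoc: no interpolation).
text = <<~'TEXT'
  #BLOCK
  #NAME {PC8}
  #TYPE GHD3
  #PROGRAM "FooBar" (2.0)
  #DATE 20190501
  #BASE 3740 "TXGH3789"
  #BLOCK
TEXT

# Attempt 1: everything between the first "#BLOCK\n" and the last "#BLOCK".
# The symbols are still part of the match.
between = text[/(?<=#BLOCK\n)(.*)(?=#BLOCK)/m]

# Attempt 2: every character that is not an excluded symbol, taken from the
# whole text, so the characters of the "#BLOCK" lines are kept as well.
kept = text.scan(/[^,{},(),""," ",\n]/).join

puts between
puts kept
```

The first result still contains the braces, quotes and parentheses, and the second still contains both #BLOCK markers, which is why neither pattern is sufficient on its own.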

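How can I combine the two to get the expected result?

The expected result is everything between the two #BLOCK marker lines, excluding symbols such as [{},(),""," ",\n].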

2 answers:

Answer 0: (score: 3)

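If by "mark" you mean match, then I guess you could try this.
It uses the \G construct.

(Note - Ruby uses the //m option to mean dot-all.

Update - don't let it go past the next block without restarting.)

/(?:(?:(?<=\#BLOCK\n)|(?!^)\G))[,{}()"\s]*\K(?!\#BLOCK\b)[^,{}()"\s](?=.*\#BLOCK\b)/m

https://rubular.com/r/TxlU9yhiUJkrok

Explained

(?:
     (?<= \#BLOCK \n )        # A block behind
  |                           # or,
     (?! ^ )                  # Not at the beginning of a line
     \G                       # Start matching where last match left off
)
[,{}()"\s]*                   # Consume optional punctuation and whitespace
\K                            # Disregard anything matched so far
(?! \#BLOCK \b )              # Don't go past next block
[^,{}()"\s]                   # Get a single non-punct nor whitespace char
(?= .* \#BLOCK \b )           # Only if there is a block ahead

Note - this regex matches a single character at a time.
To match chunks of characters, use this one.

/(?:(?<=\#BLOCK\n)|(?!^)\G)[,{}()"\s]*\K(?=.+\#BLOCK\b)(?:(?!\#BLOCK\b)[^,{}()"\s])+/m

https://rubular.com/r/kyhqnOtIrmrnJ7

Explained

(?:
     (?<= \#BLOCK \n )        # A block behind
  |                           # or,
     (?! ^ )                  # Not at the beginning of a line
     \G                       # Start matching where last match left off
)
[,{}()"\s]*                   # Consume optional punctuation and whitespace
\K                            # Disregard anything matched so far
(?= .+ \#BLOCK \b )           # Only if there is a block ahead
(?:
     (?! \#BLOCK \b )         # Don't go past next block
     [^,{}()"\s]              # Get a non-punct nor whitespace char
)+                            # One or more times, as a chunk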
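As a quick way to try the chunk-matching pattern outside Rubular, the sketch below feeds it to String#scan on the sample input from the question. The constant name CHUNKS and the variable names are hypothetical; the pattern itself is copied unchanged from this answer.

```ruby
# Sample input from the question (single-quoted heredoc: no interpolation).
text = <<~'TEXT'
  #BLOCK
  #NAME {PC8}
  #TYPE GHD3
  #PROGRAM "FooBar" (2.0)
  #DATE 20190501
  #BASE 3740 "TXGH3789"
  #BLOCK
TEXT

# Chunk-matching pattern from the answer. The m flag makes "." match newlines.
CHUNKS = /(?:(?<=\#BLOCK\n)|(?!^)\G)[,{}()"\s]*\K(?=.+\#BLOCK\b)(?:(?!\#BLOCK\b)[^,{}()"\s])+/m

# The pattern has no capture groups, so scan returns the matched strings.
# Thanks to \K, the skipped punctuation and whitespace are not part of them.
words = text.scan(CHUNKS)
p words
```

Each call made by scan resumes where the previous match ended, which is what lets the \G branch chain the words together after the first one is found behind a #BLOCK line.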

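Answer 1: (score: 2)

As I understand it, you wish to extract the words between the '#BLOCK' lines, where words are separated by strings whose every character is one of the characters in the string " {}()\"\n". I will also address an alternative interpretation, in which only the characters of those words are to be extracted.

The title of the question calls for a regular expression (the adjective "Rails" should be removed, as it has no bearing on the problem). I suggest not using a single regular expression here. I think the code below is more straightforward, easier to follow and test, and easier to maintain if the requirements change in the future.

Code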

def exclude(str)
  arr = str.split(/^#BLOCK$/).drop(1)
  arr.pop unless str.end_with?('#BLOCK')
  arr.flat_map { |s| s.scan(/[^ {}()"\n]+/) }
end

Example

str =<<END
cat
#BLOCK
#NAME PC8
#TYPE GHD3
#PROGRAM "FooBar" 2.0
#DATE 20190501
#BASE 3740 "TXGH3789"
#BLOCK
#DATE 20000101
#BASE 0473 "9873HGXR"
#PROGRAM "BarBaz" 3.0
#BLOCK
dog
END

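exclude(str)
  #=> ["#NAME", "PC8", "#TYPE", "GHD3", "#PROGRAM", "FooBar", "2.0",
  #    "#DATE", "20190501", "#BASE", "3740", "TXGH3789", "#DATE",
  #    "20000101", "#BASE", "0473", "9873HGXR", "#PROGRAM", "BarBaz", "3.0"]

Now form from str a string that ends with a '#BLOCK' line.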

str1 = str.gsub(/^cat\n|^dog\n/, '')
puts str1
#BLOCK
#NAME PC8
#TYPE GHD3
#PROGRAM "FooBar" 2.0
#DATE 20190501
#BASE 3740 "TXGH3789"
#BLOCK
#DATE 20000101
#BASE 0473 "9873HGXR"
#PROGRAM "BarBaz" 3.0
#BLOCK

We see that

exclude(str1)
  #=> ["#NAME", "PC8", "#TYPE", "GHD3", "#PROGRAM", "FooBar", "2.0",
  #    "#DATE", "20190501", "#BASE", "3740", "TXGH3789", "#DATE",
  #    "20000101", "#BASE", "0473", "9873HGXR", "#PROGRAM", "BarBaz", "3.0"] 

returns the same array as exclude(str).
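The comparison can be checked with a small self-contained script. This is only a sketch: it repeats the exclude method and the sample string from this answer so that it runs on its own, and the heredoc delimiter DATA_END is an arbitrary choice.

```ruby
# Method from this answer, repeated so the script is self-contained.
def exclude(str)
  arr = str.split(/^#BLOCK$/).drop(1)      # discard text before the first #BLOCK
  arr.pop unless str.end_with?('#BLOCK')   # discard text after the last #BLOCK
  arr.flat_map { |s| s.scan(/[^ {}()"\n]+/) }
end

# Sample string from this answer (single-quoted heredoc: no interpolation).
str = <<~'DATA_END'
  cat
  #BLOCK
  #NAME PC8
  #TYPE GHD3
  #PROGRAM "FooBar" 2.0
  #DATE 20190501
  #BASE 3740 "TXGH3789"
  #BLOCK
  #DATE 20000101
  #BASE 0473 "9873HGXR"
  #PROGRAM "BarBaz" 3.0
  #BLOCK
  dog
DATA_END

# Same text without the leading "cat" line and the trailing "dog" line.
str1 = str.gsub(/^cat\n|^dog\n/, '')

puts exclude(str) == exclude(str1)
```

Both strings yield the same words because only the text lying between '#BLOCK' lines survives the drop and pop steps.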

Explanation

For str defined above, the steps are as follows.

arr = str.split(/^#BLOCK$/)
  #=> ["cat\n",
  #    "\n#NAME PC8\n#TYPE GHD3\n...\"TXGH3789\"\n",
  #    "\n#DATE 20000101\n#BASE 0473...\"BarBaz\" 3.0\n",
  #    "\ndog\n"] 
arr = arr.drop(1)
  #=> ["\n#NAME PC8\n#TYPE GHD3\n...\"TXGH3789\"\n",
  #    "\n#DATE 20000101\n#BASE 0473...\"BarBaz\" 3.0\n",
  #    "\ndog\n"] 
  str.end_with?('#BLOCK')
    #=> false 
arr.pop
  #=> "\ndog\n" 
arr
  #=> ["\n#NAME PC8\n#TYPE GHD3\n...\"TXGH3789\"\n",
  #    "\n#DATE 20000101\n#BASE 0473...\"BarBaz\" 3.0\n"] 
arr.flat_map { |s| s.scan(/[^ {}()"\n]+/) }
  #=> ["#NAME", "PC8", "#TYPE", "GHD3", "#PROGRAM", "FooBar", "2.0",
  #    "#DATE", "20190501", "#BASE", "3740", "TXGH3789", "#DATE",
  #    "20000101", "#BASE", "0473", "9873HGXR", "#PROGRAM", "BarBaz", "3.0"] 
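The reason for the end_with? guard can be seen from how String#split treats the end of the string. The short sketch below uses made-up inputs (the strings and variable names are illustrative only): when the text ends exactly with '#BLOCK', split drops the trailing empty piece, so there is nothing left to pop; otherwise the last piece is whatever follows the final marker, even if that is only a newline.

```ruby
sep = /^#BLOCK$/

# Text continues after the last marker (here just a newline): the final
# element is that trailing remainder, which exclude removes with pop.
with_trailing = "a\n#BLOCK\nb\n#BLOCK\n".split(sep)

# Text ends exactly with the marker: split discards the trailing empty
# string, so the last element is real block content and must be kept.
without_trailing = "a\n#BLOCK\nb\n#BLOCK".split(sep)

p with_trailing
p without_trailing
```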

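Alternative interpretation of the question

If only the characters of the words in exclude(str) are wanted, one can write:

exclude(str).join
  #=> "#NAMEPC8#TYPEGHD3#PROGRAMFooBar2.0#DATE20190501#BASE3740TXGH3789#DATE20000101#BASE04739873HGXR#PROGRAMBarBaz3.0"

exclude(str).join.chars
  #=> ["#", "N", "A", "M", "E", "P",..., "z", "3", ".", "0"] 

or remove the '+' from the regular expression in scan's argument: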

def exclude(str)
  arr = str.split(/^#BLOCK$/).drop(1)
  arr.pop unless str.end_with?('#BLOCK')
  arr.flat_map { |s| s.scan(/[^ {}()"\n]/) }
end

exclude(str)
  #=> ["#", "N", "A", "M", "E", "P",..., "z", "3", ".", "0"]