I am trying to mark every line between the two #BLOCK lines.
In addition, I want to exclude all of the symbols [""," ",{},(),\n]
#BLOCK
#NAME {PC8}
#TYPE GHD3
#PROGRAM "FooBar" (2.0)
#DATE 20190501
#BASE 3740 "TXGH3789"
#BLOCK
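Right now I have two solutions, but I would like to combine them into one.
I am using Rubular; the links are here:
Example 1: https://rubular.com/r/bd2AxaHB2QLGpt
Example 2: https://rubular.com/r/vmxm2kugNhnDCS
I have tried the following two solutions: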
(?<=#BLOCK\n)(.*)(?=#BLOCK)
This works and marks everything between the two #BLOCK lines.
[^,{},(),""," ",\n]
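This excludes the symbols, but it does not restrict the match to the content between the two #BLOCK lines.
How can I combine the two to get the expected result?
The expected result is everything between the two #BLOCK marker lines, excluding symbols such as [{},(),""," ",\n]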
Answer 0 (score: 3)
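If by "mark" you mean match, then I guess you could try this.
It uses the \G construct.
(Note: Ruby uses the //m option to mean dot-all.)
(Update: it won't go past the next block without restarting.)
/(?:(?:(?<=\#BLOCK\n)|(?!^)\G))[,{}()"\s]*\K(?!\#BLOCK\b)[^,{}()"\s](?=.*\#BLOCK\b)/m
https://rubular.com/r/TxlU9yhiUJkrok
Explained
Note: this regex matches a single character at a time.
(?:
     (?<= \#BLOCK \n )       # A block behind
  |                          # or,
     (?! ^ )                 # Not at the start of a line
     \G                      # Start matching where last match left off
)
[,{}()"\s]*              # Consume optional punctuation and whitespace
\K                       # Disregard anything matched so far
(?! \#BLOCK \b )         # Don't go past next block
[^,{}()"\s]              # Get a single non-punct nor whitespace char
(?= .* \#BLOCK \b )      # Only if there is a block ahead
To match chunks of characters at a time, use this one:
/(?:(?<=\#BLOCK\n)|(?!^)\G)[,{}()"\s]*\K(?=.+\#BLOCK\b)(?:(?!\#BLOCK\b)[^,{}()"\s])+/m
https://rubular.com/r/kyhqnOtIrmrnJ7
Explained
The structure is the same as above, with two changes: the lookahead (?=.+\#BLOCK\b) checks, before any word character is consumed, that a #BLOCK lies ahead, and the group (?:(?!\#BLOCK\b)[^,{}()"\s])+ then takes a run of one or more allowed characters, testing at each character that the next #BLOCK has not been reached.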
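To try both patterns outside Rubular, a minimal sketch using String#scan on the sample from the question might look like this. The names SAMPLE, CHUNKS, SINGLE, block_words and block_chars are made up for illustration; only the two regular expressions come from the answer.

```ruby
# Sample input copied from the question (non-interpolating heredoc).
SAMPLE = <<~'TEXT'
  #BLOCK
  #NAME {PC8}
  #TYPE GHD3
  #PROGRAM "FooBar" (2.0)
  #DATE 20190501
  #BASE 3740 "TXGH3789"
  #BLOCK
TEXT

# The two patterns from the answer, unchanged.
SINGLE = /(?:(?:(?<=\#BLOCK\n)|(?!^)\G))[,{}()"\s]*\K(?!\#BLOCK\b)[^,{}()"\s](?=.*\#BLOCK\b)/m
CHUNKS = /(?:(?<=\#BLOCK\n)|(?!^)\G)[,{}()"\s]*\K(?=.+\#BLOCK\b)(?:(?!\#BLOCK\b)[^,{}()"\s])+/m

# Hypothetical helpers: scan restarts each search where the previous
# match ended, which is what lets \G chain the matches together.
def block_words(text)
  text.scan(CHUNKS)
end

def block_chars(text)
  text.scan(SINGLE)
end

p block_words(SAMPLE)
#=> ["#NAME", "PC8", "#TYPE", "GHD3", "#PROGRAM", "FooBar",
#    "2.0", "#DATE", "20190501", "#BASE", "3740", "TXGH3789"]
puts block_chars(SAMPLE).join
```

The single-character pattern yields the same text one character per match, so joining its matches gives the same string as joining the words.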
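Answer 1 (score: 2)
As I understand it, you wish to extract the words between the '#BLOCK' lines, where words are separated by strings in which every character is one of the characters in the string " {}()\"\n". I will also address an alternative interpretation, in which only the characters of those words are extracted.
The title of the question asks for a regular expression (the adjective "Rails" should be removed, as it is meaningless here). I would advise against using a single regular expression for this problem. I think the code given below is more straightforward, easier to follow and test, and easier to maintain should requirements change in the future.
Code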
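def exclude(str)
  # Split on '#BLOCK' lines and discard whatever precedes the first one.
  arr = str.split(/^#BLOCK$/).drop(1)
  # Discard whatever follows the last '#BLOCK' line.
  arr.pop unless str.end_with?('#BLOCK')
  # Collect the runs of characters that are not separators.
  arr.flat_map { |s| s.scan(/[^ {}()"\n]+/) }
end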
Example
str =<<END
cat
#BLOCK
#NAME PC8
#TYPE GHD3
#PROGRAM "FooBar" 2.0
#DATE 20190501
#BASE 3740 "TXGH3789"
#BLOCK
#DATE 20000101
#BASE 0473 "9873HGXR"
#PROGRAM "BarBaz" 3.0
#BLOCK
dog
END
exclude str
#=> ["#NAME", "PC8", "#TYPE", "GHD3", "#PROGRAM", "FooBar", "2.0",
# "#DATE", "20190501", "#BASE", "3740", "TXGH3789", "#DATE",
# "20000101", "#BASE", "0473", "9873HGXR", "#PROGRAM", "BarBaz", "3.0"]
Now form from str a string that ends with a '#BLOCK' line.
str1 = str.gsub(/^cat\n|^dog\n/, '')
puts str1
#BLOCK
#NAME PC8
#TYPE GHD3
#PROGRAM "FooBar" 2.0
#DATE 20190501
#BASE 3740 "TXGH3789"
#BLOCK
#DATE 20000101
#BASE 0473 "9873HGXR"
#PROGRAM "BarBaz" 3.0
#BLOCK
We see that
exclude(str1)
#=> ["#NAME", "PC8", "#TYPE", "GHD3", "#PROGRAM", "FooBar", "2.0",
# "#DATE", "20190501", "#BASE", "3740", "TXGH3789", "#DATE",
# "20000101", "#BASE", "0473", "9873HGXR", "#PROGRAM", "BarBaz", "3.0"]
returns the same array as exclude(str).
Explanation
For str as defined above, the steps are as follows.
arr = str.split(/^#BLOCK$/)
#=> ["cat\n",
# "\n#NAME PC8\n#TYPE GHD3\n...\"TXGH3789\"\n",
# "\n#DATE 20000101\n#BASE 0473...\"BarBaz\" 3.0\n",
# "\ndog\n"]
arr = arr.drop(1)
#=> ["\n#NAME PC8\n#TYPE GHD3\n...\"TXGH3789\"\n",
# "\n#DATE 20000101\n#BASE 0473...\"BarBaz\" 3.0\n",
# "\ndog\n"]
str.end_with?('#BLOCK')
#=> false
arr.pop
#=> "\ndog\n"
arr
#=> ["\n#NAME PC8\n#TYPE GHD3\n...\"TXGH3789\"\n",
# "\n#DATE 20000101\n#BASE 0473...\"BarBaz\" 3.0\n"]
arr.flat_map { |s| s.scan(/[^ {}()"\n]+/) }
#=> ["#NAME", "PC8", "#TYPE", "GHD3", "#PROGRAM", "FooBar", "2.0",
# "#DATE", "20190501", "#BASE", "3740", "TXGH3789", "#DATE",
# "20000101", "#BASE", "0473", "9873HGXR", "#PROGRAM", "BarBaz", "3.0"]
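The role of the pop step can be checked on a few small inputs. The sketch below repeats the method from the answer so that it runs on its own; the short strings are made up for illustration and do not come from the question.

```ruby
# Same method as in the answer, repeated so the snippet is self-contained.
def exclude(str)
  arr = str.split(/^#BLOCK$/).drop(1)
  arr.pop unless str.end_with?('#BLOCK')
  arr.flat_map { |s| s.scan(/[^ {}()"\n]+/) }
end

# Hypothetical short inputs.
p exclude("#BLOCK\na b\n#BLOCK")       # string ends exactly at '#BLOCK'
p exclude("#BLOCK\na b\n#BLOCK\n")     # newline after the last '#BLOCK'
p exclude("#BLOCK\na b\n#BLOCK\nc\n")  # extra text after the last '#BLOCK'
p exclude("#BLOCK\na b\n")             # block that is never closed
```

The first three calls return ["a", "b"]: when the string ends exactly at '#BLOCK', split drops the trailing empty piece and nothing is popped; otherwise the piece after the last '#BLOCK' (a lone newline or trailing text) is popped. The last call returns [], because the only piece after drop(1) is the unclosed block, and it is popped.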
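Alternative interpretation of the question
If only the characters of the words returned by exclude(str) are wanted, one could write:
exclude(str).join
#=> "#NAMEPC8#TYPEGHD3#PROGRAMFooBar2.0#DATE20190501#BASE3740TXGH3789#DATE20000101#BASE04739873HGXR#PROGRAMBarBaz3.0"
or
exclude(str).join.chars
#=> ["#", "N", "A", "M", "E", "P",..., "z", "3", ".", "0"]
or remove the '+' from the regular expression passed to scan: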
def exclude(str)
  arr = str.split(/^#BLOCK$/).drop(1)
  arr.pop unless str.end_with?('#BLOCK')
  arr.flat_map { |s| s.scan(/[^ {}()"\n]/) }
end
exclude(str)
#=> ["#", "N", "A", "M", "E", "P",..., "z", "3", ".", "0"]