Question

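I need to remove the phrase "not" and hashtags (#) from strings. (I also have to get rid of spaces and capital letters and return the results in an array, but I have already taken care of those three.)

Expected: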

"not12345"       #=> ["12345"]
"   notabc  "    #=> ["abc"]
"notone, nottwo" #=> ["one", "two"]
"notCAPSLOCK"    #=> ["capslock"]
"##doublehash"   #=> ["doublehash"]
"h#a#s#h"        #=> ["hash"]
"#notswaggerest" #=> ["swaggerest"]

Here is my code:

def some_method(string)
    string.split(", ").map{|n| n.sub(/(not)/,"").downcase.strip}
end

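All of the tests above do what I need, except the ones with hashes. I don't know how to get rid of the hashes; I tried modifying the regex part: n.sub(/(#not)/), n.sub(/#(not)/), n.sub(/[#]*(not)/), all to no avail. How can I make the regex remove the #?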

Answer 1

arr = ["not12345", "   notabc", "notone, nottwo", "notCAPSLOCK",
       "##doublehash:", "h#a#s#h", "#notswaggerest"]

arr.flat_map { |str| str.downcase.split(',').map { |s| s.gsub(/#|not|\s+/,"") } }
  #=> ["12345", "abc", "one", "two", "capslock", "doublehash:", "hash", "swaggerest"]

When the block variable str is set to "notone, nottwo",

s = str.downcase
  #=> "notone, nottwo" 
a = s.split(',')
  #=> ["notone", " nottwo"] 
b = a.map { |s| s.gsub(/#|not|\s+/,"") }
  #=> ["one", "two"]

Because I used Enumerable#flat_map, "one" and "two" are added to the array being returned. When str #=> "notCAPSLOCK",

s = str.downcase
  #=> "notcapslock" 
a = s.split(',')
  #=> ["notcapslock"] 
b = a.map { |s| s.gsub(/#|not|\s+/,"") }
  #=> ["capslock"]
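
For reference, the same chain can be wrapped in a method and run against the strings from the question. This is only a sketch: the method name clean_tags and the variable names are made up for illustration and do not appear in the post. It also shows the difference between map, which keeps one array per input string, and flat_map, which merges everything into a single flat array.

```ruby
# Hypothetical helper; the name is an assumption, not part of the answer.
def clean_tags(string)
  # Lowercase, split on commas, then delete every "#", "not", and run of whitespace.
  string.downcase.split(',').map { |s| s.gsub(/#|not|\s+/, '') }
end

inputs = ["not12345", "   notabc  ", "notone, nottwo", "notCAPSLOCK",
          "##doublehash", "h#a#s#h", "#notswaggerest"]

nested = inputs.map { |str| clean_tags(str) }      # one array per input string
flat   = inputs.flat_map { |str| clean_tags(str) } # a single flat array

p nested
#=> [["12345"], ["abc"], ["one", "two"], ["capslock"], ["doublehash"], ["hash"], ["swaggerest"]]
p flat
#=> ["12345", "abc", "one", "two", "capslock", "doublehash", "hash", "swaggerest"]
```

Note that gsub removes every occurrence of "not", including one that sits in the middle of a word.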

Answer 2

Here is another solution that uses a different technique: capturing what you want rather than discarding what you don't want (mostly):

a = ["not12345", "   notabc", "notone, nottwo", 
 "notCAPSLOCK", "##doublehash:","h#a#s#h", "#notswaggerest"]
a.map do |s|
     s.downcase.delete("#").scan(/(?<=not)\w+|^[^not]\w+/)
end 
#=> [["12345"], ["abc"], ["one", "two"], ["capslock"], ["doublehash"], ["hash"], ["swaggerest"]]

The delete is needed because of h#a#s#h; otherwise, the delete could be avoided with a regex such as /(?<=not|^#[^not])\w+/
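
One thing worth checking before relying on this pattern: [^not] is a character class, so it rejects any single first character that is n, o, or t, rather than rejecting the word "not". The sketch below wraps the expression from this answer in a method (the name extract is hypothetical) and tries two inputs that are not in the question's test list.

```ruby
# Hypothetical wrapper around the expression from this answer.
def extract(string)
  string.downcase.delete('#').scan(/(?<=not)\w+|^[^not]\w+/)
end

p extract("##doublehash") #=> ["doublehash"]  (first letter "d" passes [^not])
p extract("notone, nottwo") #=> ["one", "two"]
p extract("#tag")         #=> []  (first letter "t" is rejected by [^not])
p extract("#one")         #=> []  (first letter "o" is rejected by [^not])
```

So a tag without the "not" prefix that happens to start with n, o, or t comes back empty with this approach.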

Answer 3

You can use this regex to solve your problem. I tested it and it works for all of your test cases.

/^\s*#*(not)*/

^ matches the start of the string
\s* matches any whitespace at the start
#* matches zero or more # characters
(not)* matches the phrase "not" zero or more times.

Note: this regex will not work when "not" comes before "#"; for example, not#hash would return #hash
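
As a quick illustration of how this pattern behaves with String#sub, here is a minimal sketch (the method name strip_prefix is an assumption for the example). Because the pattern is anchored at the start, it only removes a leading run of whitespace, hashes, and "not"; a # later in the string is left alone and would need a separate step such as delete('#').

```ruby
# Hypothetical helper applying the regex from this answer.
def strip_prefix(string)
  string.sub(/^\s*#*(not)*/, '')
end

p strip_prefix("#notswaggerest") #=> "swaggerest"
p strip_prefix("##doublehash")   #=> "doublehash"
p strip_prefix("not#hash")       #=> "#hash"    (the case from the note above)
p strip_prefix("h#a#s#h")        #=> "h#a#s#h"  (interior hashes are untouched)
```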

Answer 4

Ruby regular expressions allow comments, so to match the octothorpe (#) you can escape it:

"#foo".sub(/\#/, "") #=> "foo"

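To see when the escape actually matters, here is a small sketch (variable names are illustrative). In an ordinary regex literal an unescaped # is already a literal character, as the /#|not|\s+/ pattern in Answer 1 shows; it is in extended mode (the x flag) that an unescaped # starts a comment.

```ruby
plain   = "#foo".sub(/#/, "")   # unescaped # in a normal regex literal
escaped = "#foo".sub(/\#/, "")  # escaped form, as in this answer

# In extended mode everything after an unescaped # is a comment,
# so this pattern is effectively empty and removes nothing.
commented = "#foo".sub(Regexp.new("# foo", Regexp::EXTENDED), "")

# Escaping the # makes it a literal again, even in extended mode.
extended = "#foo".sub(/ \# /x, "")

p plain     #=> "foo"
p escaped   #=> "foo"
p commented #=> "#foo"
p extended  #=> "foo"
```
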
Answer 5

Interesting question, because it can be solved with the most common string functions in Ruby:

result = values.map do |string|
 string.strip      # Remove spaces in front and back.
   .tr('#','')     # Transform single characters. In this case remove #
   .gsub('not','') # Substitute patterns
   .split(', ')    # Split into arrays.
end

p result #=>[["12345"], ["abc"], ["one", "two"], ["CAPSLOCK"], ["doublehash"], ["hash"], ["swaggerest"]]

I prefer this approach over a regex because the logic of each line is easy to understand.
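
The snippet above does not define values and leaves "CAPSLOCK" in upper case, so here is a hedged variant of the same chain. It assumes values holds the strings from the question, adds downcase, and swaps tr('#','') for delete('#'), which does the same job here.

```ruby
# Assumption: values is the list of inputs from the question.
values = ["not12345", "   notabc  ", "notone, nottwo", "notCAPSLOCK",
          "##doublehash", "h#a#s#h", "#notswaggerest"]

result = values.map do |string|
  string.strip          # Remove spaces in front and back.
        .downcase       # Lowercase, as the question requires.
        .delete('#')    # Remove every "#".
        .gsub('not', '') # Remove every "not".
        .split(', ')    # Split into an array per input string.
end

p result
#=> [["12345"], ["abc"], ["one", "two"], ["capslock"], ["doublehash"], ["hash"], ["swaggerest"]]
```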

Remove string patterns and symbols from a string
