Get specific parts of a string

Time: 2017-01-12 07:01:43

Tags: ruby-on-rails ruby regex ruby-on-rails-5

Given the (relatively long) string:

string = "Checks for load balancers with listeners that do not use recommended security configurations for encrypted communication. AWS recommends using a secure protocol (HTTPS or SSL), up-to-date security policies, and ciphers and protocols that are secure.<br/>\nWhen you use a secure protocol for a front-end connection (client to load balancer), the requests are encrypted between your clients and the load balancer, which is more secure.<br/>\nElastic Load Balancing provides predefined security policies with ciphers and protocols that adhere to AWS security best practices. New versions of predefined policies are released as new configurations become available. <br/><br/>\n<b>Alert Criteria</b><br/>\nYellow: A load balancer has no listener that uses a secure protocol (HTTPS or SSL). <br/>\nYellow: A load balancer listener uses an outdated predefined SSL security policy. <br/>\nYellow: A load balancer listener uses a cipher or protocol that is not recommended. <br/>\nRed: A load balancer listener uses an insecure cipher or protocol.<br/><br/>\n<b>Recommended Action</b>\n<ul><li>If the traffic to your load balancer must be secure, use either the HTTPS or the SSL protocol for the front-end connection.</li>\n<li>Upgrade your load balancer to the latest version of the predefined SSL security policy.</li> \n<li>Use only the recommended ciphers and protocols.</li> </ul>\nFor more information, see <a target=\"_blank\" href=\"https://docs.aws.amazon.com/ElasticLoadBalancing/latest/DeveloperGuide/elb-listener-config.html\">Listener Configurations for Elastic Load Balancing</a>.<br/><br/>\n<b>Additional Resources</b><br/>\n<a target=\"_blank\" href=\"https://docs.aws.amazon.com/ElasticLoadBalancing/latest/DeveloperGuide/using-elb-listenerconfig-quickref.html\">Listener Configurations Quick Reference</a><br/>\n<a target=\"_blank\" href=\"https://docs.aws.amazon.com/ElasticLoadBalancing/latest/DeveloperGuide/ssl-config-update.html\">Update SSL Negotiation Configuration of Your 
Load Balancer</a><br/>\n<a target=\"_blank\" href=\"https://docs.aws.amazon.com/ElasticLoadBalancing/latest/DeveloperGuide/elb-ssl-security-policy.html\">SSL Negotiation Configurations for Elastic Load Balancing</a><br/>\n<a target=\"_blank\" href=\"https://docs.aws.amazon.com/ElasticLoadBalancing/latest/DeveloperGuide/elb-security-policy-table.html\">SSL Security Policy Table</a><br/>\n"

I would like a method to which I pass one of these statuses as an argument:

  • 'Green'
  • 'Yellow'
  • 'Red'

It should return an array of the complete sentences that follow that status in the string (however many times the status occurs).

def status_description(string, status)
  # manipulate string and return status description(s)
end

With the string above, I expect

status_description(string, 'Yellow')

to return

[
  'A load balancer has no listener that uses a secure protocol (HTTPS or SSL).',
  'A load balancer listener uses an outdated predefined SSL security policy.',
  'A load balancer listener uses a cipher or protocol that is not recommended.'
]

status_description(string, 'Red')

to return

['A load balancer listener uses an insecure cipher or protocol.']

The string will always have the same structure, which means the status descriptions always follow this part:

\n<b>Alert Criteria</b><br/>

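It would be perfect if the method could return a hash containing every status (usually some or all of the three mentioned above) together with its descriptions. Something like:

{
  'Green' => ['some green desc'],
  'Yellow' => ['some yellow desc', 'another yellow desc'],
  'Red' => ['some red desc']
}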

I also need to get an array of the 'Recommended Action' items:

[
  'If the traffic to your load balancer must be secure, use either the HTTPS or the SSL protocol for the front-end connection.',
  'Upgrade your load balancer to the latest version of the predefined SSL security policy.',
  'Use only the recommended ciphers and protocols.'
]

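I have almost no experience with regular expressions, and this case is probably not that simple.

Thank you very much for your help!

4 answers:

Answer 0 (score: 7)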

def status_description(str, color)
  str.scan(/(?<=#{color}:\s).*?[.!?]/i)
end

status_description(string, "yellow")
  #=> ["A load balancer has no listener that uses a secure protocol (HTTPS or SSL).",
  #    "A load balancer listener uses an outdated predefined SSL security policy.",
  #    "A load balancer listener uses a cipher or protocol that is not recommended."]

status_description(string, "green")
  #=> [] 

status_description(string, "red")
  #=> ["A load balancer listener uses an insecure cipher or protocol."] 

For

color = "yellow"

the regular expression is

r = /
    (?<=       # begin a positive lookbehind
      #{color} # match the value of the variable `color`
      :\s      # match a colon followed by whitespace
    )          # close positive lookbehind
    .*?        # match any number of any characters, lazily
    [.!?]      # match a character that terminates a sentence
    /ix        # case-indifference and free-spacing regex definition modes
  #=> /
  #=> (?<=     # begin a positive lookbehind
  #     yellow # match the value of the variable `color`
  #     :\s    # match a colon followed by whitespace
  #   )        # close positive lookbehind
  #   .*?      # match any number of any characters, lazily
  #   [.!?]    # match a character that terminates a sentence
  #   /ix 

Or, as requested,

["green", "yellow", "red"].each_with_object({}) { |c,h|
  h[c] = status_description(string, c) }
  #=> {"green" =>[],
  #    "yellow"=>[
  #      "A load balancer has no listener that uses a secure protocol (HTTPS or SSL).",
  #      "A load balancer listener uses an outdated predefined SSL security policy.",
  #      "A load balancer listener uses a cipher or protocol that is not recommended."
  #    ],
  #    "red"=>["A load balancer listener uses an insecure cipher or protocol."]
  #   } 
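Calling the method once per color rescans the string each time and interpolates `color` into the pattern unescaped. As a minimal sketch that is not part of the original answer, the hash can also be built in a single pass by capturing the status and its sentence together. The method name `status_descriptions` and the shortened `CRITERIA_SAMPLE` string are hypothetical, and the pattern assumes each status starts a line, as it does in the question's string.

```ruby
# Hypothetical shortened sample with the same layout as the question's string.
CRITERIA_SAMPLE = "Intro text.<br/><br/>\n<b>Alert Criteria</b><br/>\n" \
                  "Yellow: First yellow sentence. <br/>\n" \
                  "Yellow: Second yellow sentence. <br/>\n" \
                  "Red: Only red sentence.<br/><br/>\n<b>Recommended Action</b>\n"

# Returns a hash mapping each status found to an array of its sentences.
def status_descriptions(str, statuses = %w[Green Yellow Red])
  # Regexp.union escapes each status and joins them as alternatives.
  pattern = /^(#{Regexp.union(statuses)}):\s*(.*?[.!?])/
  collected = Hash.new { |h, k| h[k] = [] }
  str.scan(pattern).each_with_object(collected) do |(status, sentence), hash|
    hash[status] << sentence
  end
end

p status_descriptions(CRITERIA_SAMPLE)
```

For the sample this yields two sentences under "Yellow" and one under "Red". Unlike the `each_with_object` version above, a status that never occurs (here "Green") gets no key at all rather than an empty array.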

You can do the following to extract the sentences under "Recommended Action" (see note 1).

r0 = /
     \n<b>Recommended\sAction<\/b>\n<ul><li> # match string
     \K                   # discard everything matched so far
     .+?                  # match any number of any character, lazily (?)
     (?=<\/li>\s<\/ul>)   # match string
     /mx                  # multiline and free-spacing regex definition modes

r1 = /<\/li>\s*\n\s*<li>/ # match string

string[r0].split(r1)
  #=> ["If the traffic to your load balancer must be secure, use either the \  
  #     HTTPS or the SSL protocol for the front-end connection.",
  #    "Upgrade your load balancer to the latest version of the predefined \
  #     SSL security policy.",
  #    "Use only the recommended ciphers and protocols."] 

Note that

string[r0]
  #=> "If the traffic to your load balancer must be secure, use either \
  #    the HTTPS or the SSL protocol for the front-end connection.\
  #    </li>\n<li>Upgrade your load balancer to the latest version of the \
  #    predefined SSL security policy.</li> \n<li>Use only the recommended \
  #    ciphers and protocols." 

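1. In constructing r0, I replaced the single spaces in "Recommended Action" and "(?=<\/li> <\/ul>)" with \s. This is needed only when the regex is defined in free-spacing mode (/x), which ignores whitespace. Also, \n<b>Recommended\sAction<\/b>\n<ul><li>\K could be replaced with a positive lookbehind: (?<=\n<b>Recommended\sAction<\/b>\n<ul><li>). Finally, I formatted the returned strings so they can be read without horizontal scrolling.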
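A different route to the same list, offered as a sketch rather than as the answer's method, is to isolate the "Recommended Action" section first and then `scan` for each `<li>` item, which avoids depending on the exact whitespace between items. The method name `recommended_actions` and the shortened `ACTIONS_SAMPLE` string are hypothetical; the sketch assumes the items contain no nested tags.

```ruby
# Hypothetical shortened sample with the same layout as the question's string.
ACTIONS_SAMPLE = "<b>Alert Criteria</b><br/>\nRed: Something.<br/><br/>\n" \
                 "<b>Recommended Action</b>\n<ul><li>First action.</li>\n" \
                 "<li>Second action.</li> \n<li>Third action.</li> </ul>\n" \
                 "For more information, see the docs.<br/><br/>\n"

def recommended_actions(str)
  # Isolate the section from its heading to the closing </ul>;
  # the /m flag lets . match the newlines between tags.
  section = str[/<b>Recommended Action<\/b>.*?<\/ul>/m]
  return [] unless section

  # Each capture is the text of one <li>...</li> item.
  section.scan(/<li>(.*?)<\/li>/m).flatten
end

p recommended_actions(ACTIONS_SAMPLE)
```

When the heading is missing, the method returns an empty array instead of raising on `nil`.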

Answer 1 (score: 1)

You can try this:

[^.]*Yellow:([^.]*).

Replace Yellow with the variable passed to your method. Group 1 will contain the description you want.


Example Ruby code:

re = /[^.]*Yellow:([^.]*)./m

str = 'your large string goes here ................'
str.scan(re) do |match|
    puts match.to_s
end


Output:

[" A load balancer has no listener that uses a secure protocol (HTTPS or SSL)"]
[" A load balancer listener uses an outdated predefined SSL security policy"]
[" A load balancer listener uses a cipher or protocol that is not recommended"]
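The advice to swap Yellow for the method's argument could look like the sketch below. It is a variation on the pattern above, not the same pattern: it drops the leading `[^.]*`, skips the whitespace after the colon, keeps the closing period inside the capture, and flattens the result into a plain array of strings. The method name `descriptions_for` and the `YELLOW_SAMPLE` string are hypothetical.

```ruby
# Hypothetical shortened sample with the same layout as the question's string.
YELLOW_SAMPLE = "<b>Alert Criteria</b><br/>\n" \
                "Yellow: First yellow sentence. <br/>\n" \
                "Yellow: Second yellow sentence. <br/>\n" \
                "Red: Only red sentence.<br/>\n"

def descriptions_for(str, status)
  # Regexp.escape guards against regex metacharacters in the argument.
  re = /#{Regexp.escape(status)}:\s*([^.]*\.)/
  # scan returns one single-element array per match; flatten unwraps them.
  str.scan(re).flatten
end

p descriptions_for(YELLOW_SAMPLE, 'Yellow')
p descriptions_for(YELLOW_SAMPLE, 'Green')
```

As with the original pattern, `[^.]*` stops at the first period, so a description containing an inner period (a version number, say) would be cut short.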

Answer 2 (score: 1)

Try these

Regexes for the colors:

Yellow:\s*.*?\.
Red:\s*.*?\.
Green:\s*.*?\.

Demo: https://regex101.com/r/IxoPB0/2

Regex for the recommended actions in the given string:

(?<=Recommended Action)<.*?li>(.*?)<\/li>.*?li>(.*?)<\/li>.*?li>(.*?)<\/li>

Demo: https://regex101.com/r/IxoPB0/3
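To use the recommended-action pattern from Ruby, the `/m` flag is needed so that `.` also matches the newlines between the tags. The sketch below applies the pattern unchanged apart from that flag; the names `ACTIONS_RE`, `three_actions` and `THREE_ITEMS_SAMPLE` are hypothetical.

```ruby
# Hypothetical shortened sample with the same layout as the question's string.
THREE_ITEMS_SAMPLE = "<b>Recommended Action</b>\n<ul><li>First action.</li>\n" \
                     "<li>Second action.</li> \n<li>Third action.</li> </ul>\n"

# The pattern from this answer, with /m so that . matches newlines in Ruby.
ACTIONS_RE = /(?<=Recommended Action)<.*?li>(.*?)<\/li>.*?li>(.*?)<\/li>.*?li>(.*?)<\/li>/m

def three_actions(str)
  match = str.match(ACTIONS_RE)
  # captures holds the three groups, one per <li> item.
  match ? match.captures : []
end

p three_actions(THREE_ITEMS_SAMPLE)
```

Because the three groups are written out by hand, the pattern only matches when at least three `<li>` items are present; a list with fewer items produces no match at all.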

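Answer 3 (score: 1)

Since this text is actually HTML, a parser (for example Nokogiri) would be a better fit than regex. The only problem is that this HTML structure isn't a tree, which makes it a bit harder to parse.

String#split goes a long way in this example, without needing any regex at all.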

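The code first splits the big string into text blocks ("Alert Criteria", "Recommended Action", ...).

For the "Alert Criteria" block, it splits each line around : to get the color and the text, and builds a hash of arrays.

For "Recommended Action", it simply looks for the text between <li> and </li>.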

blocks = string.split("<br/><br/>\n").map do |block|
  block.split('<br/>').map(&:strip)
end

### Analyzing criterias
criterias_block = blocks.find { |block| block.first.include?('Alert Criteria') }

criterias_hash = Hash.new { |h, k| h[k] = [] }

if criterias_block
  header, *criterias = criterias_block
  criterias.each_with_object(criterias_hash) do |line, hash|
    color, criteria = line.split(': ')
    hash[color] << criteria
  end
end

pp criterias_hash
# {"Yellow"=>
#   ["A load balancer has no listener that uses a secure protocol (HTTPS or SSL).",
#    "A load balancer listener uses an outdated predefined SSL security policy.",
#    "A load balancer listener uses a cipher or protocol that is not recommended."],
#  "Red"=>["A load balancer listener uses an insecure cipher or protocol."]}

### Recommend actions
actions_block = blocks.find { |block| block.first.include?('Recommended Action') }

if actions_block
  require 'nokogiri'
  actions_html = Nokogiri::HTML(actions_block.first)
  pp actions_html.css('li').map(&:text)
end

# ["If the traffic to your load balancer must be secure, use either the HTTPS or the SSL protocol for the front-end connection.",
#  "Upgrade your load balancer to the latest version of the predefined SSL security policy.",
#  "Use only the recommended ciphers and protocols."]
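If pulling in Nokogiri is not desirable, the list items can also be extracted with String#split alone, in keeping with the no-regex approach of the rest of this answer. This is only a sketch under the assumption that the `<li>` items contain no nested tags or attributes; the method name `list_items` and the `ACTIONS_BLOCK_SAMPLE` string are hypothetical.

```ruby
# Hypothetical sample shaped like the first element of actions_block.
ACTIONS_BLOCK_SAMPLE = "<b>Recommended Action</b>\n<ul><li>First action.</li>\n" \
                       "<li>Second action.</li> \n<li>Third action.</li> </ul>\n" \
                       "For more information, see the docs."

def list_items(html)
  # Everything before the first <li> is heading markup, so drop it.
  html.split('<li>').drop(1).map do |chunk|
    # Keep only the text in front of the closing tag.
    chunk.split('</li>').first.to_s.strip
  end
end

p list_items(ACTIONS_BLOCK_SAMPLE)
```

A real parser remains the more robust choice as soon as the markup gets less regular than in this string.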