Ruby: find words in a text file and count them per heading?

Time: 2014-02-25 19:59:39

Tags: ruby regex

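I have the following text in a single file. All three sections are in the same file, and the headings may go up to HEAD-N.

From the text below I want a report like this:

For HEAD-1: 4 not started

For HEAD-2: 2 started

For HEAD-3: 1 started, 2 not started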

HEAD-1
========
NE      Server
ASDF    192.168.1.1     not started
ASDF1   192.168.1.1     not started
ASDF2   192.168.1.1     not started
ASDF3   192.168.1.1     not started

HEAD-2
========
NE      Server
ASDF    192.168.1.1     started
ASDF1   192.168.1.1     started

HEAD-3
========
NE      Server
ASDF    192.168.1.1     not started
ASDF1   192.168.1.1     started
ASDF3   192.168.1.1     not started

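I tried using a RegExp in Ruby to collect all the HEADs into one array and all the NE entries into a second, two-dimensional array.

(.*\n{1})(==*\s+)(.\s+)

This only matches up to "NE Server"; I want the regex to match across multiple lines.

Maybe a regex is the wrong approach here, in which case I will have to try something different.

Thanks in advance.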

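4 Answers:

Answer 0: (score: 1)

Using a regex, where string holds the entire text. The regex should be tightened before production use, for example so that it looks for started / not started only in the status column rather than anywhere in the text (including server names and so on).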

status = {}
string.scan(/^(HEAD-\d+)(.*?)(?:\n\n|\Z)/m).each do |match|
  name, text = match
  started = text.scan(/(?<!not )started/).size
  not_started = text.scan(/not started/).size
  status[name] = {
    started: started,
    not_started: not_started
  }
end

status
# => {"HEAD-1"=>{:started=>0, :not_started=>4}, "HEAD-2"=>{:started=>2, :not_started=>0}, "HEAD-3"=>{:started=>1, :not_started=>2}}
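One way to tighten the search, as suggested above, is to count a status only when it ends a line. This is a minimal sketch, not the answerer's code: the method name count_statuses is made up for illustration, and it assumes the status is always the last column of a server line and that every heading sits alone on its own line.

```ruby
# Sketch: count statuses only where they end a data line.
# Assumption: the status is always the last column of a server line.
def count_statuses(string)
  status = {}
  # Each match is a heading plus everything up to the next heading (or end of text).
  string.scan(/^(HEAD-\d+)$(.*?)(?=^HEAD-\d+$|\z)/m).each do |name, text|
    # [ \t]*$ anchors the status to the end of a line, so a server whose
    # name happens to contain "started" is not counted.
    not_started = text.scan(/\bnot started[ \t]*$/).size
    started = text.scan(/(?<!not )\bstarted[ \t]*$/).size
    status[name] = { started: started, not_started: not_started }
  end
  status
end
```

Calling count_statuses(string) on the sample text should give the same hash as the version above; the difference only shows up when a server name contains the word "started".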

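Answer 1: (score: 0)

If you can assume the input is always formatted like your example (one server per line, each "HEAD" heading on its own line, and so on), you can read the input one line at a time with gets and match each line against a regex such as ^(\w+)\s+(\d+\.\d+\.\d+\.\d+)\s+(.+) (with \s+ between the groups, because the sample separates columns with runs of spaces). With this regex you only need to check whether the last group is "not started". If it is, add one to the not-started count; if not, add one to the started count. If the regex does not match at all, check whether the line matches ^HEAD-(\d+) or something similar, which marks the start of a new section.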
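The line-by-line approach described here could look roughly like the following. It is a sketch under the same formatting assumptions; tally_by_head is a hypothetical name, and it uses each_line rather than gets so that it accepts either an open File or a plain String.

```ruby
# Hypothetical helper: tallies started / not started per heading, one line at a time.
# `source` may be anything that responds to each_line (a File, an IO, or a String).
def tally_by_head(source)
  counts = Hash.new { |hash, key| hash[key] = { started: 0, not_started: 0 } }
  current = nil
  source.each_line do |line|
    line = line.chomp
    if (m = line.match(/^HEAD-(\d+)$/))
      current = "HEAD-#{m[1]}"
      counts[current] # register the heading even if no servers follow it
    elsif current && (m = line.match(/^(\w+)\s+(\d+\.\d+\.\d+\.\d+)\s+(.+)$/))
      # Anything other than "not started" in the last group counts as started.
      key = m[3].strip == "not started" ? :not_started : :started
      counts[current][key] += 1
    end
    # Separator lines, the "NE Server" header and blank lines match neither
    # pattern and are skipped.
  end
  counts
end
```

For a file on disk, something like File.open("servers.txt") { |f| tally_by_head(f) } would work, where servers.txt is a placeholder file name.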

Answer 2: (score: 0)

Here is a different attempt, using CSV:

require 'csv' 

csv_string = <<_
HEAD-1
========
NE      Server
ASDF    192.168.1.1     not started
ASDF1   192.168.1.1     not started
ASDF2   192.168.1.1     not started
ASDF3   192.168.1.1     not started

HEAD-2
========
NE      Server
ASDF    192.168.1.1     started
ASDF1   192.168.1.1     started

HEAD-3
========
NE      Server
ASDF    192.168.1.1     not started
ASDF1   192.168.1.1     started
ASDF3   192.168.1.1     not started
_

options = {:col_sep => " " ,:skip_blanks => true ,:skip_lines => /[=]+/ }

csv_array = CSV.parse(csv_string, **options)  # splat the hash: CSV.parse takes keyword options on Ruby 3+

csv_array.slice_before { |a| a.first[/head-\d+/i] }.to_a
# => [[["HEAD-1"],
#      ["NE", "Server"],
#      ["ASDF", "192.168.1.1", "not", "started"],
#      ["ASDF1", "192.168.1.1", "not", "started"],
#      ["ASDF2", "192.168.1.1", "not", "started"],
#      ["ASDF3", "192.168.1.1", "not", "started"]],
#     [["HEAD-2"],
#      ["NE", "Server"],
#      ["ASDF", "192.168.1.1", "started"],
#      ["ASDF1", "192.168.1.1", "started"]],
#     [["HEAD-3"],
#      ["NE", "Server"],
#      ["ASDF", "192.168.1.1", "not", "started"],
#      ["ASDF1", "192.168.1.1", "started"],
#      ["ASDF3", "192.168.1.1", "not", "started"]]]
report = csv_array.slice_before { |a| a.first[/head-\d+/i] }.map do|inner_ary|
  key,_ = inner_ary.shift(2)
  not_started,started = inner_ary.partition { |a| a.join(" ")[/\s+not\s+started$/] }
  key.push(["started #{started.size}","not started #{not_started.size}"])
end
Hash[report]
# => {"HEAD-1"=>["started 0", "not started 4"],
#     "HEAD-2"=>["started 2", "not started 0"],
#     "HEAD-3"=>["started 1", "not started 2"]}
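The same slice_before grouping can also be done without CSV by splitting each line on runs of whitespace. This is only a sketch: group_by_head is a made-up name, and it assumes columns are separated by at least two spaces (so that "not started" stays in one field) and that the text begins with a HEAD line.

```ruby
# Sketch: group rows under their heading using String#split instead of CSV.
# Assumption: columns are separated by two or more spaces.
def group_by_head(text)
  rows = text.each_line
             .map(&:strip)
             .reject { |line| line.empty? || line.match?(/\A=+\z/) } # drop blanks and "====" rules
             .map { |line| line.split(/\s{2,}/) }

  # Each slice is [heading row, column-header row, server rows...].
  rows.slice_before { |row| row.first.match?(/\AHEAD-\d+\z/i) }.map do |head, _columns, *servers|
    not_started, started = servers.partition { |row| row.last == "not started" }
    [head.first, { started: started.size, not_started: not_started.size }]
  end.to_h
end
```

Because the status stays in a single field, the partition can compare row.last directly instead of re-joining the row and matching it with a regex.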

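Answer 3: (score: 0)

You could try breaking the problem into smaller pieces. For example, instead of using one complex regex to match the entire text, split the string into the individual "HEAD" sections, then loop over each section and count how many times the substrings "started" and "not started" occur. Here is a rough, untested example: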

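str = "<your large string here>"
# Split on the heading lines. \d+ also handles HEAD-10 and above, and reject
# drops the empty string that split returns before the first heading.
# Assumes the headings are numbered consecutively starting at 1.
heads = str.split(/^HEAD-\d+$/).reject(&:empty?)
heads.each_with_index do |current_head, i|
  # The lookbehind keeps "not started" out of the started count.
  started_count = current_head.scan(/(?<!not )started/).length
  not_started_count = current_head.scan(/not started/).length
  puts "For HEAD #{i + 1}: #{started_count} started, #{not_started_count} not started"
end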
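The report asked for in the question leaves out counts that are zero, whereas the loop above always prints both. A small formatting helper could handle that. This is a sketch; report_line and the "no servers" fallback text are assumptions made for illustration, not part of the original question or answers.

```ruby
# Hypothetical helper: builds one report line, omitting counts that are zero.
def report_line(name, started, not_started)
  parts = []
  parts << "#{started} started" if started > 0
  parts << "#{not_started} not started" if not_started > 0
  parts << "no servers" if parts.empty? # assumed wording for an empty section
  "For #{name}: #{parts.join(', ')}"
end

puts report_line("HEAD-1", 0, 4) # For HEAD-1: 4 not started
puts report_line("HEAD-3", 1, 2) # For HEAD-3: 1 started, 2 not started
```

Inside the loop, puts report_line("HEAD-#{i + 1}", started_count, not_started_count) would replace the existing puts.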