Print only specific matching lines from a file

Time: 2016-07-18 11:00:40

Tags: ruby regex

I have the following input in a file:

Index
chapter 1
    Introduction to ruby
    ruby basics
        Installing ruby
        executing ruby
chapter 2
    Ruby class
    Ruby object
    Ruby method
        Defining method
        Calling method
chapter 3
    Ruby variable
       Local variable
       Class variable
       Global variable
       Instance variable
chapter 4
    .
    .
    .

chapter 1, chapter 2, 3, 4 and so on are the headings. Each chapter can contain any number (n) of lines as its sections.

I need to read the details of a particular chapter, i.e. all of its sections. For example, if I grep for chapter 1, the output should be:

chapter 1
    Introduction to ruby
    ruby basics
        Installing ruby
        executing ruby

How can I iterate over the following lines and check them? Please help.

File.open 'test.txt' do |file|
    chap_det=file.find { |line| line =~ /chapter 1:/ }
    puts chap_det
end
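Enumerable#find stops at the first line the block accepts, so the attempt above can return at most one line (and nil here, since no line contains "chapter 1:"). A minimal sketch of the "keep reading the following lines" idea: drop_while skips to the wanted heading and take_while collects the indented lines after it. It assumes headings start at column 0 and section lines are indented; chapter_lines and sample are hypothetical names.

```ruby
# Sketch: return the heading for one chapter plus the indented lines after it.
# Assumes headings start at column 0 and section lines start with whitespace.
def chapter_lines(lines, number)
  # Skip lines until the wanted heading; \b keeps "chapter 1" from matching "chapter 10"
  rest = lines.drop_while { |line| line !~ /\Achapter\s+#{number}\b/i }
  return [] if rest.empty?

  heading, *body = rest
  # Keep following lines only while they are indented (i.e. not the next heading)
  [heading] + body.take_while { |line| line.match?(/\A\s/) }
end

sample = <<EOT
Index
chapter 1
    Introduction to ruby
    ruby basics
        Installing ruby
        executing ruby
chapter 2
    Ruby class
EOT

# For a real file: chapter_lines(File.readlines('test.txt'), 1)
puts chapter_lines(sample.lines, 1).join
```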

4 answers:

Answer 0 (score: 5)

Assuming you have successfully read the content into an input string:

input = File.read('test.txt')

chapter = ->(n) { /chapter\s+#{n}.*?(?=\R\w)/im }
#⇒ #<Proc:0x00000002b2d7f0@(pry):59 (lambda)>
input[chapter.(2)]
#⇒ "chapter 2\n    Ruby class\n (...skipped...)  Calling method"

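The regular expression here matches everything from chapter N up to, but not including, a carriage return/line feed (any "line break") that is followed by a "word character".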
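A quick check of that description on a tiny, hypothetical two-chapter string: the lazy match stops right before the line break that precedes the next unindented line. One consequence worth knowing is that the last chapter has no such line after it, so this pattern returns nil for it.

```ruby
chapter = ->(n) { /chapter\s+#{n}.*?(?=\R\w)/im }

# Hypothetical sample: headings at column 0, sections indented
sample = "Index\nchapter 1\n    Intro\nchapter 2\n    Class\n"

p sample[chapter.(1)]  # => "chapter 1\n    Intro"
# No "line break + word character" follows chapter 2, so nothing matches
p sample[chapter.(2)]  # => nil
```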

puts input[chapter.(1)]
# chapter 1
#     Introduction to ruby
#     ruby basics
#         Installing ruby
#         executing ruby

NB! The regex proposed by Wiktor Stribiżew in the comments below is a bit faster, because it involves no lazy dot matching:

chapter = ->(n) { /chapter\s+#{n}\b.*(?:\R\B.*)*/i }
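How this one works: without the m flag, .* stays on the heading line, and each \R\B.* step consumes one more line only if the character right after the line break is not a word character. Indented lines keep the match going, and the next unindented heading ends it. A check on the same kind of hypothetical sample; unlike the lookahead version, it also matches the last chapter (including its trailing newline).

```ruby
chapter = ->(n) { /chapter\s+#{n}\b.*(?:\R\B.*)*/i }

# Hypothetical sample: headings at column 0, sections indented
sample = "Index\nchapter 1\n    Intro\nchapter 2\n    Class\n"

p sample[chapter.(1)]  # => "chapter 1\n    Intro"
# \B also holds at the very end (after "\n"), so the last chapter matches too
p sample[chapter.(2)]  # => "chapter 2\n    Class\n"
```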

Proof:

input = %|Index
Chapter 1
    Introduction to ruby
    ruby basics
        Installing ruby
        executing ruby
chapter 2
    Ruby class
    Ruby object
    Ruby method
        Defining method
        Calling method
chapter 3
    Ruby variable
       Local variable
       Class variable
       Global variable
       Instance variable
Chapter 4
    Introduction to ruby
    ruby basics
        Installing ruby
        executing ruby
chapter 5
    Ruby class
    Ruby object
    Ruby method
        Defining method
        Calling method
chapter 6
    Ruby variable
       Local variable
       Class variable
       Global variable
       Instance variable
|

ch1 = ->(n) { /chapter\s+#{n}.*?(?=\R\w)/im }
ch2 = ->(n) { /chapter\s+#{n}\b.*(?:\R\B.*)*/i }

require 'benchmark'

n = 500000
Benchmark.bm(7) do |x|
  x.report("1:") { n.times do input[ch1.(4)] end }
  x.report("2:") { n.times do input[ch2.(4)] end }
end

#⇒               user     system      total        real
#  1:        6.460000   0.000000   6.460000 (  6.460074)
#  2:        6.010000   0.000000   6.010000 (  6.010000)

Answer 1 (score: 1)

Just out of curiosity, a solution using the flip-flop operator:

N = 2
# readlines keeps the "\n" on each line; chomp so that join $/ does not double them
File.readlines('test.txt').map(&:chomp).select do |line|
  true if line[/chapter #{N}/i]..line[/chapter #{N+1}/i]
end[0...-1].join $/
#⇒ "chapter 2\n  (... skipped out ...)  Calling method"

It is about 3 times slower than the regex solution.
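To see why the result is sliced with [0...-1]: with two dots, the flip-flop turns on at the line matching the start pattern and stays on through the line matching the end pattern, so the next chapter's heading is included. A small illustration on a hypothetical array of already-chomped lines:

```ruby
lines = ["Index", "chapter 1", "  a", "chapter 2", "  b", "chapter 3", "  c"]
n = 2

# `true if start..stop` is a flip-flop: on from the "chapter 2" line
# through (and including) the "chapter 3" line
picked = lines.select { |line| true if line[/chapter #{n}/i]..line[/chapter #{n + 1}/i] }
p picked          # => ["chapter 2", "  b", "chapter 3"]
# Dropping the last element removes the next chapter's heading
p picked[0...-1]  # => ["chapter 2", "  b"]
```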

Answer 2 (score: 0)

You can also use the following code:

chapter_lines = []
start = false
chapter_number = 1
# File.foreach reads the file line by line and closes it afterwards
File.foreach("test.txt") do |line|
  start = true if line["chapter #{chapter_number}"]
  start = false if line["chapter #{chapter_number + 1}"]
  # chomp (rather than strip) keeps the section indentation
  chapter_lines << line.chomp if start
end

puts chapter_lines.join("\n")

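Edit: note that this assumes every reference to a chapter is written as "chapter", not "Chapter". The question has "Chapter" in one place and "chapter" elsewhere, i.e. a lowercase versus capital c.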

Hope that helps :)
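One way to drop that lowercase-only assumption is to match headings with a case-insensitive regex anchored at the start of the line. This sketch also switches the flag off at any chapter heading rather than only at chapter_number + 1, which is a small design change from the code above; extract_chapter and sample are hypothetical names.

```ruby
# Sketch: collect one chapter's lines, matching headings case-insensitively.
def extract_chapter(lines, number)
  inside = false
  lines.each_with_object([]) do |line, result|
    # Any heading line decides whether we are inside the wanted chapter
    if line.match?(/\Achapter\s+\d+\b/i)
      inside = line.match?(/\Achapter\s+#{number}\b/i)
    end
    # chomp (rather than strip) keeps the section indentation
    result << line.chomp if inside
  end
end

sample = "Index\nChapter 1\n    Intro\nchapter 2\n    Class\n"
# For a real file: extract_chapter(File.readlines('test.txt'), 1)
puts extract_chapter(sample.lines, 1)
```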

Answer 3 (score: 0)

This is a common problem, and Ruby's slice_before and slice_after methods are very useful for it. Using slice_before:

doc = <<EOT
Index
chapter 1
    Introduction to ruby
    ruby basics
        Installing ruby
        executing ruby
chapter 2
    Ruby class
    Ruby object
    Ruby method
        Defining method
        Calling method
chapter 3
    Ruby variable
       Local variable
       Class variable
       Global variable
       Instance variable
EOT

chapters = doc.lines.slice_before(/^chapter/).to_a
# => [["Index\n"], ["chapter 1\n", "    Introduction to ruby\n", "    ruby basics\n", "        Installing ruby\n", "        executing ruby\n"], ["chapter 2\n", "    Ruby class\n", "    Ruby object\n", "    Ruby method\n", "        Defining method\n", "        Calling method\n"], ["chapter 3\n", "    Ruby variable\n", "       Local variable\n", "       Class variable\n", "       Global variable\n", "       Instance variable\n"]]

chapters.shift

chapters[0] # => ["chapter 1\n", "    Introduction to ruby\n", "    ruby basics\n", "        Installing ruby\n", "        executing ruby\n"]

chapters.shift removes the first element (the "Index" chunk), leaving an array with one element per chapter, indexed in order.

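From there it's easy to reassemble a whole "chapter" by joining its contents if needed, but since the lines are already array elements you may want to keep them as they are: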

chapters[0].join # => "chapter 1\n    Introduction to ruby\n    ruby basics\n        Installing ruby\n        executing ruby\n"

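Since you are reading from a file, as long as the file fits safely in memory you can use File.readlines('file_to_read') to read it into an array of lines, which you can then pass to slice_before.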
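Putting those pieces together, a sketch that slices the lines at each heading and builds a lookup by chapter number. The inline text stands in for File.readlines('test.txt'), and the hash keyed by number is just one illustrative choice, not part of the answer above.

```ruby
text = <<EOT
Index
chapter 1
    Introduction to ruby
    ruby basics
chapter 2
    Ruby class
EOT

heading = /\Achapter\s+\d+/i

# With a real file, start from File.readlines('test.txt') instead of text.lines.
# The select step drops the leading "Index" chunk, which is not a chapter.
chapters = text.lines
               .slice_before(heading)
               .select { |chunk| chunk.first.match?(heading) }
               .map { |chunk| [chunk.first[/\d+/].to_i, chunk] }
               .to_h

print chapters[2].join
```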