Replace every 'line 2' with 'line_2' using a regex

Date: 2010-08-24 20:51:00

Tags: ruby regex

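I'm parsing some text from XML files that contains sentences such as "Subtract line 4 from line 1" and "Enter the amount from line 5". I want to replace every occurrence of "line N" with "line_N", for example: Subtract line 4 from line 1 -> Subtract line_4 from line_1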

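There are also sentences such as "Are the amounts on lines 4 and 8 the same?" and "Skip lines 9 through 12; go to line 13." I want these to become "Are the amounts on line_4 and line_8 the same?" and "Skip line_9 through line_12; go to line_13."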

3 Answers:

Answer 0 (score: 2)

Here is a working implementation with RSpec tests. Call it like this: output = LineIdentifier[input]. To run the tests, install the rspec gem and run: rspec file.rb

require 'rspec'

class LineIdentifier
  def self.[](input)
    # Handle pairs first ("lines 4 and 8", "lines 9 through 12", "lines 4 from line 1"),
    # while the optional "line " before the second number is still there to match
    output = input.gsub(/lines (\d+) (and|from|through) (line )?(\d+)/, 'line_\1 \2 line_\4')
    # Then any remaining single mention: "line 5" -> "line_5"
    output.gsub(/line (\d+)/, 'line_\1')
  end
end

RSpec.describe "LineIdentifier" do
  it "should identify line mentions" do
    examples = {
      # Input                                       Output
      'Subtract line 4 from line 1.'               => 'Subtract line_4 from line_1.',
      'Enter the amount from line 5'               => 'Enter the amount from line_5',
      'Subtract line 4 from line 1'                => 'Subtract line_4 from line_1',
    }
    examples.each do |input, output|
      expect(LineIdentifier[input]).to eq(output)
    end
  end
  it "should identify line ranges" do
    examples = {
      # Input                                       Output
      'Are the amounts on lines 4 and 8 the same?' => 'Are the amounts on line_4 and line_8 the same?',
      'Skip lines 9 through 12; go to line 13.'    => 'Skip line_9 through line_12; go to line_13.',
    }
    examples.each do |input, output|
      expect(LineIdentifier[input]).to eq(output)
    end
  end
end
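LineIdentifier's pair pattern only covers two numbers joined by "and", "from" or "through", so a comma-separated list such as "Add lines 2, 3, and 4" (from the sample data in the other answers) passes through unchanged. A minimal sketch of a different technique, assuming list items are separated only by commas, "and" or "through": match the whole "line(s) N, N, and N" phrase once with gsub, then prefix every number inside that phrase with a nested gsub in the block. LINE_LIST and tag_lines are hypothetical names, not part of the answer.

```ruby
# Hypothetical pattern: "line N" or "lines N" followed by any number of
# ", N", " and N", ", and N" or " through N" items (N may carry one lowercase letter).
LINE_LIST = /\blines?\s+\d+[a-z]?(?:(?:,\s*|,?\s+(?:and|through)\s+)\d+[a-z]?)*/

# Hypothetical helper: rewrite each matched phrase as a whole.
def tag_lines(text)
  text.gsub(LINE_LIST) do |phrase|
    # Drop the leading "line "/"lines ", then prefix every number in the phrase.
    phrase.sub(/\Alines?\s+/, '').gsub(/\d+[a-z]?/) { |n| "line_#{n}" }
  end
end

puts tag_lines("Add lines 2, 3, and 4")
```

Because the numbers are only rewritten inside a phrase that starts with "line" or "lines", text such as "Form 1040A" elsewhere in the sentence is left alone.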

Answer 1 (score: 0)

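This works for the specific examples, including the ones from the OP's comment. As is often the case when parsing with regular expressions, it turns into a patchwork of extra cases and tests for an ever-growing set of known inputs. It handles lists of line numbers with a while loop that reapplies the substitution until nothing more changes. As written, it processes the input one line at a time; to catch a list of line numbers that continues onto the next line, you would have to process the input as a single block and match across line breaks.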

File.open(ARGV[0], "r") do |file|
  file.each_line do |line|
    # Replace both "line ddd" and "lines ddd" with line_ddd
    line.gsub!(/(lines?\s)(\d+)/, 'line_\2')
    # Then prefix the numbers that follow in a known list (" and ", " through ", ", "),
    # repeating until gsub! returns nil because nothing is left to change
    while line.gsub!(/(line_\d+[a-z]?,?)(\sand\s|\sthrough\s|,\s)(\d+)/, '\1\2line_\3')
    end
    puts line
  end
end

Sample data. For this input:

Subtract line 4 from line 1.
Enter the amount from line 5
on lines 4 and 8 the same?
Skip lines 9 through 12; go to line 13.
... on line 10 Form 1040A, lines 7, 8a, 9a, 10, 11b, 12b, and 13
Add lines 2, 3, and 4

it produces this output:

Subtract line_4 from line_1.
Enter the amount from line_5
on line_4 and line_8 the same?
Skip line_9 through line_12; go to line_13.
... on line_10 Form 1040A, line_7, line_8a, line_9a, line_10, line_11b, line_12b, and line_13
Add line_2, line_3, and line_4
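The last caveat above (lists that continue onto the next line) can be sketched by reading the whole file into one string before running the same two substitutions; Ruby's \s also matches a newline, so a list that wraps is still caught. This assumes the file fits comfortably in memory, and tag_lines_in_text is a hypothetical name, not part of the answer.

```ruby
# Hypothetical helper: the answer's two substitutions, applied to the whole
# text instead of one line at a time.
def tag_lines_in_text(text)
  # "line ddd" / "lines ddd" -> line_ddd; \s also matches "\n" here.
  out = text.gsub(/(lines?\s)(\d+)/, 'line_\2')
  # Repeat the list substitution until gsub! finds nothing more (it then returns nil).
  loop do
    break unless out.gsub!(/(line_\d+[a-z]?,?)(\sand\s|\sthrough\s|,\s)(\d+)/, '\1\2line_\3')
  end
  out
end

# Command-line use, e.g.: ruby script.rb input.txt
if __FILE__ == $PROGRAM_NAME && ARGV[0]
  puts tag_lines_in_text(File.read(ARGV[0]))
end
```

With this version, an input such as "Add lines 2,\n3, and 4" becomes "Add line_2,\nline_3, and line_4", which the line-by-line loop cannot produce.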

Answer 2 (score: 0)

sed is your friend:

lines.sed

#!/bin/sed -rf
s/lines? ([0-9]+)/line_\1/g
s/\b([0-9]+[a-z]?)\b/line_\1/g

lines.txt

Subtract line 4 from line 1.
Enter the amount from line 5
Are the amounts on lines 4 and 8 the same?
Skip lines 9 through 12; go to line 13.
Enter the total of the amounts from Form 1040A, lines 7, 8a, 9a, 10, 11b, 12b, and 13
Add lines 2, 3, and 4

Demo:

$ cat lines.txt | ./lines.sed
Subtract line_4 from line_1.
Enter the amount from line_5
Are the amounts on line_4 and line_8 the same?
Skip line_9 through line_12; go to line_13.
Enter the total of the amounts from Form 1040A, line_7, line_8a, line_9a, line_10, line_11b, line_12b, and line_13
Add line_2, line_3, and line_4

You could also turn this into a sed one-liner if you prefer, though a script file is easier to maintain.
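Since the question is tagged ruby, here is the same two-rule approach as a direct Ruby port, a sketch that assumes Ruby's regex engine treats these simple patterns the way GNU sed -r does (sed_style is a hypothetical name). It also makes one consequence of the second rule easy to see: it prefixes every standalone number in the text, not only the numbers in a list after "line(s)".

```ruby
# Hypothetical Ruby port of the two sed substitutions above.
def sed_style(text)
  # Rule 1: "line N" / "lines N" -> "line_N"
  step1 = text.gsub(/lines? ([0-9]+)/, 'line_\1')
  # Rule 2: every remaining standalone number (optionally with one lowercase
  # letter) gets the prefix, wherever it appears in the sentence.
  step1.gsub(/\b([0-9]+[a-z]?)\b/, 'line_\1')
end

puts sed_style("Add lines 2, 3, and 4")
puts sed_style("Form 1040")
```

So a sentence that mentions, say, "Form 1040" without a letter suffix becomes "Form line_1040". "Form 1040A" in the sample data is only spared because its suffix is uppercase: [a-z]? cannot absorb it, and \b does not match between "0" and "A".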