Processing the lines of a file in Ruby

Time: 2017-10-22 18:28:04

Tags: ruby regex

I have some files that look like this:

 file alldataset; append next;
 if file.first? do line + "\n";
 if !file.last? do line.indent(2);
 end;
 end;

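I am trying to write a Ruby program that pushes whatever follows a semicolon onto a new line. In addition, if a line contains 'do', the text after the 'do' should start an indented block, so that the following lines are indented by two spaces, the contents of any inner 'do' by four spaces, and so on.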

I am new to Ruby, and my code so far is a long way from what I want. This is what I have:

 def indent(text, num)
   " "*num+" " + text
 end

 doc = File.open('newtext.txt')
 doc.to_a.each do |line|
 if line.downcase =~ /^(file).+(;)/i
   puts line+"\n"
 end
 if line.downcase.include?('do')
  puts indent(line, 2)
 end
end 

This is the desired output:

file alldataset;
  append next;
  if file.first? do 
    line + "\n";
    if !file.last? do
      line.indent(2);
    end;
  end;

Any help would be appreciated.

2 answers:

Answer 0 (score: 1)

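Since you are interested in parsing, here is a quickly made example, just to give you a taste. I have studied Lex/Yacc, Flex/Bison, ANTLR v3 and ANTLR v4, and I strongly recommend ANTLR4, which is very powerful.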

The following grammar can parse only the sample input you provided.

File Question.g4

grammar Question;

/* Simple grammar example to parse the following code :

    file alldataset; append next; xyz;
    if file.first? do line + "\n";
    if !file.last? do line.indent(2);
    end;
    end;
    file file2; xyz;
*/

start
@init {System.out.println("Question last update 1048");}
    :   file* EOF
    ;

file
    :   FILE ID ';' statement_p*
    ;

statement_p
    :   statement
        {System.out.println("Statement found : " + $statement.text);}
    ;

statement
    :   'append' ID ';'
    |   if_statement
    |   other_statement
    |   'end' ';'
    ;

if_statement
    :   'if' expression 'do' expression ';'
    ;

other_statement
    :   ID ';'
    ;

expression
    :   receiver=( ID | FILE ) '.' method_call # Send
    |   expression '+' expression   # Addition
    |   '!' expression              # Negation
    |   atom                        # An_atom
    ;

method_call
    :   method_name=ID arguments?
    ;

arguments
    :   '(' ( argument ( ',' argument )* )? ')'
    ;

argument
    :   ID | NUMBER
    ;

atom
    :   ID
    |   FILE
    |   STRING
    ;

FILE   : 'file' ;
ID     : LETTER ( LETTER | DIGIT | '_' )* ( '?' | '!' )? ;
NUMBER : DIGIT+ ( ',' DIGIT+ )? ( '.' DIGIT+ )? ;
STRING : '"' .*? '"' ;

NL  : ( [\r\n] | '\r\n' ) -> skip ;

WS  : [ \t]+ -> channel(HIDDEN) ;

fragment DIGIT  : [0-9] ;
fragment LETTER : [a-zA-Z] ;

File input.txt

 file alldataset; append next; xyz;
 if file.first? do line + "\n";
 if !file.last? do line.indent(2);
 end;
 end;
 file file2; xyz;

Execution:

$ export CLASSPATH=".:/usr/local/lib/antlr-4.6-complete.jar"
$ alias
alias a4='java -jar /usr/local/lib/antlr-4.6-complete.jar'
alias grun='java org.antlr.v4.gui.TestRig'
$ a4 Question.g4
$ javac Q*.java
$ grun Question start -tokens -diagnostics input.txt 
[@0,0:0=' ',<WS>,channel=1,1:0]
[@1,1:4='file',<'file'>,1:1]
[@2,5:5=' ',<WS>,channel=1,1:5]
[@3,6:15='alldataset',<ID>,1:6]
[@4,16:16=';',<';'>,1:16]
[@5,17:17=' ',<WS>,channel=1,1:17]
[@6,18:23='append',<'append'>,1:18]
[@7,24:24=' ',<WS>,channel=1,1:24]
[@8,25:28='next',<ID>,1:25]
[@9,29:29=';',<';'>,1:29]
[@10,30:30=' ',<WS>,channel=1,1:30]
[@11,31:33='xyz',<ID>,1:31]
[@12,34:34=';',<';'>,1:34]
[@13,36:36=' ',<WS>,channel=1,2:0]
[@14,37:38='if',<'if'>,2:1]
[@15,39:39=' ',<WS>,channel=1,2:3]
[@16,40:43='file',<'file'>,2:4]
[@17,44:44='.',<'.'>,2:8]
[@18,45:50='first?',<ID>,2:9]
[@19,51:51=' ',<WS>,channel=1,2:15]
[@20,52:53='do',<'do'>,2:16]
[@21,54:54=' ',<WS>,channel=1,2:18]
[@22,55:58='line',<ID>,2:19]
[@23,59:59=' ',<WS>,channel=1,2:23]
[@24,60:60='+',<'+'>,2:24]
[@25,61:61=' ',<WS>,channel=1,2:25]
[@26,62:65='"\n"',<STRING>,2:26]
[@27,66:66=';',<';'>,2:30]
...
[@59,133:132='<EOF>',<EOF>,7:0]
Question last update 1048
Statement found : append next;
Statement found : xyz;
Statement found : if file.first? do line + "\n";
Statement found : if !file.last? do line.indent(2);
Statement found : end;
Statement found : end;
Statement found : xyz;

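One advantage of ANTLR4 over previous versions and other parser generators is that the code is no longer scattered among the parser rules but gathered in a separate listener. That is where you do the actual processing, such as producing the new reformatted file. Showing a complete example would be too long. Today you can write the listener in C++, C#, Python and other languages. Since I don't know Java, I have a setup that uses JRuby; see my forum answer.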
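
To give an idea of what "processing gathered in a listener" means, here is a minimal sketch in plain Ruby. It does not use ANTLR and is not ANTLR's API: StatementWalker, StatementLogger and the callback names enter_file and exit_statement are made up for illustration, and the walker simply assumes that every ';' ends a statement. The listener does the same job as the "Statement found" action embedded in the statement_p rule of the grammar.

```ruby
# Hypothetical stand-in for a generated tree walker; not ANTLR API.
class StatementWalker
  def initialize(*listeners)
    @listeners = listeners
  end

  # Naive split: every ';' ends a statement (assumes no ';' inside strings).
  def walk(text)
    text.scan(/[^;]+;/).each do |raw|
      stmt  = raw.strip
      event = stmt.start_with?('file ') ? :enter_file : :exit_statement
      notify(event, stmt)
    end
  end

  private

  # Call the callback only on listeners that implement it.
  def notify(event, stmt)
    @listeners.each do |listener|
      listener.public_send(event, stmt) if listener.respond_to?(event)
    end
  end
end

# All processing lives here, outside the parsing code.
class StatementLogger
  attr_reader :log

  def initialize
    @log = []
  end

  def exit_statement(stmt)
    @log << "Statement found : #{stmt}"
  end
end

logger = StatementLogger.new
StatementWalker.new(logger).walk('file alldataset; append next; xyz;')
puts logger.log
```

A second listener, for example one that writes the reformatted file, could be passed to the same walker without touching the splitting code.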

Answer 1 (score: 0)

In Ruby there are many ways to do things, so my solution is just one among others.

File t.rb

def print_indented(p_file, p_indent, p_text)
    p_file.print p_indent
    p_file.puts  p_text
end

    # recursively split the line at semicolon, as long as the rest is not empty
def partition_on_semicolon(p_line, p_answer, p_level)
    puts "in partition_on_semicolon for level #{p_level} p_line=#{p_line} / p_answer=#{p_answer}"
    first_segment, semi, rest = p_line.partition(';')
    p_answer << first_segment + semi
    partition_on_semicolon(rest.lstrip, p_answer, p_level + 1) unless rest.empty?
end

lines = IO.readlines('input.txt')

# Compute initial indentation, the indentation of the first line.
# This is to preserve the spaces which are in the input.
m = lines.first.match(/^( *)(.*)/)
initial_indent = ' ' * m[1].length
# initial_indent = '' # uncomment if the initial indentation needs not to be preserved
puts "initial_indent=<#{initial_indent}> length=#{initial_indent.length}"
level       = 1
indentation = '  '

File.open('newtext.txt', 'w') do | output_file |
    lines.each do | line |
        line        = line.chomp
        line        = line.lstrip # remove leading spaces
        puts "---<#{line}>"
        next_indent = initial_indent + indentation * (level - 1)

        case
        when line =~ /^file/ && line.count(';') > 1
            level = 1 # restore, remove this if files can be indented
            next_indent = initial_indent + indentation * (level - 1)
            # split in count fragments
            fragments = []
            partition_on_semicolon(line, fragments, 1)
            puts '---fragments :'
            puts fragments.join('/')
            print_indented(output_file, next_indent, fragments.first)

            fragments[1..-1].each do | fragment |
                print_indented(output_file, next_indent + indentation, fragment)
            end

            level += 1
        when line.include?(' do ')
            fragment1, _fdo, fragment2 = line.partition(' do ')
            print_indented(output_file, next_indent, "#{fragment1} do")
            print_indented(output_file, next_indent + indentation, fragment2)
            level += 1
        else
            level -= 1 if line =~ /end;/
            print_indented(output_file, next_indent, line)
        end
    end
end

File input.txt

 file alldataset; append next; xyz;
 if file.first? do line + "\n";
 if !file.last? do line.indent(2);
 end;
 end;
 file file2; xyz;

Execution:

$ ruby -w t.rb 
initial_indent=< > length=1
---<file alldataset; append next; xyz;>
in partition_on_semicolon for level 1 p_line=file alldataset; append next; xyz; / p_answer=[]
in partition_on_semicolon for level 2 p_line=append next; xyz; / p_answer=["file alldataset;"]
in partition_on_semicolon for level 3 p_line=xyz; / p_answer=["file alldataset;", "append next;"]
---fragments :
file alldataset;/append next;/xyz;
---<if file.first? do line + "\n";>
---<if !file.last? do line.indent(2);>
---<end;>
---<end;>
---<file file2; xyz;>
in partition_on_semicolon for level 1 p_line=file file2; xyz; / p_answer=[]
in partition_on_semicolon for level 2 p_line=xyz; / p_answer=["file file2;"]
---fragments :
file file2;/xyz;
---<>

Output file newtext.txt

 file alldataset;
   append next;
   xyz;
   if file.first? do
     line + "\n";
     if !file.last? do
       line.indent(2);
       end;
     end;
 file file2;
   xyz;
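
Note that in the output above each end; lands one level deeper than in the desired output from the question, because t.rb computes next_indent before it decrements level. A possible variant that decrements first is sketched below. It is only a sketch under assumptions: reformat is a hypothetical helper name, every ';' is taken to end a statement (so a ';' inside a string would break it), a statement starting with 'file ' resets the nesting, and the leading indentation of the input is not preserved.

```ruby
# Hypothetical helper, not part of t.rb: returns the reformatted text.
def reformat(text, indent: '  ')
  level = 0
  out   = []
  # Naive statement split: every ';' ends a statement.
  text.scan(/[^;]+;/).map(&:strip).each do |stmt|
    if stmt.start_with?('file ')
      out << stmt                      # a file statement always starts at column 0
      level = 1                        # what follows belongs to this file
    elsif stmt == 'end;'
      level -= 1 if level > 0          # dedent BEFORE printing the end;
      out << indent * level + stmt
    elsif stmt.include?(' do ')
      head, _do, body = stmt.partition(' do ')
      out << indent * level + "#{head} do"
      level += 1                       # the body of the do is one level deeper
      out << indent * level + body.strip
    else
      out << indent * level + stmt
    end
  end
  out.join("\n") + "\n"
end

puts reformat('file a; append b; if x do y; end;')
```

To process a whole file, something like File.write('newtext.txt', reformat(File.read('input.txt'))) should do.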