Parse file contents and display a tree view

Time: 2014-05-27 01:50:59

Tags: python perl bash awk nawk

Given a file with the following contents:

insert_job: J1
insert_job: J2
box_name: J1
insert_job: J3
box_name: J2
insert_job: J4
box_name: J1
insert_job: J5
box_name: J4
insert_job: J6
box_name: J4

I would like to display it as follows (using tabs to indicate the parent/child relationships):

J1
    J2
        J3
    J4
        J5
        J6

test_data2 for Borodin:
------------------------------
insert_job: JS11-LR_BaselIII
insert_job: JS11-Check_Batch_Run_Numbers
box_name: JS11-LR_BaselIII
insert_job: 11000000-start
box_name: JS11-Check_Batch_Run_Numbers
insert_job: 11000000-runbox
box_name: JS11-Check_Batch_Run_Numbers
insert_job: JS11-Load_Session_Date
box_name: JS11-LR_BaselIII
insert_job: JS110000-start
box_name: JS11-Load_Session_Date
insert_job: JS110000-runbox
box_name: JS11-Load_Session_Date
insert_job: JS11-Start_RiskWatch
box_name: JS11-LR_BaselIII
insert_job: JS110004-start
box_name: JS11-Start_RiskWatch
insert_job: JS110004-runbox
box_name: JS11-Start_RiskWatch
insert_job: JS11-Start_UDS
box_name: JS11-LR_BaselIII
insert_job: JS110001-start
box_name: JS11-Start_UDS
insert_job: JS110001-runbox
box_name: JS11-Start_UDS
insert_job: JS11-Pool_Processing
box_name: JS11-LR_BaselIII
insert_job: JS110002-start
box_name: JS11-Pool_Processing

Syntax error in Ed's solution:

sdpvvrsp810{alelai}: gawk -f tst.awk testjobs3
gawk: tst.awk:2: /^box_name/   { box = $2; jobs[box][job] }
gawk: tst.awk:2:                                    ^ syntax error
gawk: tst.awk:9:         for (job in jobs[box])
gawk: tst.awk:9:                         ^ syntax error

3 answers:

Answer 0 (score: 1)

Here is a shorter Perl version that handles your sample data.

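sub parse {
  local $/ = undef;   # slurp mode: read the whole input in one go
  my $text = <>;
  # The first insert_job in the input is taken to be the root.
  my ($root) = $text =~ /insert_job:\s*(\S+)/;
  # Each insert_job directly followed by a box_name gives a (child, parent) pair.
  my @links = $text =~ /insert_job:\s*(\S+)\s*box_name:\s*(\S+)/g;
  my $children = {};
  while (@links) {
    my $child = shift @links;
    my $parent = shift @links;
    push @{$children->{$parent}}, $child;
  }
  # Recursive anonymous sub: it receives itself as its first argument.
  my $print = sub {
    my ($print, $parent, $indent) = @_;
    print "\t" x $indent, $parent, "\n";
    $print->($print, $_, $indent + 1) foreach (@{$children->{$parent} || []});
  };
  $print->($print, $root, 0);
}

parse();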
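
For comparison, the same pairing idea can be sketched in Python, one of the languages the question is tagged with. This is only an illustration: the names `build_children` and `render_tree` are made up here, and like the Perl above it assumes the first `insert_job` is the single root and that every `box_name` line directly follows the `insert_job` it belongs to.

```python
import re
from collections import defaultdict

# Hypothetical names for illustration; not part of the original answer.
PAIR_RE = re.compile(r"insert_job:\s*(\S+)\s*box_name:\s*(\S+)")
ROOT_RE = re.compile(r"insert_job:\s*(\S+)")

SAMPLE = """\
insert_job: J1
insert_job: J2
box_name: J1
insert_job: J3
box_name: J2
insert_job: J4
box_name: J1
insert_job: J5
box_name: J4
insert_job: J6
box_name: J4
"""


def build_children(text):
    """Return (root, children), where children maps a box to its jobs."""
    root_match = ROOT_RE.search(text)
    root = root_match.group(1) if root_match else None
    children = defaultdict(list)
    # findall yields one (child, parent) tuple per insert_job/box_name pair.
    for child, parent in PAIR_RE.findall(text):
        children[parent].append(child)
    return root, children


def render_tree(root, children, indent=0):
    """Return the tree below root as a list of tab-indented lines."""
    lines = ["\t" * indent + root]
    for child in children.get(root, []):
        lines.extend(render_tree(child, children, indent + 1))
    return lines


if __name__ == "__main__":
    root, children = build_children(SAMPLE)
    print("\n".join(render_tree(root, children)))
```

Returning the lines instead of printing them directly makes the result easy to check or to re-indent with something other than tabs.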

Answer 1 (score: 1)

This program does what you ask. It expects the path to the input file as a parameter on the command line.

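It first builds a hash that maps each job name to the list of jobs contained in that box. Any job that is not followed by a box name on the next line is pushed onto a list of root jobs. Finally, the recursive subroutine print_tree is called to dump the dependency tree starting from each root.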

use strict;
use warnings;

my ($curr_job, %jobs, @roots);
while (<>) {
  next unless my ($op, $id) = /(\w+): ([\w-]+)/;
  if ($op eq 'insert_job') {
    push @roots, $curr_job if $curr_job;
    $curr_job = $id;
    $jobs{$id} = [] unless $jobs{$id};
  }
  elsif ($op eq 'box_name') {
    push @{ $jobs{$id} }, $curr_job;
    $curr_job = undef;
  }
}
push @roots, $curr_job if $curr_job;

print_tree($_) for @roots;

sub print_tree {
  my ($root, $indent) = (@_, 0);
  printf "%s%s\n", ' ' x 4 x $indent, $root;
  print_tree($_, $indent + 1) for @{ $jobs{$root} };
}

Output

J1
    J2
        J3
    J4
        J5
        J6

Output 2

JS11-LR_BaselIII
    JS11-Check_Batch_Run_Numbers
        11000000-runbox
        11000000-start
    JS11-Load_Session_Date
        JS110000-runbox
        JS110000-start
    JS11-Pool_Processing
        JS110002-start
    JS11-Start_RiskWatch
        JS110004-runbox
        JS110004-start
    JS11-Start_UDS
        JS110001-runbox
        JS110001-start
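
The approach described in this answer (a map from each box to its jobs, plus a list of roots for jobs that have no `box_name` line) could look roughly like this in Python. The names `parse_jobs` and `format_tree` are hypothetical, and the sketch assumes each `box_name` line immediately follows the `insert_job` line it refers to. Children are kept in input order.

```python
import re

# Same line shape as the Perl pattern: "<keyword>: <name>".
LINE_RE = re.compile(r"(\w+):\s*([\w-]+)")

SAMPLE = """\
insert_job: J1
insert_job: J2
box_name: J1
insert_job: J3
box_name: J2
insert_job: J4
box_name: J1
insert_job: J5
box_name: J4
insert_job: J6
box_name: J4
"""


def parse_jobs(lines):
    """Return (jobs, roots): jobs maps a name to its children, roots have no box."""
    jobs, roots, current = {}, [], None
    for line in lines:
        match = LINE_RE.search(line)
        if not match:
            continue
        op, name = match.groups()
        if op == "insert_job":
            if current is not None:
                roots.append(current)  # previous job had no box_name line
            current = name
            jobs.setdefault(name, [])
        elif op == "box_name" and current is not None:
            jobs.setdefault(name, []).append(current)
            current = None
    if current is not None:
        roots.append(current)  # last job in the file had no box_name line
    return jobs, roots


def format_tree(jobs, root, depth=0, width=4):
    """Return the tree below root as lines indented by width spaces per level."""
    lines = [" " * (width * depth) + root]
    for child in jobs.get(root, []):
        lines.extend(format_tree(jobs, child, depth + 1, width))
    return lines


if __name__ == "__main__":
    jobs, roots = parse_jobs(SAMPLE.splitlines())
    for root in roots:
        print("\n".join(format_tree(jobs, root)))
```

Because roots are collected in a list, an input with several top-level jobs produces one tree per root rather than only the first.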

Answer 2 (score: 0)

Using GNU awk (version 4.0 or later) for true multi-dimensional arrays:

$ cat tst.awk
/^insert_job/ { job = $2; if (root == "") root = job }
/^box_name/   { box = $2; jobs[box][job] }
END           { prtBox(root) }

function prtBox(box,    job) {
    printf "%*s%s\n", indent, "", box
    indent += 2
    if (box in jobs)
        for (job in jobs[box])
            prtBox(job)
    indent -= 2
}

$ awk -f tst.awk file
J1
  J2
    J3
  J4
    J5
    J6
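
As a rough Python counterpart to the `jobs[box][job]` structure, a dict of dicts can play the role of the nested array. This is a sketch only: `build_boxes` and `box_lines` are hypothetical names, and it keeps the script's assumption that the first `insert_job` is the root. One difference worth noting is that Python dicts preserve insertion order (Python 3.7+), whereas the traversal order of `for (job in jobs[box])` in awk is unspecified.

```python
SAMPLE = """\
insert_job: J1
insert_job: J2
box_name: J1
insert_job: J3
box_name: J2
insert_job: J4
box_name: J1
insert_job: J5
box_name: J4
insert_job: J6
box_name: J4
"""


def build_boxes(lines):
    """Return (root, boxes); boxes[box] is a dict whose keys are the box's jobs."""
    boxes = {}
    root = None
    job = None
    for line in lines:
        fields = line.split()
        if len(fields) < 2:
            continue
        if fields[0] == "insert_job:":
            job = fields[1]
            if root is None:
                root = job  # first job seen becomes the root
        elif fields[0] == "box_name:" and job is not None:
            # Equivalent of jobs[box][job] in gawk: the key's presence is the data.
            boxes.setdefault(fields[1], {})[job] = True
    return root, boxes


def box_lines(boxes, box, indent=0):
    """Return the tree below box as lines, indenting two spaces per level."""
    lines = [" " * indent + box]
    for job in boxes.get(box, {}):
        lines.extend(box_lines(boxes, job, indent + 2))
    return lines


if __name__ == "__main__":
    root, boxes = build_boxes(SAMPLE.splitlines())
    print("\n".join(box_lines(boxes, root)))
```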