Unix: concatenate values from a nested/indented list?

Time: 2019-03-18 17:00:15

Tags: bash unix awk sed

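I am trying to generate a hierarchical list from a list of key/value pairs that is indented with 2 spaces. The original content, revised, is below.

Edit: Apologies, I ended up pasting the wrong output. The original YAML file has the following format. Getting the "description" is my secondary goal: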

schemas:
- name: exports
  tables:
  - name: sugar
    description: makes stuff sweet
    active_date: 2019-01-07 00:00:00
    columns:
    - name: color
      type: abcd
    - name: taste
      type: abcd
      description: xyz
      example: 21352352
    - name: structure
      type: abcd
      description: xyzasaa
      example: 10001
  - name: salt
    description: not that sweet.
      makes it salty.
    active_date: 2018-12-18 00:00:00
    columns:
    - name: strength
      type: abcdef
      description: easy to find
      example: 2018-03-03 12:30:00
    - name: color
      type: abcdeffa
      description: not sweet
      example: 21352352
    - name: quality
      type: abcd
      description: how much is needed
      example: 10001

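The ideal output would be as below. I am trying to generate a CSV that flattens the YAML, where each line contains the innermost child element together with all of its parent values: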

sugar.color,abcd
sugar.taste,abcd,xyz
sugar.structure,abcd,xyzasaa
salt.strength,abcdef,"easy to find"
salt.color,abcdeffa,"not sweet"
salt.quality,abcd,"how much is needed"

But I don't know whether the above is feasible, so at a minimum I am looking for:

sugar.color
sugar.taste
sugar.structure
salt.strength
salt.color
salt.quality

2 answers:

Answer 0: (score: 2)

Using any awk in any shell on any UNIX box:

$ cat tst.awk
BEGIN { OFS = "," }

match($0,/^ +- /) { indent = RLENGTH }

$1 == "-" {
    prt()
    if (indent == 4) {
        key = $NF
        subKey = ""
    }
    else if (indent == 6) {
        subKey = $NF
    }
    next
}

subKey != "" {
    data = substr($0,indent+1)

    if ( data ~ /^[^[:space:]]/ ) {
        # new data
        tag = data
        sub(/:.*/,"",tag)
        sub(/^[^:]+: */,"",data)
        f[tag] = data
    }
    else {
        # continuation of previous data
        sub(/^[[:space:]]*/,"",data)
        f[tag] = f[tag] " " data
    }
}

END { prt() }

function prt() {
    if ( "type" in f ) {
        print key "." subKey, f["type"], "\"" f["description"] "\""
    }
    delete f
}

$ awk -f tst.awk file
sugar.color,abcd,""
sugar.taste,abcd,"xyz"
sugar.structure,abcd,"xyzasaa"
salt.strength,abcdef,"easy to find"
salt.color,abcdeffa,"not sweet"
salt.quality,abcd,"how much is needed"

If any description spans multiple lines, the above concatenates it into a single line.
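To make the continuation handling easier to follow, here is a rough Python rendering of what the script does with `f[tag]` for a single column entry. It is only a sketch: `collect_fields` is a hypothetical name, and the multi-line description in the usage example is made up for illustration (the sample file has no multi-line description at column level).

```python
def collect_fields(lines, indent):
    """Mimic the awk script's f[tag] handling for one column entry.

    `lines` are the raw lines that follow a "- name: ..." line, and
    `indent` is the width of the "    - " prefix (6 in the sample file).
    """
    fields = {}
    tag = None
    for line in lines:
        data = line[indent:]
        if data and not data[0].isspace():
            # New "tag: value" pair: split at the first colon only.
            tag, _, value = data.partition(":")
            fields[tag] = value.lstrip(" ")
        elif tag is not None:
            # Continuation of the previous value: join with one space.
            fields[tag] = fields[tag] + " " + data.strip()
    return fields


# Hypothetical column entry whose description wraps onto a second line.
entry = [
    "      type: abcd",
    "      description: how much",
    "        is needed",
]
print(collect_fields(entry, 6))
```

The wrapped description comes back as the single string `how much is needed`, which is the behaviour the note above describes.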

Answer 1: (score: 1)

Here is a Perl script (example.pl) that produces the desired output:

#!/usr/bin/perl
use warnings;
use strict;
use feature qw/postderef/;
no warnings qw/experimental::postderef/; # Suppress warning on 5.20 and 5.22
use YAML::XS qw/LoadFile/;
use Text::CSV_XS;

my $yaml = LoadFile($ARGV[0]);
my $csv = Text::CSV_XS->new({quote_space => 1, eol => "\n"});

for my $schema ($yaml->{'schemas'}->@*) {
    for my $table ($schema->{'tables'}->@*) {
        for my $col ($table->{'columns'}->@*) {
            my @row = ("$table->{name}.$col->{name}", $col->{type});
            push @row, $col->{'description'} if exists $col->{'description'};
            $csv->print(\*STDOUT, \@row);
        }
    }
}

Example:

$ perl example.pl test.yaml
sugar.color,abcd
sugar.taste,abcd,xyz
sugar.structure,abcd,xyzasaa
salt.strength,abcdef,"easy to find"
salt.color,abcdeffa,"not sweet"
salt.quality,abcd,"how much is needed"

This requires a couple of non-standard modules: YAML::XS (Debian/Ubuntu package libyaml-libyaml-perl) and Text::CSV_XS (Debian/Ubuntu package libtext-csv-xs-perl).


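YAML is a structured data format, and trying to process it one line at a time with regular expressions and the like is setting yourself up for failure. Any input that differs from what you expect will make it fail badly, and the general lack of context plus the large number of edge cases mean it cannot be made robust. The same goes for trying to parse CSV, XML/HTML, or JSON with regular expressions.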
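As a small illustration of that point, the following Python sketch applies a hypothetical one-line regex extractor (`PAIR` and `naive_value` are made-up names) to a few lines that are all valid YAML. The first two lines come from the sample file; the last two are invented inputs.

```python
import re

# Hypothetical one-line-at-a-time extractor: grab whatever follows "key:".
PAIR = re.compile(r"^\s*(?:- )?(\w+):\s*(.*)$")


def naive_value(line):
    """Return the raw text after the colon, or None if the line has no key."""
    m = PAIR.match(line)
    return m.group(2) if m else None


cases = [
    "    description: not that sweet.",   # value continues on the next line
    "      makes it salty.",              # continuation line: no key at all
    "      type: abcd  # internal code",  # trailing comment (invented input)
    '      description: "ratio: 2:1"',    # quoted scalar (invented input)
]

for line in cases:
    print(repr(naive_value(line)))
```

The extractor returns only the first half of the wrapped description, nothing for the continuation line, the comment as part of the type, and the quoted value with its quote characters still attached. A YAML parser would be expected to fold the two description lines into one value, drop the comment, and strip the quotes.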

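It is better to use a tool or library that understands the format. The code above therefore uses a YAML parser to turn the file into an equivalent Perl data structure, then walks that structure and prints the relevant values. It uses a CSV library to format the output, which avoids hand-rolling the quoting of fields that contain spaces (as in the desired output), along with all the other edge cases such as embedded quotes.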
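The same two-step idea (walk the parsed structure, let a CSV library do the quoting) can be sketched in Python using only the standard library. This is not a port of the script above. The `data` dictionary is a hand-written stand-in for what a YAML parser would return for part of the sample file, and the `salt.grain` column is a hypothetical entry added to exercise the quoting.

```python
import csv
import io

# Hand-written stand-in for parsed YAML (assumption: a real script would
# load this from the file with a YAML library instead).
data = {
    "schemas": [{
        "name": "exports",
        "tables": [
            {"name": "sugar", "columns": [
                {"name": "color", "type": "abcd"},
                {"name": "taste", "type": "abcd", "description": "xyz"},
            ]},
            {"name": "salt", "columns": [
                {"name": "strength", "type": "abcdef",
                 "description": "easy to find"},
                # Hypothetical column, not in the original file: its
                # description contains a comma and embedded quotes.
                {"name": "grain", "type": "abcd",
                 "description": 'fine, "table" grade'},
            ]},
        ],
    }]
}


def flatten(doc):
    """Yield one row per column: "table.column", type, optional description."""
    for schema in doc["schemas"]:
        for table in schema["tables"]:
            for col in table["columns"]:
                row = [f'{table["name"]}.{col["name"]}', col["type"]]
                if "description" in col:
                    row.append(col["description"])
                yield row


def to_csv(doc):
    """Render the flattened rows as CSV text using the csv module."""
    buf = io.StringIO()
    csv.writer(buf, lineterminator="\n").writerows(flatten(doc))
    return buf.getvalue()


if __name__ == "__main__":
    print(to_csv(data), end="")
```

Note that Python's default `csv.QUOTE_MINIMAL` leaves `easy to find` unquoted, unlike the `quote_space => 1` setting used with Text::CSV_XS above; passing `quoting=csv.QUOTE_ALL` to `csv.writer` would quote every field instead. The hypothetical `salt.grain` row shows the library doubling the embedded quotes, so the field reads back unchanged.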