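I am trying to generate a hierarchical list from a list of key-value pairs indented with 2 spaces.
Edit: apologies, I pasted the wrong output at first. The original YAML file has the format below. Getting the "description" is a secondary goal: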
schemas:
- name: exports
  tables:
  - name: sugar
    description: makes stuff sweet
    active_date: 2019-01-07 00:00:00
    columns:
    - name: color
      type: abcd
    - name: taste
      type: abcd
      description: xyz
      example: 21352352
    - name: structure
      type: abcd
      description: xyzasaa
      example: 10001
  - name: salt
    description: not that sweet.
      makes it salty.
    active_date: 2018-12-18 00:00:00
    columns:
    - name: strength
      type: abcdef
      description: easy to find
      example: 2018-03-03 12:30:00
    - name: color
      type: abcdeffa
      description: not sweet
      example: 21352352
    - name: quality
      type: abcd
      description: how much is needed
      example: 10001
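The ideal output is shown below. I want to flatten the YAML into a CSV in which each row holds one innermost child element together with the values of all its parents: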
sugar.color,abcd
sugar.taste,abcd,xyz
sugar.structure,abcd,xyzasaa
salt.strength,abcdef,"easy to find"
salt.color,abcdeffa,"not sweet"
salt.quality,abcd,"how much is needed"
I don't know whether the above is feasible, so at a minimum I am looking for:
sugar.color
sugar.taste
sugar.structure
salt.strength
salt.color
salt.quality
Answer 0 (score: 2)
Using any awk in any shell on any UNIX box:
$ cat tst.awk
BEGIN { OFS = "," }
match($0,/^ +- /) { indent = RLENGTH }
$1 == "-" {
    prt()
    if (indent == 4) {
        key = $NF
        subKey = ""
    }
    else if (indent == 6) {
        subKey = $NF
    }
    next
}
subKey != "" {
    data = substr($0,indent+1)
    if ( data ~ /^[^[:space:]]/ ) {
        # new data
        tag = data
        sub(/:.*/,"",tag)
        sub(/^[^:]+: */,"",data)
        f[tag] = data
    }
    else {
        # continuation of previous data
        sub(/^[[:space:]]*/,"",data)
        f[tag] = f[tag] " " data
    }
}
END { prt() }
function prt() {
    if ( "type" in f ) {
        print key "." subKey, f["type"], "\"" f["description"] "\""
    }
    delete f
}
$ awk -f tst.awk file
sugar.color,abcd,""
sugar.taste,abcd,"xyz"
sugar.structure,abcd,"xyzasaa"
salt.strength,abcdef,"easy to find"
salt.color,abcdeffa,"not sweet"
salt.quality,abcd,"how much is needed"
If any description spans multiple lines, the above will join it into a single line.
Answer 1 (score: 1)
Here is a Perl script that produces the desired output:
#!/usr/bin/perl
use warnings;
use strict;
use feature qw/postderef/;
no warnings qw/experimental::postderef/; # Suppress warning on 5.20 and 5.22
use YAML::XS qw/LoadFile/;
use Text::CSV_XS;
my $yaml = LoadFile($ARGV[0]);
my $csv = Text::CSV_XS->new({quote_space => 1, eol => "\n"});
for my $schema ($yaml->{'schemas'}->@*) {
    for my $table ($schema->{'tables'}->@*) {
        for my $col ($table->{'columns'}->@*) {
            my @row = ("$table->{name}.$col->{name}", $col->{type});
            push @row, $col->{'description'} if exists $col->{'description'};
            $csv->print(\*STDOUT, \@row);
        }
    }
}
Example:
$ perl example.pl test.yaml
sugar.color,abcd
sugar.taste,abcd,xyz
sugar.structure,abcd,xyzasaa
salt.strength,abcdef,"easy to find"
salt.color,abcdeffa,"not sweet"
salt.quality,abcd,"how much is needed"
It requires a couple of non-standard modules: YAML::XS (Debian/Ubuntu package libyaml-libyaml-perl) and Text::CSV_XS (also packaged for Debian/Ubuntu).
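YAML is a structured data format, and trying to process it one line at a time with regular expressions and the like sets you up for failure. Any input that differs from what you expect can break such code badly, and the lack of context plus the large number of edge cases make a robust solution impractical. The same goes for trying to parse CSV, XML/HTML, or JSON with regular expressions.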
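To make that fragility concrete, here is a minimal sketch in Python (a different language from the answers in this thread, chosen only for illustration). `names_by_indent` is a hypothetical helper, not code from either answer: it keys on fixed indentation widths, and the two sample strings are trimmed-down variants of the question's file that describe the same data with different, equally valid, YAML layouts.

```python
import re


def names_by_indent(text, table_indent=2, column_indent=4):
    """Collect "table.column" names by matching fixed indentation widths.

    Hypothetical and deliberately naive: it only understands one layout.
    """
    found, table = [], None
    for line in text.splitlines():
        m = re.match(r"^( *)- name: (\S+)", line)
        if not m:
            continue
        depth, name = len(m.group(1)), m.group(2)
        if depth == table_indent:
            table = name
        elif depth == column_indent and table is not None:
            found.append(f"{table}.{name}")
    return found


# Layout used in the question: list items sit at the same column as their key.
two_space = """\
schemas:
- name: exports
  tables:
  - name: sugar
    columns:
    - name: color
      type: abcd
"""

# Same data, but every list is indented under its key (also valid YAML).
indented_lists = """\
schemas:
  - name: exports
    tables:
      - name: sugar
        columns:
          - name: color
            type: abcd
"""

print(names_by_indent(two_space))       # ['sugar.color']
print(names_by_indent(indented_lists))  # []
```

The second call returns an empty list without raising any error, which is the kind of silent failure a format-aware parser avoids.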
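It is better to use a tool or library that understands the format. The code above therefore uses a YAML parser to turn the file into the equivalent Perl data structure and then walks it, printing the relevant values. It uses a CSV library to format the output, which avoids hand-rolling the quoting of fields that contain spaces, as in the desired output, along with all the other edge cases (such as embedded quotes).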
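The same parse-then-walk idea can be sketched in Python, again as an illustration rather than a port of the Perl script. To stay self-contained, the sketch hard-codes `parsed`, an assumed, trimmed version of the nested structure a YAML parser would return for the question's file (with PyYAML, for instance, `yaml.safe_load` would produce dicts and lists of this shape). One description has been altered on purpose to contain a comma and quotes so the CSV library's escaping is visible. `flatten` and `to_csv` are hypothetical names.

```python
import csv
import io

# Assumed shape of the parsed YAML; trimmed, with one description altered
# to include a comma and double quotes.
parsed = {
    "schemas": [
        {
            "name": "exports",
            "tables": [
                {
                    "name": "sugar",
                    "columns": [
                        {"name": "color", "type": "abcd"},
                        {"name": "taste", "type": "abcd", "description": "xyz"},
                    ],
                },
                {
                    "name": "salt",
                    "columns": [
                        {
                            "name": "strength",
                            "type": "abcdef",
                            "description": 'easy to "find", mostly',
                        },
                    ],
                },
            ],
        }
    ]
}


def flatten(doc):
    """Yield one row per column: "table.column", type, optional description."""
    for schema in doc.get("schemas", []):
        for table in schema.get("tables", []):
            for col in table.get("columns", []):
                row = [f"{table['name']}.{col['name']}", col["type"]]
                if "description" in col:
                    row.append(col["description"])
                yield row


def to_csv(doc):
    """Render the flattened rows as CSV text using the csv module."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerows(flatten(doc))
    return out.getvalue()


print(to_csv(parsed), end="")
# sugar.color,abcd
# sugar.taste,abcd,xyz
# salt.strength,abcdef,"easy to ""find"", mostly"
```

Note that Python's `csv.writer` by default quotes a field only when it contains the delimiter, the quote character, or a line break, so a description with plain spaces would be written unquoted here, unlike the Perl output produced with `quote_space => 1`. Passing `quoting=csv.QUOTE_ALL` would quote every field instead.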