How do I parse YAML data into custom Bash data arrays/hash structures?

Time: 2019-08-11 00:11:30

Tags: arrays bash yaml associative-array

I have the following YAML file:

site:
  title: My blog
  domain: example.com
  author1:
    name: bob
    url: /author/bob
  author2:
    name: jane
    url: /author/jane
  header_links:
    about:
      title: About
      url: about.html
    contact:
      title: Contact Us
      url: contactus.html
  js_deps:
    - cashjs
    - jets

products:
  product1:
    name: Prod One
    price: 10
  product2:
    name: Prod Two
    price: 20

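I want a Bash, Python, or AWK function or script that takes the YAML file above as input ($1), then generates and executes the following code (or something exactly equivalent):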

unset site_title 
unset site_domain
unset site_author1
unset site_author2
unset site_header_links
unset site_header_links_about
unset site_header_links_contact
unset js_deps

site_title="My blog"
site_domain="example.com"

declare -A site_author1
declare -A site_author2

site_author1=(
  [name]="bob"
  [url]="/author/bob"
)

site_author2=(
  [name]="jane"
  [url]="/author/jane"
)

declare -A site_header_links_about
declare -A site_header_links_contact

site_header_links_about=(
  [name]="About"
  [url]="about.html"
)

site_header_links_contact=(
  [name]="Contact Us"
  [url]="contactus.html"
)

site_header_links=(site_header_links_about  site_header_links_contact)

js_deps=(cashjs jets)

unset products
unset product1
unset product2

declare -A product1
declare -A product2

product1=(
  [name]="Prod One"
  [price]=10
)

product2=(
  [name]="Prod Two"
  [price]=20
)

products=(product1 product2)

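So, the logic is:

Walk the YAML and create underscore-joined variable names with string values, except at the last (bottom) level, where the data should be created as associative arrays or indexed arrays wherever possible. Also, any associative array created should be listed by name in an indexed array.

So, in other words:

  • wherever the last level of data can be turned into an associative array, it should be (foo.bar.hash => ${foo_bar_hash[@]})

  • wherever the last level of data can be turned into an indexed array, it should be (foo.bar.list => ${foo_bar_list[@]})

  • every associative array should be listed by name in an indexed array named after its parent in the YAML data (see products in the example)

  • otherwise, just join the keys with underscores to make a variable name and save the value as a string (foo.bar.string => ${foo_bar_string})

...The reason I need this specific Bash data structure is that I'm using a Bash-based templating system.

Once I have the function I need, I can easily use the YAML data in my templates, like so: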

{{site_title}}

...

{{#foreach link in site_header_links}}
  <a href="{{link.url}}">{{link.name}}</a>
{{/foreach}}

...

{{#js_deps}}
  {{.}}
{{/js_deps}}

...

{{#foreach item in products}}
  {{item.name}}
  {{item.price}}
{{/foreach}}

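What I've tried:

This is entirely related to a question I asked previously:

That came very close, but I also need the site_header_links associative arrays to be generated successfully .. it fails because site_header_links is nested too deeply.

I'd still be happy to use https://github.com/azohra/yaml.sh in the solution, as it would also provide an easy handlebars-style lookup rip-off for the templating system :)

Edit:

To be very clear: the solution cannot use pip, virtualenv, or any other external dependency that needs to be installed separately. It must be a self-contained script/function that can live in the CMS project directory (like https://github.com/azohra/yaml.sh)... otherwise I wouldn't need to be here.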

...

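Hopefully, a well-commented answer can help me avoid coming back here ;)

2 answers:

Answer 0: (score: 0)

Just as it is hard to work out the rules of a card game by watching people play a single round, it is hard to see exactly what the "rules" for your YAML file are.

In the following I have made assumptions about the root level, about the first-, second-, and third-level nodes, and about the output they generate. It would also be valid to make assumptions about nodes based on the level of their parent, which would be more flexible (you could then just add e.g. a sequence at the root level), but that would be somewhat harder to implement.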
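
As a rough illustration of that more flexible alternative, the sketch below picks the Bash form from the shape of each node rather than from a fixed depth. It is not the program used in this answer. The names classify and emit are made up for the example, it takes an already-parsed dict, it assumes lists hold only scalars, and it always prefixes names with the parent key (so it prints products_product1 where the question's sample shows product1).

```python
import shlex


def classify(value):
    """Name the Bash form a parsed YAML node maps to."""
    if isinstance(value, dict):
        # A dict whose values are all scalars fits one associative array.
        if all(not isinstance(v, (dict, list)) for v in value.values()):
            return "hash"
        return "nested"
    if isinstance(value, list):
        return "list"
    return "string"


def emit(name, value, lines):
    """Append Bash statements for `value` to `lines`, using `name` as the variable name."""
    kind = classify(value)
    if kind == "string":
        lines.append("{}={}".format(name, shlex.quote(str(value))))
    elif kind == "list":
        items = " ".join(shlex.quote(str(v)) for v in value)
        lines.append("{}=({})".format(name, items))
    elif kind == "hash":
        pairs = " ".join(
            "[{}]={}".format(k, shlex.quote(str(v))) for k, v in value.items()
        )
        lines.append("declare -A {}".format(name))
        lines.append("{}=({})".format(name, pairs))
    else:
        hashes = []
        for key, child in value.items():
            child_name = "{}_{}".format(name, key)
            emit(child_name, child, lines)
            if classify(child) == "hash":
                hashes.append(child_name)
        if hashes:
            # List the child associative arrays by name under the parent.
            lines.append("{}=({})".format(name, " ".join(hashes)))
    return lines
```

Called as emit("products", data["products"], []) on the sample file, this would end with products=(products_product1 products_product2). For site it would also list site_author1 and site_author2 in an indexed array named site, which the question's sample output does not do, so treat it as a starting point only.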

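Keeping the declarations and compound array assignments interspersed with the other code, and grouped by "similar" items, is a bit cumbersome. For that you would need to keep track of the node types (str, dict, nested dict) and group on those. So per root-level key I dump all the unsets first, then all the declares, then all the assignments, then all the compound assignments. I think this falls under "something exactly equivalent".

Since products -> product1/product2 is handled completely differently from site -> author1/author2, even though they have the same node structure, I made a separate function to handle each root-level key.

To get this running you should set up a virtual environment for Python (3.7/3.6) and install the YAML library in it: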

$ python -m venv /opt/util/yaml2bash
$ /opt/util/yaml2bash/bin/pip install ruamel.yaml

Then store the following program, e.g. in /opt/util/yaml2bash/bin/yaml2bash, and make it executable (chmod +x /opt/util/yaml2bash/bin/yaml2bash):

#! /opt/util/yaml2bash/bin/python

import sys
from pathlib import Path
import ruamel.yaml

if len(sys.argv) > 1:  # argv[0] is the script name, so "> 0" would always be true
    input = Path(sys.argv[1])
else:
    input = sys.stdin


def bash_site(k0, v0, fp):
    """this function takes a root-level key and its value (v0 a dict), constructs the 
    list of unsets and outputs based on the keys, values and type of values of v0,
    then dumps these to fp
    """
    unsets = []
    declares = []
    assignments = []
    compounds = {}
    for k1, v1 in v0.items():
        if isinstance(v1, str):
            k = k0 + '_' + k1
            unsets.append(k)
            assignments.append(f'{k}="{v1}"')
        elif isinstance(v1, dict):
            first_val = list(v1.values())[0]
            if isinstance(first_val, str):
                k = k0 + '_' + k1
                unsets.append(k)
                declares.append(k)
                assignments.append(f'{k}=(')
                for k2, v2 in v1.items():
                    q = '"' if isinstance(v2, str) else ''
                    assignments.append(f'  [{k2}]={q}{v2}{q}')
                assignments.append(')')
            elif isinstance(first_val, dict):
                for k2, v2 in v1.items(): # assume all the same type
                    k = k0 + '_' + k1 + '_' + k2   
                    unsets.append(k)
                    declares.append(k)
                    assignments.append(f'{k}=(')
                    for k3, v3 in v2.items():
                        q = '"' if isinstance(v3, str) else ''
                        assignments.append(f'  [{k3}]={q}{v3}{q}')  # index by the inner key k3, not k2
                    assignments.append(')')
                    compounds.setdefault(k0 + '_' + k1, []).append(k)
            else:
                raise NotImplementedError("unknown val: " + repr(first_val))
        elif isinstance(v1, list):
            unsets.append(k1)
            compounds[k1] = v1
        else:
            raise NotImplementedError("unknown val: " + repr(v1))


    if unsets:
        for item in unsets:
            print('unset', item, file=fp)
        print(file=fp)
    if declares:
        for item in declares:
            print('declare -A', item, file=fp)
        print(file=fp)
    if assignments:
        for item in assignments:
            print(item, file=fp)
        print(file=fp)
    if compounds:
        for k in compounds:
            v = ' '.join(compounds[k])
            print(f'{k}=({v})', file=fp)
        print(file=fp)


def bash_products(k0, v0, fp):
    """this function takes a root-level key and its value (v0 a dict), constructs the 
    list of unsets and outputs based on the keys, values and type of values of v0,
    then dumps these to fp
    """
    unsets = [k0]
    declares = []
    assignments = []
    compounds = {}
    for k1, v1 in v0.items():
        if isinstance(v1, dict):
            first_val = list(v1.values())[0]
            if isinstance(first_val, str):
                unsets.append(k1)
                declares.append(k1)
                assignments.append(f'{k1}=(')
                for k2, v2 in v1.items():
                    q = '"' if isinstance(v2, str) else ''
                    assignments.append(f'  [{k2}]={q}{v2}{q}')
                assignments.append(')')
                compounds.setdefault(k0, []).append(k1)
            else:
                raise NotImplementedError("unknown val: " + repr(first_val))
        else:
            raise NotImplementedError("unknown val: " + repr(v1))


    if unsets:
        for item in unsets:
            print('unset', item, file=fp)
        print(file=fp)
    if declares:
        for item in declares:
            print('declare -A', item, file=fp)
        print(file=fp)
    if assignments:
        for item in assignments:
            print(item, file=fp)
        print(file=fp)
    if compounds:
        for k in compounds:
            v = ' '.join(compounds[k])
            print(f'{k}=({v})', file=fp)
        print(file=fp)




yaml = ruamel.yaml.YAML()
data = yaml.load(input)

output = sys.stdout  # make it easier to redirect to file if necessary at some point in the future

bash_site('site', data['site'], output)
bash_products('products', data['products'], output)
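
Both functions end with the same four printing loops. One possible tidy-up, sketched here with a hypothetical helper name dump_groups, is to share that tail. It is intended to print the same text as the loops above, but it is only a suggestion and not part of the program as posted.

```python
import sys


def dump_groups(unsets, declares, assignments, compounds, fp):
    """Print each non-empty group followed by one blank line, in a fixed order."""
    groups = [
        ["unset " + name for name in unsets],
        ["declare -A " + name for name in declares],
        list(assignments),
        ["{}=({})".format(key, " ".join(names)) for key, names in compounds.items()],
    ]
    for lines in groups:
        if lines:
            for line in lines:
                print(line, file=fp)
            print(file=fp)


if __name__ == "__main__":
    # Tiny demonstration with hand-written groups.
    dump_groups(
        ["products", "product1"],
        ["product1"],
        ["product1=(", '  [name]="Prod One"', ")"],
        {"products": ["product1"]},
        sys.stdout,
    )
```

With this in place, bash_site and bash_products would each finish with a single dump_groups(unsets, declares, assignments, compounds, fp) call.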

If you run this program and provide your YAML input file as an argument (/opt/util/yaml2bash/bin/yaml2bash input.yaml), it gives:

unset site_title
unset site_domain
unset site_author1
unset site_author2
unset site_header_links_about
unset site_header_links_contact
unset js_deps

declare -A site_author1
declare -A site_author2
declare -A site_header_links_about
declare -A site_header_links_contact

site_title="My blog"
site_domain="example.com"
site_author1=(
  [name]="bob"
  [url]="/author/bob"
)
site_author2=(
  [name]="jane"
  [url]="/author/jane"
)
site_header_links_about=(
  [title]="About"
  [url]="about.html"
)
site_header_links_contact=(
  [title]="Contact Us"
  [url]="contactus.html"
)

site_header_links=(site_header_links_about site_header_links_contact)
js_deps=(cashjs jets)

unset products
unset product1
unset product2

declare -A product1
declare -A product2

product1=(
  [name]="Prod One"
  [price]=10
)
product2=(
  [name]="Prod Two"
  [price]=20
)

products=(product1 product2)

You can source all of these values in bash using something like source <(/opt/util/yaml2bash/bin/yaml2bash input.yaml).
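
Since Bash executes whatever the program prints, a value containing a double quote, a $, or a backtick would break or alter the generated assignments, which wrap values in plain double quotes. A hedged sketch of one way to guard against that uses shlex.quote from the Python standard library. The helper name bash_assign is made up for this example and is not part of the program above.

```python
import shlex


def bash_assign(name, value):
    """Return a Bash assignment whose value is quoted so the shell reads it literally."""
    return "{}={}".format(name, shlex.quote(str(value)))


if __name__ == "__main__":
    print(bash_assign("site_title", 'My "cool" blog'))
    print(bash_assign("site_domain", "example.com"))
```

Values made only of safe characters are left bare, and anything else is wrapped in single quotes, so the shell performs no expansion on it.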

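Please note that all of the double quotes in your YAML file are superfluous.

Using Python and ruamel.yaml (disclaimer: I am the author of that package) gives you a full YAML parser, which allows you to use e.g. comments and flow-style collections: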

jsdeps: [cashjs, jets]    # more compact

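If you are stuck with the nearly end-of-life Python 2.7 and don't have full control over your machine (if you did, you should install/compile Python 3.7 on it), you can still use ruamel.yaml.

  1. Determine where your program will live, e.g. ~/bin
  2. Create ~/bin/ruamel (adjust according to 1.)
  3. cd ~/bin/ruamel
  4. touch __init__.py
  5. Download the latest tar file from PyPI
  6. Unpack the tar file and rename the resulting directory from ruamel.yaml-X.Y.Z to just yaml

ruamel.yaml should work without its dependencies. On 2.7 these are ruamel.ordereddict and ruamel.yaml.clib, which provide C versions of basic routines to speed things up.

The program above would need some rewriting (f-strings -> "".format(), pathlib.Path -> old-fashioned with open(...) as fp:).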
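
As a small illustration of those two rewrites, here is a sketch that runs on both Python 2.7 and 3. The helper names are invented for this example, and the loader argument stands in for whatever YAML load function is in use.

```python
def assignment_line(key, value):
    # Python 3.6+ form in the program above: f'{key}="{value}"'
    return '{}="{}"'.format(key, value)


def load_file(filename, loader):
    # Instead of handing a pathlib.Path to the loader, open the file
    # the old-fashioned way and pass the file object.
    with open(filename) as fp:
        return loader(fp)

# Note: on 2.7 the program's print(..., file=fp) calls additionally need
# "from __future__ import print_function" at the top of the file.
```

In the program above, load_file(sys.argv[1], yaml.load) would then take the place of yaml.load(input) for the file case.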

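Answer 1: (score: 0)

I decided to use a combination of the following:

  • a hacked version of Yay:

    • added support for simple lists
    • fixed multiple indentation levels
  • a hacked version of this yaml parser:

    • with the prefix handling borrowed from Yay, for consistency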

function yaml_to_vars {
   # find input file
   for f in "$1" "$1.yay" "$1.yml"
   do
     [[ -f "$f" ]] && input="$f" && break
   done
   [[ -z "$input" ]] && exit 1

   # use given dataset prefix or imply from file name
   [[ -n "$2" ]] && local prefix="$2" || {
     local prefix=$(basename "$input"); prefix=${prefix%.*}; prefix="${prefix//-/_}_";
   }

   local s='[[:space:]]*' w='[a-zA-Z0-9_]*' fs=$(echo @|tr @ '\034')
   sed -ne "s|,$s\]$s\$|]|" \
        -e ":1;s|^\($s\)\($w\)$s:$s\[$s\(.*\)$s,$s\(.*\)$s\]|\1\2: [\3]\n\1  - \4|;t1" \
        -e "s|^\($s\)\($w\)$s:$s\[$s\(.*\)$s\]|\1\2:\n\1  - \3|;p" "$input" | \
   sed -ne "s|,$s}$s\$|}|" \
        -e ":1;s|^\($s\)-$s{$s\(.*\)$s,$s\($w\)$s:$s\(.*\)$s}|\1- {\2}\n\1  \3: \4|;t1" \
        -e    "s|^\($s\)-$s{$s\(.*\)$s}|\1-\n\1  \2|;p" | \
   sed -ne "s|^\($s\):|\1|" \
        -e "s|^\($s\)-$s[\"']\(.*\)[\"']$s\$|\1$fs$fs\2|p" \
        -e "s|^\($s\)-$s\(.*\)$s\$|\1$fs$fs\2|p" \
        -e "s|^\($s\)\($w\)$s:$s[\"']\(.*\)[\"']$s\$|\1$fs\2$fs\3|p" \
        -e "s|^\($s\)\($w\)$s:$s\(.*\)$s\$|\1$fs\2$fs\3|p" | \
   awk -F$fs '{
      indent = length($1)/2;
      vname[indent] = $2;
      for (i in vname) {if (i > indent) {delete vname[i]; idx[i]=0}}
      if(length($2)== 0){  vname[indent]= ++idx[indent] };
      if (length($3) > 0) {
         vn=""; for (i=0; i<indent; i++) { vn=(vn)(vname[i])("_")}
         printf("%s%s%s=\"%s\"\n", "'$prefix'",vn, vname[indent], $3);
      }
   }'
}

yay_parse() {

   # find input file
   for f in "$1" "$1.yay" "$1.yml"
   do
     [[ -f "$f" ]] && input="$f" && break
   done
   [[ -z "$input" ]] && exit 1

   # use given dataset prefix or imply from file name
   [[ -n "$2" ]] && local prefix="$2" || {
     local prefix=$(basename "$input"); prefix=${prefix%.*}; prefix=${prefix//-/_};
   }

   echo "unset $prefix; declare -g -a $prefix;"

   local s='[[:space:]]*' w='[a-zA-Z0-9_]*' fs=$(echo @|tr @ '\034')
   #sed -n -e "s|^\($s\)\($w\)$s:$s\"\(.*\)\"$s\$|\1$fs\2$fs\3|p" \
   #       -e "s|^\($s\)\($w\)$s:$s\(.*\)$s\$|\1$fs\2$fs\3|p" "$input" |
   sed -ne "s|,$s\]$s\$|]|" \
        -e ":1;s|^\($s\)\($w\)$s:$s\[$s\(.*\)$s,$s\(.*\)$s\]|\1\2: [\3]\n\1  - \4|;t1" \
        -e "s|^\($s\)\($w\)$s:$s\[$s\(.*\)$s\]|\1\2:\n\1  - \3|;p" "$input" | \
   sed -ne "s|,$s}$s\$|}|" \
        -e ":1;s|^\($s\)-$s{$s\(.*\)$s,$s\($w\)$s:$s\(.*\)$s}|\1- {\2}\n\1  \3: \4|;t1" \
        -e    "s|^\($s\)-$s{$s\(.*\)$s}|\1-\n\1  \2|;p" | \
   sed -ne "s|^\($s\):|\1|" \
        -e "s|^\($s\)-$s[\"']\(.*\)[\"']$s\$|\1$fs$fs\2|p" \
        -e "s|^\($s\)-$s\(.*\)$s\$|\1$fs$fs\2|p" \
        -e "s|^\($s\)\($w\)$s:$s[\"']\(.*\)[\"']$s\$|\1$fs\2$fs\3|p" \
        -e "s|^\($s\)\($w\)$s:$s\(.*\)$s\$|\1$fs\2$fs\3|p" | \
   awk -F$fs '{
      indent       = length($1)/2;
      key          = $2;
      value        = $3;

      # No prefix or parent for the top level (indent zero)
      root_prefix  = "'$prefix'_";
      if (indent == 0) {
        prefix = "";          parent_key = "'$prefix'";
      } else {
        prefix = root_prefix; parent_key = keys[indent-1];
      }

      keys[indent] = key;

      # remove keys left behind if prior row was indented more than this row
      for (i in keys) {if (i > indent) {delete keys[i]}}

      # if we have a value
      if (length(value) > 0) {

        # set values here

        # if the "key" is missing, make array indexed, not assoc..

        if (length(key) == 0) {
          # array item has no key, only a value..
          # so, if we didnt already unset the assoc array
          if (unsetArray == 0) {
            # unset the assoc array here
            printf("unset %s%s; ", prefix, parent_key);
            # switch the flag, so we only unset once, before adding values
            unsetArray = 1;
          }
          # array was unset, has no key, so add item using indexed array syntax
          printf("%s%s+=(\"%s\");\n", prefix, parent_key, value);

        } else {
          # array item has key and value, add item using assoc array syntax
          printf("%s%s[%s]=\"%s\";\n", prefix, parent_key, key, value);
        }

      } else {

        # declare arrays here

        # reset this flag for each new array we work on...
        unsetArray = 0;

        # if item has no key, declare indexed array
        if (length(key) == 0) {
          # indexed
          printf("unset %s%s; declare -g -a %s%s;\n", root_prefix, key, root_prefix, key);

        # if item has numeric key, declare indexed array
        } else if (key ~ /^[[:digit:]]/) {
          printf("unset %s%s; declare -g -a %s%s;\n", root_prefix, key, root_prefix, key);

        # else (item has a string for a key), declare associative array
        } else {
          printf("unset %s%s; declare -g -A %s%s;\n", root_prefix, key, root_prefix, key);
        }

        # set root level values here

        if (indent > 0) {
          # add to associative array
          printf("%s%s[%s]+=\"%s%s\";\n", prefix, parent_key , key, root_prefix, key);
        } else {
          # add to indexed array
          printf("%s%s+=( \"%s%s\");\n", prefix, parent_key , root_prefix, key);
        }

      }
   }'
}

# helper to load yay data file
yay() {
  # yaml_to_vars "$@"  ## uncomment to debug (prints data to stdout)
  eval $(yaml_to_vars "$@")

  # yay_parse "$@"  ## uncomment to debug (prints data to stdout)
  eval $(yay_parse "$@")
}
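
For readers less fluent in sed and awk, the core idea of yaml_to_vars (remember the key seen at each indent level, then join the keys with underscores when a value appears) can be sketched in Python. This is an illustration only, not a port: flatten is a made-up name, it assumes two-space indentation and plain key: value lines, and it skips the list and inline-collection handling done by the sed stages.

```python
def flatten(text, prefix):
    """Turn indented 'key: value' lines into prefix_a_b="value" strings."""
    keys = {}      # indent level -> key most recently seen at that level
    result = []
    for line in text.splitlines():
        if not line.strip():
            continue
        stripped = line.lstrip(" ")
        indent = (len(line) - len(stripped)) // 2
        key, _, value = stripped.partition(":")
        key = key.strip()
        value = value.strip().strip("\"'")
        keys[indent] = key
        # Forget keys left over from a more deeply indented earlier line.
        for level in [lvl for lvl in keys if lvl > indent]:
            del keys[level]
        if value:
            path = [keys.get(i, "") for i in range(indent + 1)]
            result.append('{}{}="{}"'.format(prefix, "_".join(path), value))
    return result


if __name__ == "__main__":
    sample = "product1:\n  name: Foo\n  price: 100\n"
    print("\n".join(flatten(sample, "products_")))
```

The Bash version does the same bookkeeping in its awk stage with the vname array, and takes the prefix from the file name when none is given.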

Using the code above, with a products.yml file containing the following:

product1:
  name: Foo
  price: 100
product2:
  name: Bar
  price: 200

The parser can be called like so:

source path/to/yml-parser.sh
yay products.yml

It generates and evaluates the following code:

products_product1_name="Foo"
products_product1_price="100"
products_product2_name="Bar"
products_product2_price="200"
unset products;
declare -g -a products;
unset products_product1;
declare -g -A products_product1;
products+=( "products_product1");
products_product1[name]="Foo";
products_product1[price]="100";
unset products_product2;
declare -g -A products_product2;
products+=( "products_product2");
products_product2[name]="Bar";
products_product2[price]="200";

So, I get the following Bash arrays and variables:

declare -a products=([0]="products_product1" [1]="products_product2")
declare -A products_product1=([price]="100" [name]="Foo" )
declare -A products_product2=([price]="200" [name]="Bar" )

In my templating system, I can now access the yml data like so:

{{#foreach product in products}}
  Name:  {{product.name}}
  Price: {{product.price}}
{{/foreach}}

:)

Another example:

The file site.yml

meta_info:
  title: My cool blog
  domain: foo.github.io
author1:
  name: bob
  url: /author/bob
author2:
  name: jane
  url: /author/jane
header_links:
  link1:
    title: About
    url: about.html
  link2:
    title: Contact Us
    url: contactus.html
js_deps:
  cashjs: cashjs
  jets: jets
Foo:
  - one
  - two
  - three

produces:

declare -a site=([0]="site_meta_info" [1]="site_author1" [2]="site_author2" [3]="site_header_links" [4]="site_js_deps" [5]="site_Foo")
declare -A site_meta_info=([title]="My cool blog" [domain]="foo.github.io" )
declare -A site_author1=([url]="/author/bob" [name]="bob" )
declare -A site_author2=([url]="/author/jane" [name]="jane" )
declare -A site_header_links=([link1]="site_link1" [link2]="site_link2" )
declare -A site_link1=([url]="about.html" [title]="About" )
declare -A site_link2=([url]="contactus.html" [title]="Contact Us" )
declare -A site_js_deps=([cashjs]="cashjs" [jets]="jets" )
declare -a site_Foo=([0]="one" [1]="two" [2]="three")

In the templates, I can access site_header_links like so:

{{#foreach link in site_header_links}}
  * {{link.title}} - {{link.url}}
{{/foreach}}

Dash (simple) lists such as Foo are available as indexed arrays (site_Foo above).