Bash: How to extract table-like structures from a text file

Time: 2017-03-21 15:11:44

Tags: bash awk sed

I have a log file that contains some data along with important sections that look like tables, as shown below:

    //Some data

    --------------------------------------------------------------------------------
    -----                 Output Table                             -----
    --------------------------------------------------------------------------------
            NAME                         Attr1    Attr2      Attr3    Attr4    Attr5
    --------------------------------------------------------------------------------
    fooooooooo                               0        0          3        0        0
    boooooooooooooooooooooo                  0        0         30        0        0
    abv                                      0        0         16        0        0
    bhbhbhbh                                 0        0          3        0        0
    foooo                                    0        0        198        0        0

    WARNING: Some message...


    WARNING: Some message...

    aaaaaaaaa                                0        0         60        0        7
    bbbbbbbb                                 0        0         48        0        7
    ccccccc                                  0        0         45        0        7
    rrrrrrr                                  0        0         50        0        7
    abcabca                                  0        0         42        0        6

// Some data...

    --------------------------------------------------------------------------------
    -----                 Another Output Table                                 -----
    --------------------------------------------------------------------------------
         NAME                            Attr1    Attr2      Attr3    Attr4    Attr5
    --------------------------------------------------------------------------------
    $$foo12                                  0        0          3        0        0
    $$foo12_720_720_14_2                     0        0         30        0        0

I want to extract every such table from the given file and save each one to a separate file.

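Notes:

  • A table starts at the line containing the words {NAME, Attr1, ..., Attr5}.
  • WARNING messages may appear within a table and should be ignored.
  • A table ends when a blank line occurs and the line after that blank line is not a WARNING line.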
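
The notes above amount to sorting each line into one of a few classes before deciding whether a table starts, continues, or ends. A minimal sketch of that classification, assuming whitespace-separated fields; the function name `classify_lines` and the labels it prints are made up for illustration:

```shell
#!/usr/bin/env bash
# Hypothetical helper: read a log on stdin and print one label per line.
classify_lines() {
    awk '
        # Header: the line that names the columns starts a table.
        $1 == "NAME" && $2 == "Attr1" { print "HEADER";  next }
        # Warnings may sit inside a table and are to be ignored.
        $1 ~ /^WARNING/               { print "WARNING"; next }
        # Empty or whitespace-only lines are candidate table ends.
        NF == 0                       { print "BLANK";   next }
        # Everything else: data rows, dashed rules, unrelated text.
                                      { print "OTHER" }
    '
}

# Demo on four hand-written lines (not taken from a real log).
printf '%s\n' \
    '     NAME    Attr1 Attr2 Attr3 Attr4 Attr5' \
    'abv              0     0    16     0     0' \
    '' \
    'WARNING: Some message...' | classify_lines
```

Because the tests look at fields rather than at column positions, leading indentation in the log does not change the labels.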

So I expect the following 2 files as output:

        NAME                         Attr1    Attr2      Attr3    Attr4    Attr5
--------------------------------------------------------------------------------
fooooooooo                               0        0          3        0        0
boooooooooooooooooooooo                  0        0         30        0        0
abv                                      0        0         16        0        0
bhbhbhbh                                 0        0          3        0        0
foooo                                    0        0        198        0        0
aaaaaaaaa                                0        0         60        0        7
bbbbbbbb                                 0        0         48        0        7
ccccccc                                  0        0         45        0        7
rrrrrrr                                  0        0         50        0        7
abcabca                                  0        0         42        0        6

     NAME                            Attr1    Attr2      Attr3    Attr4    Attr5
--------------------------------------------------------------------------------
$$foo12                                  0        0          3        0        0
$$foo12_720_720_14_2                     0        0         30        0        0

2 Answers:

Answer 0 (score: 0)

Following your description, I would write the following awk script.

#! /usr/bin/awk -f

# start a table with a NAME line
/^ +NAME/ {
    titles = $0
    print
    next
}

# don't print if not in table
! titles {
    next
}

# blank line may mean end-of-table (also matches whitespace-only lines)
/^[ \t]*$/ {
    EOT = 1
    next
}

# warning is not EOT (leading indentation is tolerated)
/^[ \t]*WARNING/ {
    EOT = 0
    next
}

# end of table means we're not in a table anymore, Toto
EOT {
    titles = 0
    EOT = 0
    next
}

# print what's in the table
{ print }
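
The script above prints every table to standard output, and a data row that comes right after a blank line ends the table, so in the sample log the rows after the warnings are dropped. A possible variation, offered as a sketch rather than as the answerer's method: write each table to its own file, and end a table only at a line that is neither blank, a WARNING, nor a data row. The helper name `split_tables`, the `PREFIXn.txt` file naming, and the assumption that a data row is a name followed by exactly five integers are all hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: split_tables LOGFILE PREFIX
# Writes the tables found in LOGFILE to PREFIX1.txt, PREFIX2.txt, ...
split_tables() {
    awk -v prefix="$2" '
        # Assumption: a data row is a name followed by exactly five integers.
        function is_row(   i) {
            if (NF != 6) return 0
            for (i = 2; i <= NF; i++)
                if ($i !~ /^[0-9]+$/) return 0
            return 1
        }
        # A header line starts a new table and a new output file.
        $1 == "NAME" && $2 == "Attr1" {
            if (out != "") close(out)
            out = prefix (++n) ".txt"
            print > out
            in_table = 1
            after_header = 1
            next
        }
        !in_table { next }
        # Keep the dashed rule only when it directly follows the header.
        after_header && NF == 1 && $1 ~ /^-+$/ {
            print > out
            after_header = 0
            next
        }
        { after_header = 0 }
        # Blank lines and warnings are skipped without ending the table.
        NF == 0 || $1 ~ /^WARNING/ { next }
        is_row() { print > out; next }
        # Anything else ends the current table.
        { in_table = 0 }
    ' "$1"
}

# Demo on a shortened, hand-made copy of the sample log.
workdir=$(mktemp -d)
cat > "$workdir/sample.log" <<'EOF'
// Some data
     NAME      Attr1 Attr2 Attr3 Attr4 Attr5
--------------------------------------------
foooo              0     0   198     0     0

WARNING: Some message...

aaaaaaaaa          0     0    60     0     7

// Some data...
     NAME      Attr1 Attr2 Attr3 Attr4 Attr5
--------------------------------------------
$$foo12            0     0     3     0     0
EOF
split_tables "$workdir/sample.log" "$workdir/table"
cat "$workdir/table1.txt" "$workdir/table2.txt"
```

Ending the table at the first unrecognised line, instead of at a blank line, is a deliberate departure from the third note in the question: the expected output there keeps rows that follow a blank line after a WARNING, which a strict blank-line rule would cut off.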

Answer 1 (score: 0)

Try this:

awk -F'[[:space:]]+' 'NF>6 || ($0 ~ /-/ && $0 !~ "Output") {print $0}' f
    --------------------------------------------------------------------------------
    --------------------------------------------------------------------------------
            NAME                         Attr1    Attr2      Attr3    Attr4    Attr5
    --------------------------------------------------------------------------------
    fooooooooo                               0        0          3        0        0
    boooooooooooooooooooooo                  0        0         30        0        0
    abv                                      0        0         16        0        0
    bhbhbhbh                                 0        0          3        0        0
    foooo                                    0        0        198        0        0
    aaaaaaaaa                                0        0         60        0        7
    bbbbbbbb                                 0        0         48        0        7
    ccccccc                                  0        0         45        0        7
    rrrrrrr                                  0        0         50        0        7
    abcabca                                  0        0         42        0        6
    --------------------------------------------------------------------------------
    --------------------------------------------------------------------------------
         NAME                            Attr1    Attr2      Attr3    Attr4    Attr5
    --------------------------------------------------------------------------------
    $$foo12                                  0        0          3        0        0
    $$foo12_720_720_14_2                     0        0         30        0        0
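
The filter above writes both tables to one stream and also keeps the dashed rules that come before each header. One way to get separate files from it, sketched here as a hypothetical second stage (the function name `split_filtered` and the `PREFIXn.txt` naming are assumptions, not part of the answer): start a new file at every NAME header and keep a dashed rule only when it directly follows a header.

```shell
#!/usr/bin/env bash
# Hypothetical second stage: split_filtered PREFIX
# Reads the filtered stream on stdin and writes PREFIX1.txt, PREFIX2.txt, ...
split_filtered() {
    awk -v prefix="$1" '
        # A NAME header opens the next output file.
        $1 == "NAME" {
            if (out != "") close(out)
            out = prefix (++n) ".txt"
            print > out
            keep_rule = 1          # the rule right after a header is kept
            next
        }
        # Rules printed before the first header belong to no table.
        out == "" { next }
        # Dashed rules: keep only the one that follows a header.
        NF == 1 && $1 ~ /^-+$/ {
            if (keep_rule) print > out
            keep_rule = 0
            next
        }
        # Everything else in the stream is a data row.
        { keep_rule = 0; print > out }
    '
}

# Demo on a few hand-written lines shaped like the output above.
demo_dir=$(mktemp -d)
printf '%s\n' \
    '    ------------' \
    '    ------------' \
    '       NAME   Attr1 Attr2 Attr3 Attr4 Attr5' \
    '    ------------' \
    '    abv           0     0    16     0     0' \
    '    ------------' \
    '    ------------' \
    '       NAME   Attr1 Attr2 Attr3 Attr4 Attr5' \
    '    ------------' \
    '    $$foo12       0     0     3     0     0' | split_filtered "$demo_dir/table"
cat "$demo_dir/table1.txt" "$demo_dir/table2.txt"
```

It could be chained after the one-liner, for example `awk -F'[[:space:]]+' 'NF>6 || ($0 ~ /-/ && $0 !~ "Output") {print $0}' f | split_filtered table`, where `f` is the log file as in the answer.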