How can I convert my file into rows and columns using a shell script?

Date: 2017-09-26 12:29:27

Tags: bash shell awk

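I have a file in the format shown below. Can anyone help me convert it into columns? I have tried the awk command below, but when a customer has more than one host name it produces more than 4 columns.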

awk '/"customer_name":/{if (x)print x;x="";}{x=(!x)?$0:x","$0;}END{print x;}' filename

Input:

customer_name: "abc"
  "HostName": "tm-1"
  "LastDayRxBytes": 0
  "Status": "offline"
  "HostName": "tm-2"
  "LastDayRxBytes": 0
  "Status": "offline"
  "HostName": "tm-3"
  "LastDayRxBytes": 0
  "Status": "offline"
  "HostName": "new-va-threat-01"
  "LastDayRxBytes": 0
  "Status": "offline"
customer_name: "xyz"
  "HostName": "tm-56"
  "LastDayRxBytes": 10708747
  "Status": "ok"
customer_name: "def"
customer_name: "uvw"
  "HostName": "tm-23"
  "LastDayRxBytes": 34921829912
  "Status": "ok"
customer_name: "new cust"
  "HostName": "tm-1-3"
  "LastDayRxBytes": 33993187093
  "Status": "ok"
customer_name: "a12 d32 ffg"
customer_name: "bcd abc"
customer_name: "mno opq"
customer_name: "abc dhg pvt ltd."
  "HostName": "tm-10"
  "LastDayRxBytes": 145774401010
  "Status": "ok"
  "HostName": "tm-ngtm-13"
  "LastDayRxBytes": 150159680874
  "Status": "ok"
  "HostName": "new-ngtm-11"
  "LastDayRxBytes": 207392526747
  "Status": "ok"
  "HostName": "old-ngtm-06"
  "LastDayRxBytes": 17708734533
  "Status": "ok"
  "HostName": "tm-08"
  "LastDayRxBytes": 559289251
  "Status": "ok"
  "HostName": "tm-12"
  "LastDayRxBytes": 534145552271
  "Status": "ok"

I would like it printed in rows and columns like this:

Column 1               Column 2             Column 3             Column 4
CustName               Host                 Last RX              Status
abc                    tm-1                 0                    offline
abc                    tm-2                 0                    offline
abc                    tm-3                 0                    offline
abc                    new-va-threat-01     0                    offline
xyz                    tm-56                10708747             ok
def                    
uvw                    tm-23                34921829912          ok
new cust               tm-1-3               33993187093          ok
a12 d32 ffg
bcd abc
mno opq
abc dhg pvt ltd.       tm-10                145774401010         ok
abc dhg pvt ltd.       tm-ngtm-13           150159680874         ok
abc dhg pvt ltd.       new-ngtm-11          207392526747         ok
abc dhg pvt ltd.       old-ngtm-06          17708734533          ok
abc dhg pvt ltd.       tm-08                559289251            ok
abc dhg pvt ltd.       tm-12                534145552271         ok

3 answers:

Answer 0 (score: 1)

I would write this:

awk -F": " -v OFS="\t" '
    BEGIN {print "CustName", "Host", "Last RX", "Status"}
    {
        gsub(/"/,"")
        sub(/^[[:blank:]]+/,"")
    }
    $1 == "customer_name" {
        if ("customer_name" in data && !have_data)
            print data["customer_name"]
        have_data = 0
    }
    {
        data[$1] = $2
    }
    ("HostName" in data) && ("LastDayRxBytes" in data) && ("Status" in data) {
        print data["customer_name"], data["HostName"], data["LastDayRxBytes"], data["Status"]
        delete data["HostName"]
        delete data["LastDayRxBytes"]
        delete data["Status"]
        have_data = 1
    }
' file | column -s $'\t' -t
CustName          Host              Last RX       Status
abc               tm-1              0             offline
abc               tm-2              0             offline
abc               tm-3              0             offline
abc               new-va-threat-01  0             offline
xyz               tm-56             10708747      ok
def
uvw               tm-23             34921829912   ok
new cust          tm-1-3            33993187093   ok
a12 d32 ffg
bcd abc
mno opq
abc dhg pvt ltd.  tm-10             145774401010  ok
abc dhg pvt ltd.  tm-ngtm-13        150159680874  ok
abc dhg pvt ltd.  new-ngtm-11       207392526747  ok
abc dhg pvt ltd.  old-ngtm-06       17708734533   ok
abc dhg pvt ltd.  tm-08             559289251     ok
abc dhg pvt ltd.  tm-12             534145552271  ok
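
One case the sample input does not exercise: the script prints a customer without hosts only when the next customer_name line arrives, so such a customer on the very last line of the file would produce no output. The following is a minimal sketch of one way to cover it, assuming the same input format. to_columns is a hypothetical wrapper name, the header line and the column call are left out for brevity, and [ \t] stands in for [[:blank:]].

```shell
# to_columns is a hypothetical name; the awk body follows the script above,
# with an END rule added for a trailing customer that has no host block.
to_columns() {
    awk -F": " -v OFS="\t" '
        {
            gsub(/"/, "")            # drop all double quotes
            sub(/^[ \t]+/, "")       # drop leading blanks
        }
        $1 == "customer_name" {
            # the previous customer had no host block: print the bare name
            if (("customer_name" in data) && !have_data)
                print data["customer_name"]
            have_data = 0
        }
        { data[$1] = $2 }
        ("HostName" in data) && ("LastDayRxBytes" in data) && ("Status" in data) {
            print data["customer_name"], data["HostName"], data["LastDayRxBytes"], data["Status"]
            delete data["HostName"]
            delete data["LastDayRxBytes"]
            delete data["Status"]
            have_data = 1
        }
        END {
            # same check once more, for a customer without hosts on the last line
            if (("customer_name" in data) && !have_data)
                print data["customer_name"]
        }
    ' "$@"
}

# Usage: the last customer has no hosts and is still printed.
printf '%s\n' \
    'customer_name: "xyz"' \
    '  "HostName": "tm-56"' \
    '  "LastDayRxBytes": 10708747' \
    '  "Status": "ok"' \
    'customer_name: "def"' | to_columns
```

The function reads standard input when called without arguments, so it can be piped into column -s $'\t' -t exactly as in the script above.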

Answer 1 (score: 0)

Perl to the rescue:

perl -lne '
    if (/customer_name: "(.*)"/) {
        print $h{name} unless $h{printed} || !%h;
        undef $h{printed} if $1 ne $h{name};
        $h{name} = $1;
    } else {
        /"([^"]+)": "?([^"]+)"?/ and $h{$1} = $2;
        $h{printed} = print join "\t",
            @h{qw{ name HostName LastDayRxBytes Status }}
            if "Status" eq $1;
    }
    END { print $h{name} unless $h{printed} || !%h }
    ' < input_file
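  • The %h hash collects the information for the line to be printed.
  • When a customer name is read, the previous customer name is printed if it has not been printed yet. The same happens at the end of the input, to print a possible last customer that has no details.
  • A line is printed whenever a Status is read.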

Answer 2 (score: 0)

A GNU awk solution:

$ cat tst.awk
BEGIN {
   RS="customer_name: "
   pr("Column1", "Column2", "Column3", "Column4")
   pr("Custname", "Host", "Last RX", "Status")
}
match($0, /"([^"]+)"/, cust) {
   printed=0
   str=substr($0, RLENGTH+2)
   while (match( str, /"HostName":\s"([^"]+)"\s+"LastDayRxBytes":\s(\S+)\s+"Status":\s"([^"]+)"\s/, col)){
      str=substr(str, RLENGTH+3)
      pr( cust[1], col[1], col[2], col[3] )
      printed=1
   }
   if (!printed) pr(cust[1])
}
function pr(cust,host,rx,status) {
   printf "%-16s\t%-16s\t%-16s\t%-10s\n", cust, host, rx, status
}

Given the sample input, this can be handled with regular expressions and the match function. Testing it:

$ awk -f tst.awk input.txt
Column1             Column2             Column3             Column4
Custname            Host                Last RX             Status
abc                 tm-1                0                   offline
abc                 tm-2                0                   offline
abc                 tm-3                0                   offline
abc                 new-va-threat-01    0                   offline
xyz                 tm-56               10708747            ok
def
uvw                 tm-23               34921829912         ok
new cust            tm-1-3              33993187093         ok
a12 d32 ffg
bcd abc
mno opq
abc dhg pvt ltd.    tm-10               145774401010        ok
abc dhg pvt ltd.    tm-ngtm-13          150159680874        ok
abc dhg pvt ltd.    new-ngtm-11         207392526747        ok
abc dhg pvt ltd.    old-ngtm-06         17708734533         ok
abc dhg pvt ltd.    tm-08               559289251           ok
abc dhg pvt ltd.    tm-12               534145552271        ok

Explanation:

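  • The record separator RS is set to customer_name: so that $0 contains all the host, rx and status information for one customer.
  • The first match, against the regex "([^"]+)", grabs the customer name.
  • The second match, against the regex "HostName":\s"([^"]+)"\s+"LastDayRxBytes":\s(\S+)\s+"Status":\s"([^"]+)"\s, captures the host name, rx and status.
  • If the second match succeeds, the string is shortened so that the next match attempt starts after the part already consumed.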

I know this is not the awk way of doing things, but the regular format of the input allows this very concise, regex-based solution.
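
The "match, then shorten the string" step from the explanation can be sketched in isolation with the two-argument match(), which does not need GNU awk. This is only an illustration of the loop, not the script above: quoted_tokens is a hypothetical name, and it advances by RSTART + RLENGTH, the general form, whereas tst.awk advances by fixed offsets that rely on where the match starts in this particular input.

```shell
# quoted_tokens is a hypothetical helper: print every double-quoted token
# on each input line, one per output line.
quoted_tokens() {
    awk '{
        str = $0
        # match() sets RSTART (start position) and RLENGTH (length) of the match
        while (match(str, /"[^"]*"/)) {
            print substr(str, RSTART + 1, RLENGTH - 2)   # token without its quotes
            str = substr(str, RSTART + RLENGTH)          # keep only the text after the match
        }
    }' "$@"
}

# Usage
echo '"HostName": "tm-1" "Status": "ok"' | quoted_tokens
```

Each pass consumes one match, so the loop ends as soon as no quoted token is left in the remaining string.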