bash: awk split string using a delimiter, but not where the delimiter appears inside a bracketed string

Date: 2016-11-17 11:25:55

Tags: bash awk split

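The following script fails to parse its input correctly because the chosen delimiter also appears inside one of the string values.

I'm not entirely sure why the output is what it is, but the problem basically seems to be caused by the evalue= field in the following input: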

avalue=one;bvalue=2.2.2.2;cvalue=3;dvalue=4.4.4;evalue=(HELLO:5;ABC:value=123.456); 

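The evalue field contains the delimiter. The split is done by:

| awk ' { n=split($0,pcv,";") ;

I am wondering whether there is a way to modify the delimiter regexp so that it does not split when the ; appears inside brackets (), or even just when the characters immediately before and after it are 5;A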

I know I could do this by modifying the input string with sed, but I think it would be better to do it within awk.
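To make the problem concrete, here is a minimal sketch of what split() produces when the separator also occurs inside a bracketed value. The function name show_split and the shortened input are made up for illustration; only split() itself comes from the script.

```shell
# show_split: print every piece that split() produces for the string in $1.
# (Hypothetical helper, not part of the original script.)
show_split() {
    printf '%s\n' "$1" | awk '{
        n = split($0, parts, ";")          # plain ";" separator, as in the script
        for (i = 1; i <= n; i++)
            printf "%d:[%s]\n", i, parts[i]
    }'
}

show_split 'avalue=one;evalue=(HELLO:5;ABC:value=123.456);'
```

The bracketed value is cut in two: piece 2 is `evalue=(HELLO:5` and piece 3 is `ABC:value=123.456)`, which matches no key in the if-chain and is silently skipped. Separately, the `awk ' { print $1 } '` stage in the full script keeps only the text before the first space in the input, so everything from fvalue onward is dropped before split() ever sees it.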

pcvtmp='avalue=one;bvalue=2.2.2.2;cvalue=3;dvalue=4.4.4;evalue=(HELLO:5;ABC:value=123.456); fvalue=five;gvalue=six;hvalue=7;ivalue=eight.8;jvalue=(HELLO:5;ABC:value2=onetwothree);kvalue=9999999;'


       pcv=`echo $pcvtmp | awk ' { print $1 } ' \
                         | awk ' { n=split($0,pcv,";") ;
                                   for(i=1;i<n;i++){
                                   split(pcv[i],a1,"=");
                                        #printf (" debug: \"%s\" | \"%s\",\n", a1[1], a1[2]);
            if( a1[1]=="avalue")        {printf ("   a\"avalue\": \"%s\",\n"      , a1[2] ); continue } ;
            if( a1[1]=="bvalue")        {printf ("   b\"bvalue\": \"%s\",\n"      , a1[2] ); continue } ;
            if( a1[1]=="cvalue")        {printf ("   c\"cvalue\": \"%s\",\n"      , a1[2] ); continue } ;
            if( a1[1]=="dvalue")        {printf ("   d\"dvalue\": \"%s\",\n"      , a1[2] ); continue } ;
            if( a1[1]=="evalue")        {printf ("   e\"evalue\": \"%s\",\n"      , a1[2] ); continue } ;
            if( a1[1]=="fvalue")        {printf ("   f\"fvalue\": \"%s\",\n"      , a1[2] ); continue } ;
            if( a1[1]=="gvalue")        {printf ("   g\"gvalue\": \"%s\",\n"      , a1[2] ); continue } ;
            if( a1[1]=="hvalue")        {printf ("   h\"hvalue\": \"%s\",\n"      , a1[2] ); continue } ;
            if( a1[1]=="ivalue")        {printf ("   i\"ivalue\": \"%s\",\n"      , a1[2] ); continue } ;
            if( a1[1]=="jvalue")        {printf ("   j\"jvalue\": \"%s\",\n"      , a1[2] ); continue } ;
            if( a1[1]=="kvalue")        {printf ("   k\"kvalue\": \"%s\",\n"      , a1[2] ); continue } ;
                                   }
                                 } '`

                    echo "outof awk --"
                    echo "$pcv"

It currently outputs:

# ./awk1.sh
outof awk --
   a"avalue": "one",
   b"bvalue": "2.2.2.2",
   c"cvalue": "3",
   d"dvalue": "4.4.4",
   e"evalue": "(HELLO:5",

Expected output:

# ./awk1.sh
outof awk --
   a"avalue": "one",
   b"bvalue": "2.2.2.2",
   c"cvalue": "3",
   d"dvalue": "4.4.4",
   e"evalue": "(HELLO:5;ABC:value=123.456)"
   f"fvalue": "five"
   g"gvalue": "six"
   h"hvalue": "7"
   i"ivalue": "eight.8"
   j"jvalue": "(HELLO:5;ABC:value2=onetwothree)"
   k"kvalue": "9999999"

3 Answers:

Answer 0 (score: 3)

You can use this GNU awk command to split on ; while ignoring any ; inside (...). FPAT requires gawk 4.0 or later:

pcvtmp='avalue=one;bvalue=2.2.2.2;cvalue=3;dvalue=4.4.4;evalue=(HELLO:5;ABC:value=123.456); fvalue=five;gvalue=six;hvalue=7;ivalue=eight.8;jvalue=(HELLO:5;ABC:value2=onetwothree);kvalue=9999999;'

awk -v FPAT='[[:alnum:]_]+=(\\([^)]*\\)|[^;]+)' '{
   for (i=1; i<=NF; i++) {
      sub(/=/, "\": \"", $i)
      print substr($i, 1, 1) "\"" $i "\""
   }
}' <<< "$pcvtmp"

a"avalue": "one"
b"bvalue": "2.2.2.2"
c"cvalue": "3"
d"dvalue": "4.4.4"
e"evalue": "(HELLO:5;ABC:value=123.456)"
f"fvalue": "five"
g"gvalue": "six"
h"hvalue": "7"
i"ivalue": "eight.8"
j"jvalue": "(HELLO:5;ABC:value2=onetwothree)"
k"kvalue": "9999999"

The tricky part is this regular expression, used as FPAT:

[[:alnum:]_]+=(\\([^)]*\\)|[^;]+)

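It defines each key=value pair as one field. The key is one or more word characters followed by =, and the value is either a (...) group or a run of any characters other than ;.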
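FPAT is specific to GNU awk. For other awk implementations, the same pattern can be applied with match() in a loop. This is only a sketch under a few assumptions: the function name parse_pairs and the "key => value" output format are invented for illustration, `[A-Za-z0-9_]` stands in for `[[:alnum:]_]` because some older awks lack POSIX character classes, and the bracketed values are assumed not to nest.

```shell
# parse_pairs: print each key=value pair found in $1 as "key => value".
# Portable sketch using match()/RSTART/RLENGTH instead of FPAT.
parse_pairs() {
    printf '%s\n' "$1" | awk '{
        rest = $0
        # key, "=", then either a whole (...) group or a run of non-";" characters
        while (match(rest, /[A-Za-z0-9_]+=(\([^)]*\)|[^;]+)/)) {
            pair = substr(rest, RSTART, RLENGTH)
            rest = substr(rest, RSTART + RLENGTH)   # continue after this pair
            eq   = index(pair, "=")                 # the first "=" ends the key
            print substr(pair, 1, eq - 1) " => " substr(pair, eq + 1)
        }
    }'
}

parse_pairs 'avalue=one;evalue=(HELLO:5;ABC:value=123.456); fvalue=five;'
```

For this shortened input it prints `avalue => one`, `evalue => (HELLO:5;ABC:value=123.456)` and `fvalue => five`, one per line. The stray space before fvalue is skipped because a key has to start with a word character.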

Answer 1 (score: 1)

Another awk solution:

$ cat sp.awk

function key() {
    match(line, /^[^=]+/)
    tok  = substr(line, 1, RLENGTH) 
    line = substr(line, RLENGTH + 1)
}

function eat(s) {
    line = substr(line, length(s) + 1)
}

function val() {
    # [^)]* stops at the first ")"; a greedy .* would run on to the last ")" on the line
    if (match(line, /^\([^)]*\)/) || # try with brackets
        match(line, /^[^;]+/)) {     # try without brackets
       tok  = substr(line, 1, RLENGTH)
       line = substr(line, RLENGTH + 1)
    } else {
       print "fail to read" | "cat 1>&2"; exit(1)
    }
}

{
    line = $0
    while (length(line)) { # `line' and `tok' are global
        key(); k = tok
        eat("=")
        val(); v = tok
        eat(";")
        print k, v
    }
}

Usage:

awk -f sp.awk file.txt
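One detail worth checking in a tokenizer like this is how far the bracket pattern reaches when a line holds more than one (...) value. The sketch below compares a greedy `.*` with the bounded `[^)]*` on a made-up input; the function name compare_bracket_match is hypothetical.

```shell
# compare_bracket_match: show what each bracket pattern matches at the start of $1.
compare_bracket_match() {
    printf '%s\n' "$1" | awk '{
        # greedy: .* extends to the LAST ")" on the line
        if (match($0, /^\(.*\)/))    print "greedy="  substr($0, RSTART, RLENGTH)
        # bounded: [^)]* cannot cross a ")", so it ends at the FIRST one
        if (match($0, /^\([^)]*\)/)) print "bounded=" substr($0, RSTART, RLENGTH)
    }'
}

compare_bracket_match '(a;b);x=1;y=(c;d);'
```

The greedy form returns `(a;b);x=1;y=(c;d)`, swallowing the pairs in between, while the bounded form returns just `(a;b)`. With the question's input, which has bracketed values for both evalue and jvalue, only the bounded form keeps the two apart.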

Answer 2 (score: 0)

I eventually solved this by searching for key/value pair patterns and experimenting on this site:

https://regex101.com/r/NwCI3b/1

This regular expression works:

((?:\([^\)]*\)|[^=;])*)=((?:\([^\)]*\)|[^=;])*)

Or this one:

([^=,]*)=((?:\([^\)]*\)|[^=;])*)

The test string, as shown above:

avalue=one;bvalue=2.2.2.2;cvalue=3;dvalue=4.4.4;evalue=(HELLO:5;ABC:value=123.456);fvalue=five;gvalue=six;hvalue=7;ivalue=eight.8;jvalue=(HELLO:5;ABC:value2=onetwothree);kvalue=9999999