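The script below does not parse its input correctly because the chosen delimiter also appears inside one of the string values.
I'm not entirely sure why the output comes out the way it does, but the problem seems to be that evalue= in the following input:
avalue=one;bvalue=2.2.2.2;cvalue=3;dvalue=4.4.4;evalue=(HELLO:5;ABC:value=123.456);
contains the delimiter ; used by:
| awk ' { n=split($0,pcv,";") ;
Is there a way to modify the delimiter regexp so that it does not split when ; appears inside brackets (), or even just when the characters before and after it are 5;A?
I know I could preprocess the input string with sed to achieve this, but I thought it would be best to do it in awk.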
pcvtmp='avalue=one;bvalue=2.2.2.2;cvalue=3;dvalue=4.4.4;evalue=(HELLO:5;ABC:value=123.456); fvalue=five;gvalue=six;hvalue=7;ivalue=eight.8;jvalue=(HELLO:5;ABC:value2=onetwothree);kvalue=9999999;'
pcv=`echo $pcvtmp | awk ' { print $1 } ' \
| awk ' { n=split($0,pcv,";") ;
for(i=1;i<n;i++){
split(pcv[i],a1,"=");
#printf (" debug: \"%s\" | \"%s\",\n", a1[1], a1[2]);
if( a1[1]=="avalue") {printf (" a\"avalue\": \"%s\",\n" , a1[2] ); continue } ;
if( a1[1]=="bvalue") {printf (" b\"bvalue\": \"%s\",\n" , a1[2] ); continue } ;
if( a1[1]=="cvalue") {printf (" c\"cvalue\": \"%s\",\n" , a1[2] ); continue } ;
if( a1[1]=="dvalue") {printf (" d\"dvalue\": \"%s\",\n" , a1[2] ); continue } ;
if( a1[1]=="evalue") {printf (" e\"evalue\": \"%s\",\n" , a1[2] ); continue } ;
if( a1[1]=="fvalue") {printf (" f\"fvalue\": \"%s\",\n" , a1[2] ); continue } ;
if( a1[1]=="gvalue") {printf (" g\"gvalue\": \"%s\",\n" , a1[2] ); continue } ;
if( a1[1]=="hvalue") {printf (" h\"hvalue\": \"%s\",\n" , a1[2] ); continue } ;
if( a1[1]=="ivalue") {printf (" i\"ivalue\": \"%s\",\n" , a1[2] ); continue } ;
if( a1[1]=="jvalue") {printf (" j\"jvalue\": \"%s\",\n" , a1[2] ); continue } ;
if( a1[1]=="kvalue") {printf (" k\"kvalue\": \"%s\",\n" , a1[2] ); continue } ;
}
} '`
echo "outof awk --"
echo "$pcv"
Current output:
# ./awk1.sh
outof awk --
a"avalue": "one",
b"bvalue": "2.2.2.2",
c"cvalue": "3",
d"dvalue": "4.4.4",
e"evalue": "(HELLO:5",
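The output above appears to stop at evalue for two separate reasons. First, the sample string has a space after the evalue=(...); pair, so awk ' { print $1 } ' passes on only the text before that space, and fvalue onward never reaches the second awk. Second, split on ";" cuts the bracketed value in two, and the second half has the key ABC:value, which matches none of the if branches. A minimal sketch that reproduces both effects, assuming a shortened stand-in string (sample is hypothetical, not the real pcvtmp):

```shell
#!/bin/sh
# Shortened stand-in for pcvtmp (hypothetical sample with the same shape).
sample='dvalue=4.4.4;evalue=(HELLO:5;ABC:value=123.456); fvalue=five;'

# Step 1: print $1 keeps only the text before the first whitespace.
first_field=$(echo "$sample" | awk '{ print $1 }')
echo "$first_field"

# Step 2: splitting that on ";" breaks the bracketed value apart.
pieces=$(echo "$first_field" | awk '{ n = split($0, p, ";"); for (i = 1; i < n; i++) print i ": " p[i] }')
echo "$pieces"
```

The third piece printed is ABC:value=123.456), which is why nothing is emitted for it in the original script.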
Expected output:
# ./awk1.sh
outof awk --
a"avalue": "one",
b"bvalue": "2.2.2.2",
c"cvalue": "3",
d"dvalue": "4.4.4",
e"evalue": "(HELLO:5;ABC:value=123.456)"
f"fvalue": "five"
g"gvalue": "six"
h"hvalue": "7"
i"ivalue": "eight.8"
j"jvalue": "(HELLO:5;ABC:value2=onetwothree)"
k"kvalue": "9999999"
Answer 0 (score: 3)
You can use this GNU awk command to split on ; while ignoring any ; inside (...):
pcvtmp='avalue=one;bvalue=2.2.2.2;cvalue=3;dvalue=4.4.4;evalue=(HELLO:5;ABC:value=123.456); fvalue=five;gvalue=six;hvalue=7;ivalue=eight.8;jvalue=(HELLO:5;ABC:value2=onetwothree);kvalue=9999999;'
awk -v FPAT='[[:alnum:]_]+=(\\([^)]*\\)|[^;]+)' '{
for (i=1; i<=NF; i++) {
sub(/=/, "\": \"", $i)
print substr($i, 1, 1) "\"" $i "\""
}
}' <<< "$pcvtmp"
a"avalue": "one"
b"bvalue": "2.2.2.2"
c"cvalue": "3"
d"dvalue": "4.4.4"
e"evalue": "(HELLO:5;ABC:value=123.456)"
f"fvalue": "five"
g"gvalue": "six"
h"hvalue": "7"
i"ivalue": "eight.8"
j"jvalue": "(HELLO:5;ABC:value2=onetwothree)"
k"kvalue": "9999999"
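The tricky part is the FPAT regex:
[[:alnum:]_]+=(\\([^)]*\\)|[^;]+)
It defines each key=value pair as one field. The key is one or more word characters followed by =, and the value is either (...) or anything other than ;.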
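FPAT is a gawk extension, so other awk implementations will not split fields this way. As a rough portable sketch (not part of the original answer), an equivalent pattern can be applied with a match() loop. The character class is spelled out as [A-Za-z0-9_] here, and the function name split_pairs is made up for illustration:

```shell
#!/bin/sh
# split_pairs: hypothetical helper; reads lines of key=value; pairs on stdin.
split_pairs() {
    awk '{
        rest = $0
        # Equivalent of the FPAT pattern, written as an ordinary regex literal.
        while (match(rest, /[A-Za-z0-9_]+=(\([^)]*\)|[^;]+)/)) {
            pair = substr(rest, RSTART, RLENGTH)
            rest = substr(rest, RSTART + RLENGTH)
            eq = index(pair, "=")            # first "=" separates key from value
            key = substr(pair, 1, eq - 1)
            val = substr(pair, eq + 1)
            printf "%s\"%s\": \"%s\"\n", substr(key, 1, 1), key, val
        }
    }'
}

split_pairs <<'EOF'
avalue=one;evalue=(HELLO:5;ABC:value=123.456); fvalue=five;
EOF
```

Because the match starts at the first word character, the stray space before fvalue in the sample input is skipped without extra trimming.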
Answer 1 (score: 1)
Another awk solution:
$ cat sp.awk
function key() {
match(line, /^[^=]+/)
tok = substr(line, 1, RLENGTH)
line = substr(line, RLENGTH + 1)
}
function eat(s) {
line = substr(line, length(s) + 1)
}
function val() {
if (match(line, /^\([^)]*\)/) || # try with brackets; [^)]* stops at the first ")"
match(line, /^[^;]+/)) { # try without brackets
tok = substr(line, 1, RLENGTH)
line = substr(line, RLENGTH + 1)
} else {
print "fail to read" | "cat 1>&2"; exit(1)
}
}
{
line = $0
while (length(line)) { # `line' and `tok' are global
key(); k = tok
eat("=")
val(); v = tok
eat(";")
print k, v
}
}
Usage:
awk -f sp.awk file.txt
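If a value could ever contain nested brackets, a pattern that stops at the first ")" would cut it short. One possible variation on the hand-written tokenizer idea (a sketch under that assumption, not the answerer's code) walks the line character by character and treats ; as a separator only at bracket depth zero. The function name split_depth is hypothetical:

```shell
#!/bin/sh
# split_depth: hypothetical helper; prints one key=value pair per line.
split_depth() {
    awk '{
        depth = 0; cur = ""
        for (i = 1; i <= length($0); i++) {
            c = substr($0, i, 1)
            if (c == "(") depth++
            else if (c == ")" && depth > 0) depth--
            if (c == ";" && depth == 0) {    # separator only outside brackets
                emit(cur); cur = ""
            } else {
                cur = cur c
            }
        }
        emit(cur)                            # last pair when there is no trailing ";"
    }
    function emit(s) {
        gsub(/^[ \t]+|[ \t]+$/, "", s)       # trim surrounding blanks
        if (s != "") print s
    }'
}

split_depth <<'EOF'
avalue=one;evalue=(HELLO:5;ABC:(x;y)=1); fvalue=five;
EOF
```

Each printed line can then be split at its first = to separate key from value.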
Answer 2 (score: 0)
I finally solved this by searching for key/value pairs and experimenting on this site:
https://regex101.com/r/NwCI3b/1
It works with this regex:
((?:\([^\)]*\)|[^=;])*)=((?:\([^\)]*\)|[^=;])*)
Or this one:
([^=,]*)=((?:\([^\)]*\)|[^=;])*)
The test string is the same as above:
avalue=one;bvalue=2.2.2.2;cvalue=3;dvalue=4.4.4;evalue=(HELLO:5;ABC:value=123.456);fvalue=five;gvalue=six;hvalue=7;ivalue=eight.8;jvalue=(HELLO:5;ABC:value2=onetwothree);kvalue=9999999
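The (?:...) non-capturing groups are PCRE syntax and are not part of POSIX extended regular expressions, so awk and grep -E cannot be relied on to accept them. A hedged adaptation for the shell replaces them with ordinary groups and lets grep -o print each match on its own line. The -o option is a common extension (GNU, BSD, BusyBox) rather than POSIX, and the function name list_pairs is made up for this sketch:

```shell
#!/bin/sh
# list_pairs: hypothetical helper; prints each key=value match on its own line.
list_pairs() {
    # ERE version of the first regex above, with (?:...) rewritten as (...).
    grep -oE '(\([^)]*\)|[^=;])*=(\([^)]*\)|[^=;])*'
}

echo 'dvalue=4.4.4;evalue=(HELLO:5;ABC:value=123.456);fvalue=five' | list_pairs
```

Plain groups cost nothing here because grep -o prints whole matches and does not expose capture groups anyway.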