UNIX AWK script - memory exhausted

Time: 2017-09-25 11:29:30

Tags: bash shell unix awk

I have an input CSV file that looks like this:

123456,ABC,A,,,
123457,DEF,A,H,,
1234568,GHI,,H,,
111111,AAA,A,,,
12345699,XYZ,A,H,,

Now, I have an AWK script that contains the following lines with multiple IF conditions:

BEGIN { FS=","}
{ 
variable=$1.","$2;
if(variable ~ /^123456.+,ABC/) print "P," $0; else
if(variable ~ /^123457.+,DEF/) print "P," $0; else
if(variable ~ /^123458.+,GHI/) print "R," $0; else
if(variable ~ /^1234599.+,XYZ/) print "P," $0; else print "U" ","  $0;} 
END { }

After running this AWK script on the input file, I get the following output:

P,123456,ABC,A,,,
P,123457,DEF,A,H,,
R,1234568,GHI,,H,,
U,111111,AAA,A,,,
P,12345699,XYZ,A,H,,

So far everything works fine, but when I had to add more IF conditions to this AWK script (around 3500), it failed with a 'memory exhausted' error:

awk: script.awk:1259: if(variable ~ /^123311.+,AB23/) print "P," $0; else
awk: script.awk:1259:                                              ^ memory exhausted

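Now the interesting part: first, the memory exhausted error is always reported at line 1259; second, when I remove the IF conditions from line 1259 onward (including line 1259), the script runs smoothly again. Is there a limit on the number of IF conditions in an AWK / GAWK script?

The AWK version I am using is: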

GNU Awk 4.1.3, API: 1.1 (GNU MPFR 3.1.3, GNU MP 6.1.0)

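3 answers:

Answer 0: (score: 2)

I don't know whether GNU awk has a limit on ifs, but instead of putting that many ifs in the code, solve it with data, something like this (it's just a quick draft):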

$ cat rules   # put your logic here
P,123456,ABC
P,123457,DEF
R,1234568,GHI

Code:

$ awk '
BEGIN { FS=OFS="," }                       
NR==FNR {                                  # read in the rules file
    a[$2","$3]=$1                          # and hash it
    next
}
{                                          # read the input file
    print ($1","$2 in a?a[$1","$2]:"U"),$0 # look up the code in the hash and print it, or U if not found
}' rules input                             # mind the order
P,123456,ABC,A,,,
P,123457,DEF,A,H,,
R,1234568,GHI,,H,,
U,111111,AAA,A,,,
U,12345699,XYZ,A,H,,
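
If the rules really need regular expressions (as the `.+` patterns in the question suggest) rather than exact keys, the same data-driven idea can be kept by loading the patterns into an array and testing them in order. The following is only a minimal sketch under assumptions: the file names rules.txt, input.csv and classify.awk, the "code, space, regex" rule format, and the `.*` patterns themselves are hypothetical and were picked to reproduce the output shown in the question.

```shell
# Hypothetical rules file: "<code> <regex>" per line; the first matching rule wins.
cat > rules.txt <<'EOF'
P ^123456.*,ABC
P ^123457.*,DEF
R ^123456.*,GHI
P ^123456.*,XYZ
EOF

# Sample input taken from the question.
cat > input.csv <<'EOF'
123456,ABC,A,,,
123457,DEF,A,H,,
1234568,GHI,,H,,
111111,AAA,A,,,
12345699,XYZ,A,H,,
EOF

# The awk program stays the same size no matter how many rules there are.
cat > classify.awk <<'EOF'
BEGIN { FS = OFS = "," }
NR == FNR {                          # first file: load the rules in order
    sp = index($0, " ")              # the code ends at the first space
    code[++n] = substr($0, 1, sp - 1)
    re[n]     = substr($0, sp + 1)   # rest of the line is used as a dynamic regex
    next
}
{                                    # second file: classify each record
    key = $1 "," $2
    out = "U"                        # default when no rule matches
    for (i = 1; i <= n; i++)
        if (key ~ re[i]) { out = code[i]; break }
    print out, $0
}
EOF

awk -f classify.awk rules.txt input.csv
```

With these sample files the records come out prefixed P, P, R, U, P. Unlike the hash lookup, this variant tests every rule against each record until one matches, so it trades speed for the ability to use patterns.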

Edit

If you are using GNU awk, you can achieve something similar by storing only the beginning of $1, together with $2, in a 2D array:

$ cat rules   # put your logic here, notice that the 1st and 3rd rules share the same prefix
P,123456,ABC
P,123457,DEF
R,123456,GHI

Code:

$ awk '
BEGIN { FS=OFS="," }
NR==FNR {
    a[$2][$3]=$1
    next
}
{
    p=substr($1,1,6)
    print (p in a && $2 in a[p] ? a[p][$2] : "U"),$0
}' rules input
P,123456,ABC,A,,,    # matches 1st record in rules file
P,123457,DEF,A,H,,   # 2nd
R,1234568,GHI,,H,,   # 3rd
U,111111,AAA,A,,,    # no match
U,12345699,XYZ,A,H,, # 123456 would match but XYZ won't
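
The a[p][$2] syntax above relies on GNU awk's arrays of arrays. For other awks, a roughly equivalent sketch can use the standard comma subscript, which joins the two parts with SUBSEP into a single key. The file names rules.csv, input.csv and prefix.awk are assumptions made for this example, and the 6-character prefix is carried over from the code above.

```shell
# Same rules as above: code, 6-character prefix of $1, value of $2.
cat > rules.csv <<'EOF'
P,123456,ABC
P,123457,DEF
R,123456,GHI
EOF

# Sample input taken from the question.
cat > input.csv <<'EOF'
123456,ABC,A,,,
123457,DEF,A,H,,
1234568,GHI,,H,,
111111,AAA,A,,,
12345699,XYZ,A,H,,
EOF

cat > prefix.awk <<'EOF'
BEGIN { FS = OFS = "," }
NR == FNR {                     # first file: rules
    a[$2, $3] = $1              # comma subscript: the key is $2 SUBSEP $3
    next
}
{                               # second file: input records
    p = substr($1, 1, 6)        # compare only the first 6 characters of $1
    code = ((p, $2) in a) ? a[p, $2] : "U"
    print code, $0
}
EOF

awk -f prefix.awk rules.csv input.csv
```

This should print the same P, P, R, U, U classification as the GNU awk version, since both look up the pair (prefix, $2).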

Answer 1: (score: 1)

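I doubt there is a limit on how many independent ifs you can have in your code, but there may be a limit on how long an if-else chain can be, since that is basically just one very long statement.

Try this to see if you still have the problem:

BEGIN { FS=OFS=","}
{ variable = $1 "." FS $2 }
variable ~ /^123456.+,ABC/  { print "P", $0; next }
variable ~ /^123457.+,DEF/  { print "P", $0; next }
variable ~ /^123458.+,GHI/  { print "R", $0; next }
variable ~ /^1234599.+,XYZ/ { print "P", $0; next }
{ print "U",  $0 }

I also cleaned up a few other things that had nothing to do with your problem.

If you can't do the above because you need to do other things afterwards in your script, then:

BEGIN { FS=OFS=","}
{ variable = $1 "." FS $2; f=0 }
!f && variable ~ /^123456.+,ABC/  { print "P", $0; f=1 }
!f && variable ~ /^123457.+,DEF/  { print "P", $0; f=1 }
!f && variable ~ /^123458.+,GHI/  { print "R", $0; f=1 }
!f && variable ~ /^1234599.+,XYZ/ { print "P", $0; f=1 }
!f { print "U", $0 }

would be another way to go.

Note that I am not suggesting either of these is a reasonable way to do whatever you are trying to do; I just don't understand what you are really trying to do well enough to suggest an alternative approach, so the above focuses on helping you get around, syntactically, the error message you are getting.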
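
One way to check this hypothesis on a given awk is to generate both script shapes at the size mentioned in the question and see which of them parses. This is only a rough sketch: the file names chain.awk and independent.awk and the filler patterns (/^9NNNNN.+,ZZ/) are made up for the test, and whether the chained variant fails will depend on the awk implementation and version.

```shell
n=3500   # number of generated conditions, roughly the count from the question

# Variant 1: one long if / else-if chain, i.e. a single deeply nested statement.
{
    printf 'BEGIN { FS = OFS = "," }\n{\nvariable = $1 "." FS $2\n'
    i=1
    while [ "$i" -le "$n" ]; do
        printf 'if (variable ~ /^9%05d.+,ZZ/) print "R", $0; else\n' "$i"
        i=$((i + 1))
    done
    printf 'if (variable ~ /^123456.+,ABC/) print "P", $0; else print "U", $0\n}\n'
} > chain.awk

# Variant 2: the same conditions as independent pattern-action rules.
{
    printf 'BEGIN { FS = OFS = "," }\n{ variable = $1 "." FS $2 }\n'
    i=1
    while [ "$i" -le "$n" ]; do
        printf 'variable ~ /^9%05d.+,ZZ/ { print "R", $0; next }\n' "$i"
        i=$((i + 1))
    done
    printf 'variable ~ /^123456.+,ABC/ { print "P", $0; next }\n'
    printf '{ print "U", $0 }\n'
} > independent.awk

# The chained variant may or may not parse, so a failure is only reported.
echo '123456,ABC,A,,,' | awk -f chain.awk || echo "chain.awk: awk reported an error"

# The independent rules only match the last generated rule for this record.
echo '123456,ABC,A,,,' | awk -f independent.awk
```

If the chained script fails while the independent rules print P,123456,ABC,A,,, for the sample record, that would support the idea that the nesting depth of the chain, not the number of conditions, is what the parser runs out of room for.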

Answer 2: (score: -1)

Try this:

awk -F',' '{if($1$2 ~ /^123456+ABC/ || $1$2 ~ /^123457+DEF/ || $1$2 ~ /^12345699+XYZ/ || $1$2 ~ /^123311+AB23/){print "P," $0;} else if($1$2 ~ /^1234568+GHI/){print "R," $0;} else{ print "U" ","  $0}}' file