Array produced by AWK split is not in order

Time: 2015-08-31 07:02:12

Tags: arrays awk split gawk

I have this (demo) text in the variable ArtTEXT.

{1}: Reporting Problems and Bugs. 
{2}: Other freely available awk implementations. 
{3}: Summary of installation. 
{4}: How to disable certain gawk extensions. 
{5}: Making Additions To gawk. 
{6}: Accessing the Git repository. 
{7}: Adding code to the main body of gawk. 
{8}: Porting gawk to a new operating system.  
{9}: Why derived files are kept in the Git repository. 

It is a single variable in which the lines are separated by an indent.

indent = "\n\t\t\t";

I want to loop over the lines and check the contents of each one.

So I split it into an array, using the indent as the separator:
split(ArtTEXT,lin, indent);

Then I iterate over the array lin:

l = 0;
for (l in lin) {
    print "l -- ", l, " lin[l] -- " ,lin[l] ;
}

The output starts at line 4 of ArtTEXT, and lines 1 to 3 come last:

l --  4  lin[l] --  {3}: Summary of installation. 
l --  5  lin[l] --  {4}: How to disable certain gawk extensions. 
l --  6  lin[l] --  {5}: Making Additions To gawk. 
l --  7  lin[l] --  {6}: Accessing the Git repository. 
l --  8  lin[l] --  {7}: Adding code to the main body of gawk. 
l --  9  lin[l] --  {8}: Porting gawk to a new operating system.  
l --  10  lin[l] --  {9}: Why derived files are kept in the Git repository. 
l --  1  lin[l] --   
l --  2  lin[l] --  {1}: Reporting Problems and Bugs. 
l --  3  lin[l] --  {2}: Other freely available awk implementations. 

(The original text has an empty line at the beginning.)

The manual says this about the split function:

  

The first piece is stored in array[1], the second piece in array[2], and so forth.

How can I avoid this problem?

Why does this happen?

Thanks.

1 answer:

Answer 0: (score: 1)

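In awk, arrays are unordered. split does store the pieces under indices 1, 2, 3 and so on, but the order in which for (l in lin) visits those indices is unspecified. If they happen to come out in order, that is a coincidence.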

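In GNU awk (version 4.0 or later), the traversal order can be controlled. For example, to visit the indices in ascending numeric order, set PROCINFO["sorted_in"]="@ind_num_asc":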

$ awk -v ArtTEXT="$(cat file)" 'BEGIN{PROCINFO["sorted_in"]="@ind_num_asc"; indent="\n\t\t\t"; split(ArtTEXT, lin, indent); for (l in lin) print "l -- ", l, " lin[l] -- " ,lin[l] ;}'
l --  1  lin[l] --  {1}: Reporting Problems and Bugs. 
l --  2  lin[l] --  {2}: Other freely available awk implementations. 
l --  3  lin[l] --  {3}: Summary of installation. 
l --  4  lin[l] --  {4}: How to disable certain gawk extensions. 
l --  5  lin[l] --  {5}: Making Additions To gawk. 
l --  6  lin[l] --  {6}: Accessing the Git repository. 
l --  7  lin[l] --  {7}: Adding code to the main body of gawk. 
l --  8  lin[l] --  {8}: Porting gawk to a new operating system.  
l --  9  lin[l] --  {9}: Why derived files are kept in the Git repository. 

Alternatively, since the array indices are numeric, we can step through them in order with a counting loop, for (l=1;l<=length(lin);l++) print...:

$ awk -v ArtTEXT="$(cat file)" 'BEGIN{indent="\n\t\t\t"; split(ArtTEXT, lin, indent); for (l=1;l<=length(lin);l++) print "l -- ", l, " lin[l] -- " ,lin[l] ;}'
l --  1  lin[l] --  {1}: Reporting Problems and Bugs. 
l --  2  lin[l] --  {2}: Other freely available awk implementations. 
l --  3  lin[l] --  {3}: Summary of installation. 
l --  4  lin[l] --  {4}: How to disable certain gawk extensions. 
l --  5  lin[l] --  {5}: Making Additions To gawk. 
l --  6  lin[l] --  {6}: Accessing the Git repository. 
l --  7  lin[l] --  {7}: Adding code to the main body of gawk. 
l --  8  lin[l] --  {8}: Porting gawk to a new operating system.  
l --  9  lin[l] --  {9}: Why derived files are kept in the Git repository. 

Multi-line version

Spread over multiple lines, the GNU awk code looks like this:

awk -v ArtTEXT="$(cat file)" '
BEGIN{
    PROCINFO["sorted_in"]="@ind_num_asc"
    indent="\n\t\t\t"
    split(ArtTEXT, lin, indent)
    for (l in lin)
        print "l -- ", l, " lin[l] -- " ,lin[l]
}'

And the alternative code is:

awk -v ArtTEXT="$(cat file)" '
BEGIN{
    indent="\n\t\t\t"
    split(ArtTEXT, lin, indent)
    for (l=1;l<=length(lin);l++)
        print "l -- ", l, " lin[l] -- " ,lin[l]
}'
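
A variation on the counting loop above, offered only as a sketch: split returns the number of pieces it stored, so that count can serve as the loop bound instead of length(lin), which not every awk implementation has historically accepted for arrays. The function name print_lines_in_order and the demo text are hypothetical, not from the question. The text is passed through the environment rather than -v here, on the assumption that this sidesteps the escape-sequence processing applied to -v values.

```shell
#!/bin/sh
# Hypothetical helper (not from the original post): print the pieces of a
# text in index order. Pieces are separated by a newline plus three tabs.
print_lines_in_order() {
    # Pass the text via the environment; awk reads it from ENVIRON.
    ArtTEXT="$1" awk 'BEGIN {
        indent = "\n\t\t\t"
        n = split(ENVIRON["ArtTEXT"], lin, indent)  # n = number of pieces
        for (l = 1; l <= n; l++)                    # visit 1..n in order
            print l ": " lin[l]
    }'
}

# Demo input: three pieces joined by newline + three tabs.
demo=$(printf 'first\n\t\t\tsecond\n\t\t\tthird')
print_lines_in_order "$demo"
```

If the text begins with the separator, as in the question, element 1 will be empty; the loop still visits it first, and it can be skipped with a test such as if (lin[l] != "").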