Sorting an Awk array by last name

Date: 2015-03-21 19:55:17

Tags: arrays sorting awk

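In my script, I start with a file of campaign donors, and anyone who donated over $500 is eligible for a drawing. Each person who qualifies is added to an array with an incrementing index, so the array grows as needed. Each entry is formatted as shown below, where the X's stand for the phone number. In the END section of the script, I need to sort this array by last name ($2) for printing. I have done some searching but come up empty-handed. I am not asking anyone to write the script for me, only to point me in a better search direction or offer advice. I need help sorting the contestants array; at the moment it is populated correctly with string values in the form I need.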

v1, v2, and v3 are the campaign contributions. I pass -F'[ :]' on the command line so that both spaces and colons act as field separators.
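As a quick check of the field numbering that this separator produces, one sample record can be split and a few fields printed. This is only a sketch: show_fields is a hypothetical helper name, not part of the original script.

```sh
# show_fields is a hypothetical helper: it splits each input line on
# single spaces and colons and prints the fields the question relies on.
show_fields() {
    awk -F'[ :]' '{ print "last=" $2, "area=" $3, "total=" ($5 + $6 + $7) }'
}

printf '%s\n' 'Mike Harrington:(510) 548-1278:250:100:175' | show_fields
```

With this separator the last name lands in $2, the phone number is split across $3 and $4, and the three contributions are $5 through $7.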

Input file lab4.data:

Fname Lname:Phone__Number:v1:v2:v3   
Mike Harrington:(510) 548-1278:250:100:175 
Christian Dobbins:(408) 538-2358:155:90:201 
Susan Dalsass:(206) 654-6279:250:60:50 
Archie McNichol:(206) 548-1348:250:100:175 
Jody Savage:(206) 548-1278:15:188:150 
Guy Quigley:(916) 343-6410:250:100:175 
Dan Savage:(406) 298-7744:450:300:275 
Nancy McNeil:(206) 548-1278:250:80:75 
John Goldenrod:(916) 348-4278:250:100:175 
Chet Main:(510) 548-5258:50:95:135   
Tom Savage:(408) 926-3456:250:168:200  
Elizabeth Stachelin:(916) 440-1763:175:75:300 

The array holds anyone whose total is over $500. $8 is created to hold the value $5 + $6 + $7. The array is initialized and filled by the loop given below:

$8 = $5+$6+$7;

contestants[len++]

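The loop that checks each person and adds them to the contestants array. name and number are arrays that hold the respective values for later use.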

for(i=0;i<=NR;i++)if(contrib[i]>500){contestants[len++]= name[i]"   "number[i] }

Format of the entries (the desired values of contestants[len++]):

[0]   Mike Harrington (510) 548-1278
[1]   Archie McNichol (206) 548-1348 
[2]   Guy Quigley (916) 343-6410
[3]   Dan Savage (406) 298-7744
[4]   John Goldenrod (916) 348-4278
[5]   Tom Savage (408) 926-3456
[6]   Elizabeth Stachelin (916) 440-1763

Loop to print the array and check that it was populated correctly (it was):

for (i=0; i <len; i++) {print contestants[i]}

Output:

Mike Harrington (510) 548-1278
Archie McNichol (206) 548-1348
Guy Quigley (916) 343-6410
Dan Savage (406) 298-7744
John Goldenrod (916) 348-4278
Tom Savage (408) 926-3456
Elizabeth Stachelin (916) 440-1763

Desired final output (ignore the formatting; it displays correctly in my terminal, but I had a hard time reproducing it well here):

               ***FIRST QUARTERLY REPORT***                          
            ***CAMPAIGN 2004 CONTRIBUTIONS***                       

   Name            Phone             Jan  |  Feb  |  Mar  |  Total Donated 
Mike Harrington     (510)548-1278   $ 250   $ 100   $ 175   $ 525
Christian Dobbins   (408)538-2358   $ 155   $ 90    $ 201   $ 446
Susan Dalsass       (206)654-6279   $ 250   $ 60    $ 50    $ 360
Archie McNichol     (206)548-1348   $ 250   $ 100   $ 175   $ 525
Jody Savage         (206)548-1278   $ 15    $ 188   $ 150   $ 353
Guy Quigley         (916)343-6410   $ 250   $ 100   $ 175   $ 525
Dan Savage          (406)298-7744   $ 450   $ 300   $ 275   $ 1025
Nancy McNeil        (206)548-1278   $ 250   $ 80    $ 75    $ 405
John Goldenrod      (916)348-4278   $ 250   $ 100   $ 175   $ 525
Chet Main           (510)548-5258   $ 50    $ 95    $ 135   $ 280
Tom Savage          (408)926-3456   $ 250   $ 168   $ 200   $ 618
Elizabeth Stachelin (916)440-1763   $ 175   $ 75    $ 300   $ 550
-----------------------------------------------------------------------------
SUMMARY
-----------------------------------------------------------------------------
The campaign received a total of $6137.00 for this quarter.
The average donation for the 12 contributors was $511.42.
The highest total contribution was $1025.00 made by Dan Savage.
                ***Thank you Dan Savage***                           
The following people donated over $500 to the campaign.
They are eligible for the quarterly drawing!!
Listed are their names(sorted by last names) and phone numbers.

John Goldenrod (916) 348-4278
Mike Harrington (510) 548-1278
Archie McNichol (206) 548-1348
Guy Quigley (916) 343-6410
Dan Savage (406) 298-7744
Tom Savage (408) 926-3456
Elizabeth Stachelin (916) 440-1763

Thank you all for your continued support!!

2 answers:

Answer 0 (score: 0)

With gawk, you can use the built-in sorting functions directly, for example:

BEGIN {
    data["Jane Doe (123) 456-7890"] = 600;
    data["Fred Adams (123) 456-7891"] = 800;
    data["John Smith (123) 456-7892"] = 900;
    exit;
    }

END {
    for (i in data) {
        split(i,x," ")
        data1[x[2] " " x[1] " " x[3] " " x[4]] = i;
        }
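    n = asorti(data1,sdata1);    # asorti returns the number of sorted indices
    for (i = 1; i <= n; i++) {   # numeric loop: "for (i in ...)" order is unspecified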
        print data1[sdata1[i]],"\t",data[data1[sdata1[i]]];
    }
}

... which produces:

Fred Adams (123) 456-7891        800
Jane Doe (123) 456-7890          600
John Smith (123) 456-7892        900

In plain awk, you can get the same result by writing the array indices to a file, sorting that file, and then reading it back with getline.
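A minimal sketch of that write, sort, and read-back sequence might look like the following. The names sort_by_surname, tmp, and sorted are hypothetical. It also assumes that mktemp is available, that the temporary path contains no spaces, and that each entry has the form "First Last phone", so the surname is the second whitespace-separated field.

```sh
# sort_by_surname is a hypothetical helper: it stores every input line in
# an array, writes the array to a temporary file in END, sorts that file
# on field 2 (the surname), and reads the result back with getline.
sort_by_surname() {
    tmp=$(mktemp) || return 1
    awk -v tmp="$tmp" '
    { contestants[len++] = $0 }
    END {
        for (i = 0; i < len; i++)
            print contestants[i] > tmp
        close(tmp)                     # finish writing before sort reads
        cmd = "sort -k2,2 " tmp        # key on field 2 only
        while ((cmd | getline line) > 0)
            sorted[++n] = line
        close(cmd)
        for (i = 1; i <= n; i++)
            print sorted[i]
    }'
    rm -f "$tmp"
}

printf '%s\n' \
    'Mike Harrington (510) 548-1278' \
    'Dan Savage (406) 298-7744' \
    'John Goldenrod (916) 348-4278' | sort_by_surname
```

Reading the sorted lines back into an array keeps them available for further processing in END. If the list only needs to be printed, piping the print statements straight into sort would avoid the temporary file.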

Answer 1 (score: 0)

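The way to approach this is to produce the pre-summary output as the data is read, so you do not need to store all of the data in arrays. Only the people who contributed more than $500 go into an array, and an insertion sort keeps that array in the desired order as each one is added.

You would do something like this: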

awk -F':' '
NR==1 {
    print "header stuff"
    next
}
{
    tot = $3 + $4 + $5
    printf "%-20s%10s $%5s $%5s $%5s $%5s\n", $1, $2, $3, $4, $5, tot
}
tot > 500 {
    split($1,name,/ /)
    surname = name[2]
    numContribs++
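    # insertion sort: find the first slot whose surname sorts after the
    # new one (or the new empty slot at the end), shift the tail up by
    # one position, and insert there
    for (i=1; i<=numContribs; i++) {
        if (i == numContribs || surname < surnames[i]) {
            for (j=numContribs; j>i; j--) {
                surnames[j] = surnames[j-1]
                contribs[j] = contribs[j-1]
            }
            surnames[i] = surname
            contribs[i] = $1 " " $2
            break
        }
    }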
}
END {
    print "SUMMARY and text below it and then the list of $500+ contributors:"
    for (i=1; i<=numContribs; i++) {
        print contribs[i]
    }
}
' lab4.data

The above is not a fully functional program. It is only meant to show you the right approach, as you requested.
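To try the insertion step on its own, the eligible-donor logic can be wrapped in a small function and fed a few records in the lab4.data layout. This is a sketch under assumptions: eligible_by_surname, entries, and n are hypothetical names, and the shifting loop is written in a compact walk-back form rather than copied from the script above.

```sh
# eligible_by_surname is a hypothetical helper: it reads records in the
# lab4.data layout and prints donors whose total exceeds 500, ordered by
# surname, inserting each one into its sorted position as it is read.
eligible_by_surname() {
    awk -F':' '
    { tot = $3 + $4 + $5 }
    tot > 500 {
        split($1, name, / /)
        surname = name[2]
        # walk back from the end, shifting larger surnames up one slot
        for (j = ++n; j > 1 && surnames[j-1] > surname; j--) {
            surnames[j] = surnames[j-1]
            entries[j]  = entries[j-1]
        }
        surnames[j] = surname
        entries[j]  = $1 " " $2
    }
    END { for (i = 1; i <= n; i++) print entries[i] }'
}

printf '%s\n' \
    'Dan Savage:(406) 298-7744:450:300:275' \
    'Nancy McNeil:(206) 548-1278:250:80:75' \
    'John Goldenrod:(916) 348-4278:250:100:175' \
    'Tom Savage:(408) 926-3456:250:168:200' | eligible_by_surname
```

Because an entry is shifted only when its surname is strictly greater than the new one, donors who share a surname stay in input order, so Dan Savage is listed before Tom Savage.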