awk: removing duplicate fields and counting them

Time: 2012-11-22 14:51:02

Tags: unix awk

I've run into a problem that I'm trying to solve using only awk.

I have a csv file with the following structure:

Easting  Northing    Latitude    Longitude   Locality Name

Easting  "Northing"  "Latitude"  "Longitude"     "LocalityName"
364208  176288           51.48441   -2.51685     "Fishponds"
358596  172813           51.45278   -2.59726     "Bristol City Centre"
358886  177828           51.49789   -2.59367     "Southmead"
358839  177839           51.49798   -2.59435     "Southmead"
358980  177882           51.49838   -2.59232     "Southmead"
359009  177863           51.49821   -2.5919          "Southmead"
358839  177529           51.4952        -2.59431     "Southmead"
359475  168262           51.41192   -2.58409     "Hengrove Park"
358945  173526           51.45921   -2.59232     "Bristol"
358943  173525           51.4592    -2.59235     "Bristol"
358941  173524           51.45919   -2.59238     "Bristol"
358940  173523           51.45919   -2.59239     "Bristol"
358945  173528           51.45923   -2.59232     "Bristol"
358936  173520           51.45916   -2.59245     "Bristol"
358936  173521           51.45917   -2.59245     "Bristol"
358932  173516           51.45912   -2.5925          "Bristol"

and so on. I'm trying to write an awk script that counts the instances of each Locality name and prints them, so that the output is:

Fishponds 1
Bristol City Centre 1
Southmead 5
Hengrove Park 1
Bristol 8

So far I've got this:

BEGIN { i = 0; state = 0; names[NR]; FS=","; }

{
#for each element in names array, check if already exists.
    for(j=0;j<=i;j++)
    {
        if(names[j] == $5)
        {
        state = 1;
        break;
        }
    }
# If the name doesnt already exist add to names array
    if(state == 0)
    {
        names[i] = $5;
        i++;
    }
    state = 0;
}

END { 
    for(x=0;x<=i;x++)
    {
    print names[x];
    }
}

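This hopefully sorts the localities and removes duplicates, but I still can't figure out a good way to count the instances of each locality and then list them back out.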

4 answers:

Answer 0: (score: 5)

A simpler solution:

awk -F '"' 'NR>3 {locname[$2]++}
            END { for (n in locname) {print n, locname[n] } }' INPUTFILE

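First, the input field separator is set to ", so the second field is the locality name. The first three lines (the headers) are skipped with NR>3. An array keyed on the second field counts the occurrences. After the last line has been read, the END block prints the array's keys and values. Note that awk does not guarantee any particular order for a for (n in locname) loop.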
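To see the counting idiom in isolation, here is a minimal sketch that runs it on a small inline sample. The variable names sample and counts are illustrative only, and the data is a shortened copy of the rows in the question. Piping through sort is an addition that is not part of the answer; it is there only to make the output order predictable.

```shell
# Hypothetical sample: three header lines, then data rows
# (a shortened copy of the file shown in the question).
sample='Easting  Northing    Latitude    Longitude   Locality Name

Easting  "Northing"  "Latitude"  "Longitude"     "LocalityName"
364208  176288           51.48441   -2.51685     "Fishponds"
358886  177828           51.49789   -2.59367     "Southmead"
358839  177839           51.49798   -2.59435     "Southmead"
358945  173526           51.45921   -2.59232     "Bristol"'

# Split on the double quote so that $2 is the locality name, skip the
# three header lines, and count each name in an associative array.
# sort is added because the order of a for-in loop is unspecified.
counts=$(printf '%s\n' "$sample" |
  awk -F '"' 'NR > 3 { locname[$2]++ }
              END { for (n in locname) print n, locname[n] }' |
  sort)

printf '%s\n' "$counts"
# Bristol 1
# Fishponds 1
# Southmead 2
```

If the original input order matters more than alphabetical order, drop the sort and record the first-seen order explicitly instead.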

Answer 1: (score: 1)

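Here's one way using GNU awk. It reads the file twice, but prints the localities in the order in which they first appear in the input: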

awk -F "\"" 'NR > 3 && FNR==NR { a[$2]++; next } $2 in a && !b[$2]++ { print $2, a[$2] }' file{,}

Results:

Fishponds 1
Bristol City Centre 1
Southmead 5
Hengrove Park 1
Bristol 8
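The file{,} argument relies on shell brace expansion (it expands to file file in bash), so it may not work in every shell. As a rough sketch, the same two-pass approach can be run by naming the file twice explicitly. The temporary file and the variable names tmp and two_pass are assumptions made for this illustration.

```shell
# Write a hypothetical sample (three header lines plus data) to a temp file.
tmp=$(mktemp)
cat > "$tmp" <<'EOF'
Easting  Northing    Latitude    Longitude   Locality Name

Easting  "Northing"  "Latitude"  "Longitude"     "LocalityName"
358886  177828           51.49789   -2.59367     "Southmead"
358945  173526           51.45921   -2.59232     "Bristol"
358839  177839           51.49798   -2.59435     "Southmead"
EOF

# Pass 1 (FNR == NR) counts each locality, skipping the three header lines.
# Pass 2 prints each counted name once, the first time it is seen.
# The file is passed twice by name instead of using file{,}.
two_pass=$(awk -F '"' '
  NR > 3 && FNR == NR { a[$2]++; next }
  $2 in a && !b[$2]++ { print $2, a[$2] }' "$tmp" "$tmp")
rm -f "$tmp"

printf '%s\n' "$two_pass"
# Southmead 2
# Bristol 1
```

The header lines never enter the array a during the first pass, so the $2 in a test also filters them out during the second pass.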

Answer 2: (score: 0)

This might work for you:

awk -F\" '/^[0-9]/{if(!location){location=$2};if(location==$2){count++;next};print location,count;location=$2;count=1};END{print location,count}' file

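This only works if the localities are sorted, that is, if identical names sit on adjacent lines (as in your example). Otherwise use: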

awk -F\" '/^[0-9]/{count[$2]++;if(count[$2]==1)location[++order]=$2};END{for(n=1;n<=order;n++)print location[n],count[location[n]]}' file
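The difference between the two variants shows up as soon as a locality reappears after another one. The sketch below uses three made-up rows in which Southmead occurs on both sides of Bristol. The first program is a restructured equivalent of the adjacent-rows one-liner rather than a verbatim copy, and the names rows, adjacent and ordered are illustrative.

```shell
# Hypothetical unsorted rows: "Southmead" appears before and after "Bristol".
rows='358886  177828  51.49789  -2.59367  "Southmead"
358945  173526  51.45921  -2.59232  "Bristol"
358839  177839  51.49798  -2.59435  "Southmead"'

# Adjacent-rows version: prints a count each time the name changes,
# so a locality that reappears later is reported a second time.
adjacent=$(printf '%s\n' "$rows" | awk -F '"' '
  /^[0-9]/ {
    if (location != "" && location != $2) { print location, count; count = 0 }
    location = $2
    count++
  }
  END { if (location != "") print location, count }')

# Order-preserving version: counts in an array and remembers each
# name the first time it is seen, so input order does not matter.
ordered=$(printf '%s\n' "$rows" | awk -F '"' '
  /^[0-9]/ { if (++count[$2] == 1) location[++order] = $2 }
  END { for (n = 1; n <= order; n++) print location[n], count[location[n]] }')

printf 'adjacent:\n%s\nordered:\n%s\n' "$adjacent" "$ordered"
# adjacent:
# Southmead 1
# Bristol 1
# Southmead 1
# ordered:
# Southmead 2
# Bristol 1
```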

Answer 3: (score: 0)

A perl solution:

perl -F\" -lane 'if($.>3){$X{$F[1]}++}END{foreach (keys %X){print $_." ".$X{$_}}}' your_file

Tested below:

> perl -F\" -lane 'if($.>3){$X{$F[1]}++}END{foreach (keys %X){print $_." ".$X{$_}}}' temp
Bristol 8
Hengrove Park 1
Southmead 5
Bristol City Centre 1
Fishponds 1
>