I've run into a problem that I'm trying to solve using only awk.
I have a CSV file with this structure:
Easting Northing Latitude Longitude Locality Name
Easting "Northing" "Latitude" "Longitude" "LocalityName"
364208 176288 51.48441 -2.51685 "Fishponds"
358596 172813 51.45278 -2.59726 "Bristol City Centre"
358886 177828 51.49789 -2.59367 "Southmead"
358839 177839 51.49798 -2.59435 "Southmead"
358980 177882 51.49838 -2.59232 "Southmead"
359009 177863 51.49821 -2.5919 "Southmead"
358839 177529 51.4952 -2.59431 "Southmead"
359475 168262 51.41192 -2.58409 "Hengrove Park"
358945 173526 51.45921 -2.59232 "Bristol"
358943 173525 51.4592 -2.59235 "Bristol"
358941 173524 51.45919 -2.59238 "Bristol"
358940 173523 51.45919 -2.59239 "Bristol"
358945 173528 51.45923 -2.59232 "Bristol"
358936 173520 51.45916 -2.59245 "Bristol"
358936 173521 51.45917 -2.59245 "Bristol"
358932 173516 51.45912 -2.5925 "Bristol"
and so on. I'm trying to write an awk script that counts the instances of each locality name and prints them, so that the output is:
Fishponds 1
Bristol City Centre 1
Southmead 5
Hengrove Park 1
Bristol 8
So far I've got this:
BEGIN { i = 0; state = 0; names[NR]; FS=","; }
{
#for each element in names array, check if already exists.
for(j=0;j<=i;j++)
{
if(names[j] == $5)
{
state = 1;
break;
}
}
# If the name doesnt already exist add to names array
if(state == 0)
{
names[i] = $5;
i++;
}
state = 0;
}
END {
for(x=0;x<=i;x++)
{
print names[x];
}
}
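This should collect the localities and remove the duplicates, but I still can't think of a good way to count the instances of each locality and then list them back.
Answer 0 (score: 5)
A simpler solution: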
awk -F '"' 'NR>3 {locname[$2]++}
END { for (n in locname) {print n, locname[n] } }' INPUTFILE
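First, the input field separator is set to a double quote ("), so the second field is the locality name. The condition NR>3 skips the first three lines (the header). An array keyed on the second field counts the occurrences, and the END block prints each key with its count once the last line has been read. Note that the order in which for (n in locname) visits the keys is unspecified, so the output may not follow the order of the input.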
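A minimal, self-contained sketch of the same counting idea follows. The file name, the placeholder header lines, and the shortened set of rows are assumptions made for illustration; the pipe through sort is an addition that makes the otherwise unspecified output order deterministic.

```shell
# Hypothetical sample file: three placeholder header lines (as NR>3 assumes)
# followed by a few rows taken from the question.
sample="${TMPDIR:-/tmp}/locality_sample.txt"
cat > "$sample" <<'EOF'
header line 1
header line 2
header line 3
364208 176288 51.48441 -2.51685 "Fishponds"
358886 177828 51.49789 -2.59367 "Southmead"
358839 177839 51.49798 -2.59435 "Southmead"
358945 173526 51.45921 -2.59232 "Bristol"
EOF

# Count each quoted name, then sort so the result does not depend on
# awk's internal array traversal order.
counts=$(awk -F '"' 'NR > 3 { locname[$2]++ }
  END { for (n in locname) print n, locname[n] }' "$sample" | LC_ALL=C sort)
printf '%s\n' "$counts"
```

With these rows the sorted result lists Bristol 1, Fishponds 1 and Southmead 2.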
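Answer 1 (score: 1)
Here's one way using GNU awk. It reads the file twice (file{,} is bash brace expansion for file file), but prints the names in the order in which they first appear in the file: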
awk -F "\"" 'NR > 3 && FNR==NR { a[$2]++; next } $2 in a && !b[$2]++ { print $2, a[$2] }' file{,}
Results:
Fishponds 1
Bristol City Centre 1
Southmead 5
Hengrove Park 1
Bristol 8
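The two-pass idea can also be written without brace expansion by naming the file twice, which should work in any POSIX shell. The sketch below is a slightly restructured variant rather than the exact one-liner above; the file name and placeholder header lines are assumptions, and the rows are deliberately interleaved to show that the order of first appearance is kept.

```shell
# Hypothetical sample file with three placeholder header lines.
sample="${TMPDIR:-/tmp}/locality_two_pass.txt"
cat > "$sample" <<'EOF'
header line 1
header line 2
header line 3
358886 177828 51.49789 -2.59367 "Southmead"
358945 173526 51.45921 -2.59232 "Bristol"
358839 177839 51.49798 -2.59435 "Southmead"
EOF

# Pass 1 (FNR == NR): count the names, ignoring the three header lines.
# Pass 2: print each counted name once, the first time it is seen.
ordered=$(awk -F '"' '
  FNR == NR { if (NR > 3) count[$2]++; next }
  ($2 in count) && !seen[$2]++ { print $2, count[$2] }
' "$sample" "$sample")
printf '%s\n' "$ordered"
```

Here Southmead is printed before Bristol because it appears first in the input, even though its rows are not adjacent.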
Answer 2 (score: 0)
This might work for you:
awk -F\" '/^[0-9]/{if(!location){location=$2};if(location==$2){count++;next};print location,count;location=$2;count=1};END{print location,count}' file
This works only if the rows for each locality are adjacent (as in your example); otherwise use:
awk -F\" '/^[0-9]/{count[$2]++;if(count[$2]==1)location[++order]=$2};END{for(n=1;n<=order;n++)print location[n],count[location[n]]}' file
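To see why the first command needs grouped input, here is a small comparison of both approaches on interleaved rows. The file name and the choice of rows are assumptions for illustration, and the awk programs are the two above, only spread over several lines.

```shell
# Hypothetical input in which the Southmead rows are not adjacent.
rows="${TMPDIR:-/tmp}/locality_interleaved.txt"
cat > "$rows" <<'EOF'
358886 177828 51.49789 -2.59367 "Southmead"
358945 173526 51.45921 -2.59232 "Bristol"
358839 177839 51.49798 -2.59435 "Southmead"
EOF

# Run-based version: starts a new count whenever the name changes.
runs=$(awk -F '"' '
  /^[0-9]/ {
    if (!location) location = $2
    if (location == $2) { count++; next }
    print location, count
    location = $2; count = 1
  }
  END { print location, count }' "$rows")

# Array version: one total per name, in order of first appearance.
totals=$(awk -F '"' '
  /^[0-9]/ { count[$2]++; if (count[$2] == 1) location[++order] = $2 }
  END { for (n = 1; n <= order; n++) print location[n], count[location[n]] }' "$rows")

printf 'run-based:\n%s\n\narray-based:\n%s\n' "$runs" "$totals"
```

On this input the run-based version reports Southmead twice with a count of 1 each, while the array-based version reports a single Southmead 2.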
Answer 3 (score: 0)
A Perl solution:
perl -F\" -lane 'if($.>3){$X{$F[1]}++}END{foreach (keys %X){print $_." ".$X{$_}}}' your_file
Tested below:
> perl -F\" -lane 'if($.>3){$X{$F[1]}++}END{foreach (keys %X){print $_." ".$X{$_}}}' temp
Bristol 8
Hengrove Park 1
Southmead 5
Bristol City Centre 1
Fishponds 1
>