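I have a text file containing a large amount of data; part of it is shown below. I need to create a separate file holding only the European subset. How can I extract those rows with awk?
The file columns are: user ID, latitude, longitude, venue category name, country code (2 letters)
The text file contains: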
3fd66200f964a52008e61ee3 40.726589 -73.995649 Deli / Bodega US
4eaef4f4722e4efd614ddb80 51.515470 -0.148605 Burger Joint GB
4eaef8325c5c7964424125c8 50.739561 4.253660 Vineyard BE
4e210f60d22d0a3f59f4cbfb 5.367963 103.097516 Racetrack MY
52373a6511d2d4fcba683886 41.434926 2.220326 Medical Center ES
476f8da1f964a520044d1fe3 40.695163 -73.995448 Thai Restaurant US
The new text file should look like this:
4eaef4f4722e4efd614ddb80 51.515470 -0.148605 Burger Joint GB
4eaef8325c5c7964424125c8 50.739561 4.253660 Vineyard BE
52373a6511d2d4fcba683886 41.434926 2.220326 Medical Center ES
Note: I can use either a latitude/longitude bounding box or the country code to extract the subset into a new file.
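Answer 0: (score: 3)
First, you need the country codes of the countries you want in a separate file (or all the latitudes and longitudes with their corresponding country codes :) so there is something to check against: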
$ cat countries.txt
GB
BE
ES
In awk:
$ awk 'NR==FNR{a[$0];next} $NF in a' countries.txt file.txt
4eaef4f4722e4efd614ddb80 51.515470 -0.148605 Burger Joint GB
4eaef8325c5c7964424125c8 50.739561 4.253660 Vineyard BE
52373a6511d2d4fcba683886 41.434926 2.220326 Medical Center ES
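The command above prints the matching rows to standard output. Since the question asks for a separate file, a minimal sketch of the same filter with its output redirected follows. The inputs are recreated from the question so the snippet runs on its own; the output name europe.txt is just an assumed name.

```shell
# Recreate the sample inputs from the question.
printf '%s\n' GB BE ES > countries.txt
cat > file.txt <<'EOF'
3fd66200f964a52008e61ee3 40.726589 -73.995649 Deli / Bodega US
4eaef4f4722e4efd614ddb80 51.515470 -0.148605 Burger Joint GB
4eaef8325c5c7964424125c8 50.739561 4.253660 Vineyard BE
4e210f60d22d0a3f59f4cbfb 5.367963 103.097516 Racetrack MY
52373a6511d2d4fcba683886 41.434926 2.220326 Medical Center ES
476f8da1f964a520044d1fe3 40.695163 -73.995448 Thai Restaurant US
EOF

# Same filter as above; the shell redirection writes the result to a
# new file (europe.txt is an assumed name) instead of the terminal.
awk 'NR==FNR{a[$0];next} $NF in a' countries.txt file.txt > europe.txt
```

Note that countries.txt should not be empty: with an empty first file, NR==FNR is also true while reading file.txt, and nothing is printed.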
Explanation:
NR==FNR {   # true only while reading the first file: NR counts records across all files, FNR restarts at 1 for each file
a[$0]       # referencing a[$0] creates an array element in a, for example a["GB"]
next        # the first file only supplies country codes, so skip the remaining rules
            # and move on to the NEXT record (the next country code)
}           # past this point we are reading the second file
$NF in a    # if the last field $NF of the current row (for example "GB") is an index in a,
            # it is a wanted row and is printed by the default action.
            # this could have been written: { if ($NF in a) print $0 }
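The question also allows a latitude/longitude bounding box instead of country codes. Here is a rough sketch of that variant. The box limits (latitude 35 to 72, longitude -25 to 45) are my own approximate assumption for Europe, not something from the question, and the output name europe_bbox.txt is likewise made up.

```shell
# Recreate the sample input from the question.
cat > file.txt <<'EOF'
3fd66200f964a52008e61ee3 40.726589 -73.995649 Deli / Bodega US
4eaef4f4722e4efd614ddb80 51.515470 -0.148605 Burger Joint GB
4eaef8325c5c7964424125c8 50.739561 4.253660 Vineyard BE
4e210f60d22d0a3f59f4cbfb 5.367963 103.097516 Racetrack MY
52373a6511d2d4fcba683886 41.434926 2.220326 Medical Center ES
476f8da1f964a520044d1fe3 40.695163 -73.995448 Thai Restaurant US
EOF

# Assumed rough bounding box for Europe; adjust the four limits as needed.
# Latitude is field 2 and longitude is field 3. The category name may
# contain spaces, but it comes after these fields, so they are unaffected.
awk -v latmin=35 -v latmax=72 -v lonmin=-25 -v lonmax=45 '
    $2 >= latmin && $2 <= latmax && $3 >= lonmin && $3 <= lonmax
' file.txt > europe_bbox.txt
```

On the sample data this keeps the GB, BE and ES rows. A rectangle is a crude approximation, though: a box like this also covers parts of North Africa and western Asia, so the country-code lookup is the more precise of the two when the codes are available.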
Answer 1: (score: 0)
Using grep:
grep -wFf countries.txt file.txt
Options explained:
-F
fixed-string search (no regular expressions)
-f
reads the patterns from the given file
-w
matches whole words only
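One caveat with this approach: -w matches a code as a whole word anywhere in the line, not only in the last column. The sketch below illustrates this with hypothetical data (a made-up US row whose category is "IT Services", and a code list containing IT), then shows one possible way to anchor the match to the end of the line. The file names codes.txt, rows.txt, loose.txt and strict.txt are all assumptions for the demo.

```shell
# Hypothetical inputs: the code list includes IT, and one made-up US row
# has the category "IT Services".
printf '%s\n' GB IT > codes.txt
cat > rows.txt <<'EOF'
4eaef4f4722e4efd614ddb80 51.515470 -0.148605 Burger Joint GB
aaaaaaaaaaaaaaaaaaaaaaaa 40.700000 -74.000000 IT Services US
EOF

# Whole-word match anywhere in the line: this also returns the US row,
# because "IT" appears as a word in its category name.
grep -wFf codes.txt rows.txt > loose.txt

# Build the extended regex " (GB|IT)$" so the code must be the last
# space-separated field on the line.
pattern=" ($(paste -sd'|' codes.txt))\$"
grep -E "$pattern" rows.txt > strict.txt
```

Here loose.txt ends up with both rows and strict.txt with only the GB row. The awk answer compares only $NF, so it does not have this problem.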