Using awk to create a subset of a text file by matching characters

Date: 2016-10-07 08:03:09

Tags: file unix awk

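I have a text file containing a large amount of data; part of it is shown below. I need to create a separate file that holds only the European rows. How can I extract them with awk?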

The file's columns are: user ID, latitude, longitude, venue category name, country code (2 letters)

The text file contains the following:

3fd66200f964a52008e61ee3    40.726589   -73.995649  Deli / Bodega   US
4eaef4f4722e4efd614ddb80    51.515470   -0.148605   Burger Joint    GB
4eaef8325c5c7964424125c8    50.739561   4.253660    Vineyard    BE
4e210f60d22d0a3f59f4cbfb    5.367963    103.097516  Racetrack   MY
52373a6511d2d4fcba683886    41.434926   2.220326    Medical Center  ES
476f8da1f964a520044d1fe3    40.695163   -73.995448  Thai Restaurant US

The new text file should look like this:

4eaef4f4722e4efd614ddb80    51.515470   -0.148605   Burger Joint    GB
4eaef8325c5c7964424125c8    50.739561   4.253660    Vineyard    BE
52373a6511d2d4fcba683886    41.434926   2.220326    Medical Center  ES

Note: I can use either a latitude/longitude bounding box or the country code to extract the subset into a new file.

2 answers:

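Answer 0 (score: 3)

First, you need the country codes of the countries you want in a separate file to check against (or, alternatively, all the latitudes and longitudes with their corresponding country codes):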

$ cat countries.txt
GB
BE
ES

In awk:

$ awk 'NR==FNR{a[$0];next} $NF in a' countries.txt file.txt
4eaef4f4722e4efd614ddb80    51.515470   -0.148605   Burger Joint    GB
4eaef8325c5c7964424125c8    50.739561   4.253660    Vineyard    BE
52373a6511d2d4fcba683886    41.434926   2.220326    Medical Center  ES

Explanation:

NR==FNR {    # true only while reading the first file: NR counts records across
             # all files, FNR restarts at 1 for each file
    a[$0]    # referencing a[$0] creates the element, for example a["GB"]
    next     # the first file only fills the array, so skip the rest of the
             # script and move on to the next country code
}            # from here on we are reading the second file
$NF in a     # if the last field $NF (for example "GB") is an index of array a,
             # the row is a wanted one and is printed
             # this could have been written: { if ($NF in a) print $0 }
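
The question also mentions a latitude/longitude bounding box. Here is a minimal sketch of that approach. It assumes a rough box of 35 to 72 degrees latitude and -25 to 45 degrees longitude (illustrative bounds picked for this example, not an authoritative definition of Europe), tab-separated rows copied from the question, and a hypothetical output name europe.txt:

```shell
# Recreate the sample rows from the question; the tab separator is an assumption.
printf '%s\t%s\t%s\t%s\t%s\n' \
  3fd66200f964a52008e61ee3 40.726589 -73.995649 'Deli / Bodega' US \
  4eaef4f4722e4efd614ddb80 51.515470 -0.148605 'Burger Joint' GB \
  4eaef8325c5c7964424125c8 50.739561 4.253660 'Vineyard' BE \
  4e210f60d22d0a3f59f4cbfb 5.367963 103.097516 'Racetrack' MY \
  52373a6511d2d4fcba683886 41.434926 2.220326 'Medical Center' ES \
  476f8da1f964a520044d1fe3 40.695163 -73.995448 'Thai Restaurant' US > file.txt

# Keep rows whose latitude ($2) and longitude ($3) fall inside the box.
# The bounds are passed with -v so they are easy to change; the values are
# illustrative assumptions. Default whitespace splitting is enough here
# because the first three fields contain no spaces.
awk -v lat_min=35 -v lat_max=72 -v lon_min=-25 -v lon_max=45 '
  $2 >= lat_min && $2 <= lat_max && $3 >= lon_min && $3 <= lon_max
' file.txt > europe.txt
```

A box this coarse will also let through places outside Europe that share the same coordinates range (parts of North Africa and western Asia, for instance), so the country-code lookup above is the more precise of the two filters. Redirecting with > is also how the country-code command can write its result to a new file instead of the terminal.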

Answer 1 (score: 0)

Using grep:

grep -wFf countries.txt file.txt

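Explanation of the options:

  • -F: fixed-string search (no regular expressions)
  • -f: read the patterns from the given file
  • -w: match whole words only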
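
One thing to keep in mind: -w matches a code as a whole word anywhere on the line, not only in the last column. The sketch below shows the difference with two hypothetical rows (the IDs, coordinates and the venue name "BE Cafe" are made up for illustration) and hypothetical file names sample.txt, grep_out.txt and awk_out.txt:

```shell
# Country codes to keep, as in the answers above.
printf '%s\n' GB BE ES > countries.txt

# Two made-up, tab-separated rows: a US venue whose name contains the
# word "BE", and a genuine BE row.
printf '%s\t%s\t%s\t%s\t%s\n' \
  id1 40.0 -73.0 'BE Cafe' US \
  id2 50.7 4.2 'Vineyard' BE > sample.txt

# grep checks the whole line, so the US row is selected as well.
grep -wFf countries.txt sample.txt > grep_out.txt

# The awk version only tests the last field, so only the BE row is selected.
awk 'NR==FNR{a[$0];next} $NF in a' countries.txt sample.txt > awk_out.txt
```

If category names in your data can contain two-letter uppercase words, the awk version is the safer choice.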