Grep a file using two columns as input

Time: 2013-07-05 13:54:53

Tags: bash awk grep matching

I have a file containing the following lines:

"ALMEREWEG               ";" 45  ";"      ";"ZEEWOLDE                ";"3891ZN"
"ALMEREWEG               ";" 50  ";"      ";"ZEEWOLDE                ";"3891ZP"
"ALMEREWEG               ";" 51  ";"      ";"ZEEWOLDE                ";"3891ZN"
"ALMEREWEG               ";" 52  ";"      ";"ZEEWOLDE                ";"3891ZP"
"ALMEREWEG               ";" 53  ";"      ";"ZEEWOLDE                ";"3891ZN"

I have a second file containing the following lines:

3891ZP;50;
3891ZN;53;A
3891ZN;53;B
3891ZN;54;

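Now I want to grep the first file using patterns from the second file, where:

A) column 1 of the second file appears in column 5 of the first file; and

B) column 2 of the second file appears in column 2 of the first file.

My question: how do I do this?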

Update 2013-07-07: I updated the format of file2 to reflect the third column (the number suffix).

4 answers:

Answer 0 (score: 3)

One way with awk:

awk -F';' '
NR==FNR {
  a[$1]=$2
  next
}
{
  line=$0
  gsub(/\"/,"")
  gsub(/ *; */,";")
  if (a[$5]==$2) {
    print line
    line=""
  }
}' file2 file1

Output:

"ALMEREWEG               ";" 50  ";"      ";"ZEEWOLDE                ";"3891ZP"
"ALMEREWEG               ";" 53  ";"      ";"ZEEWOLDE                ";"3891ZN"

Answer 1 (score: 2)

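Borrowing from @JS, I offer the following improved solution. The problem with his code is that if you have several house numbers in the same postcode, it only matches the last one, because each new number overwrites the previous value stored for that postcode. By using a compound associative array key (if that is what it is called... basically the two fields joined together), you can solve this: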

Create a file postcode.awk:

BEGIN {
  FS=";"
}
# NR==FNR is true only while the total number of records read
# equals the number of records read from the current file,
# in other words: this block runs for the first file only
NR==FNR {
  a[$1,$2]=1 # create one array element for each $1/$2 pair
  next
}
# this block runs for every line of the second file,
# since "next" above skips it while the first file is being read
{
  # copy the original line before modifying it
  line=$0
  # take out the double quotes
  gsub(/\"/,"")
  # take out the spaces on either side of the semicolons
  gsub(/ *; */,";")
  # see if the associative array element exists:
  if (a[$5,$2]==1) {
    # echo the original line that matched:
    print line
  }
}

Using the following test file file1 (I added a line to show the edge case):

"ALMEREWEG               ";" 45  ";"      ";"ZEEWOLDE                ";"3891ZN"
"ALMEREWEG               ";" 50  ";"      ";"ZEEWOLDE                ";"3891ZP"
"ALMEREWEG               ";" 52  ";"      ";"ZEEWOLDE                ";"3891ZP"
"ALMEREWEG               ";" 53  ";"      ";"ZEEWOLDE                ";"3891ZP"
"ALMEREWEG               ";" 53  ";"      ";"ZEEWOLDE                ";"3891ZN"

and the key file file2 (again with one line added):

3891ZP;50
3891ZP;52
3891ZN;53

you will see that JS's code does not match the line with number 50.

But my code does:

awk -f postcode.awk file2 file1

produces

"ALMEREWEG               ";" 50  ";"      ";"ZEEWOLDE                ";"3891ZP"
"ALMEREWEG               ";" 52  ";"      ";"ZEEWOLDE                ";"3891ZP"
"ALMEREWEG               ";" 53  ";"      ";"ZEEWOLDE                ";"3891ZN"

Answer 2 (score: 0)

You can use something like sed to build the patterns for grep:

$ grep -Ef <(sed -r 's/(.*);(.*)/^[^;]*;[^;]*\2[^;]*;([^;]*;){2}[^;]*\1/' file2) file1
"ALMEREWEG               ";" 50  ";"      ";"ZEEWOLDE                ";"3891ZP"
"ALMEREWEG               ";" 53  ";"      ";"ZEEWOLDE                ";"3891ZN"

Answer 3 (score: 0)

I used bash's IFS and read to split file2 into columns, and then passed the columns to grep:

# read file2 line by line (-r keeps backslashes literal)
while IFS= read -r line ; do
    # split the line into columns on ';'
    IFS=';' read -r -a col <<< "$line"
    # the expression can be refined but should work well as is
    grep -e ' '"${col[1]}"'  ";".*;.*";"'"${col[0]}" file1
done < file2

Output:

"ALMEREWEG               ";" 50  ";"      ";"ZEEWOLDE                ";"3891ZP"
"ALMEREWEG               ";" 53  ";"      ";"ZEEWOLDE                ";"3891ZN"