Question

我有一个包含以下行的文件：

"ALMEREWEG               ";" 45  ";"      ";"ZEEWOLDE                ";"3891ZN"
"ALMEREWEG               ";" 50  ";"      ";"ZEEWOLDE                ";"3891ZP"
"ALMEREWEG               ";" 51  ";"      ";"ZEEWOLDE                ";"3891ZN"
"ALMEREWEG               ";" 52  ";"      ";"ZEEWOLDE                ";"3891ZP"
"ALMEREWEG               ";" 53  ";"      ";"ZEEWOLDE                ";"3891ZN"

我有第二个包含以下行的文件：

3891ZP;50;
3891ZN;53;A
3891ZN;53;B
3891ZN;54;

现在我想根据第二个文件的模式grep第一个文件，其中：

A）第2个文件的第1列出现在第1个文件的第5列;和

B）第二个文件的第二列出现在第一个文件的第二列。

我的问题：怎么做？

2013年7月7日更新：我更新了file2格式以反映第三列（数量足够）。

Answer 1

awk的一种方式：

awk -F';' '
NR==FNR {
  a[$1]=$2
  next
}
{
  line=$0
  gsub(/\"/,"")
  gsub(/ *; */,";")
  if (a[$5]==$2) {
    print line
    line=""
  }
}' file2 file1

<强>输出：

"ALMEREWEG               ";" 50  ";"      ";"ZEEWOLDE                ";"3891ZP"
"ALMEREWEG               ";" 53  ";"      ";"ZEEWOLDE                ";"3891ZN"

Answer 2

从@JS那里借来的，我提供了以下改进的解决方案。他的代码存在的问题是，如果您在同一个邮政编码中有多个门牌号，那么它只会匹配最后一个。通过创建复合关联数组（如果这是名称...基本上将两个字段连接在一起），您可以解决这个问题：

创建文件postcode.awk：

BEGIN {
  FS=";"
}
# loop around as long as the total number of records read
# is equal to the number of records read in this file
# in other words - loop around the first file only
NR==FNR {
  a[$1,$2]=1 # create one array element for each $1/$2 pair
  next
}
# loop around all the elements of the second file:
# since we're done processing the first file
{
  # copy the original line before modifying it
  line=$0
  # take out the double quotes
  gsub(/\"/,"")
  # take out the spaces on either side of the semicolons
  gsub(/ *; */,";")
  # see if the associative array element exists:
  if (a[$5,$2]==1) {
    # echo the original line that matched:
    print line
  }
}

使用测试文件file1如下（我添加了一行来显示边框情况）：

"ALMEREWEG               ";" 45  ";"      ";"ZEEWOLDE                ";"3891ZN"
"ALMEREWEG               ";" 50  ";"      ";"ZEEWOLDE                ";"3891ZP"
"ALMEREWEG               ";" 52  ";"      ";"ZEEWOLDE                ";"3891ZP"
"ALMEREWEG               ";" 53  ";"      ";"ZEEWOLDE                ";"3891ZP"
"ALMEREWEG               ";" 53  ";"      ";"ZEEWOLDE                ";"3891ZN"

密钥文件file2与（再次添加一行）：

3891ZP;50
3891ZP;52
3891ZN;53

您将看到JS的代码与编号为50的行不匹配。

但我的代码确实：

awk -f postcode.awk file2 file1

产生

"ALMEREWEG               ";" 50  ";"      ";"ZEEWOLDE                ";"3891ZP"
"ALMEREWEG               ";" 52  ";"      ";"ZEEWOLDE                ";"3891ZP"
"ALMEREWEG               ";" 53  ";"      ";"ZEEWOLDE                ";"3891ZN"

Answer 3

您可以使用sed之类的内容来构建grep的模式：

$ grep -Ef <(sed -r 's/(.*);(.*)/^[^;]*;[^;]*\2[^;]*;([^;]*;){2}[^;]*\1/' file2) file1
"ALMEREWEG               ";" 50  ";"      ";"ZEEWOLDE                ";"3891ZP"
"ALMEREWEG               ";" 53  ";"      ";"ZEEWOLDE                ";"3891ZN"

Answer 4

我已使用bash的IFS和read将file2拆分为列。然后将列传递给grep：

# read line by line
while IFS=$'\n' read line ; do
    # split into columns
    IFS=$';' read -a col <<< "$line"
    # the expression can be refined but should work well as is
    grep -e ' '${col[1]}'  ";".*;.*";"'${col[0]} file1
done < file2

输出：

"ALMEREWEG               ";" 50  ";"      ";"ZEEWOLDE                ";"3891ZP"
"ALMEREWEG               ";" 53  ";"      ";"ZEEWOLDE                ";"3891ZN"

Grep文件有两列作为输入

4 个答案: