I have a file containing the following lines:
"ALMEREWEG ";" 45 ";" ";"ZEEWOLDE ";"3891ZN"
"ALMEREWEG ";" 50 ";" ";"ZEEWOLDE ";"3891ZP"
"ALMEREWEG ";" 51 ";" ";"ZEEWOLDE ";"3891ZN"
"ALMEREWEG ";" 52 ";" ";"ZEEWOLDE ";"3891ZP"
"ALMEREWEG ";" 53 ";" ";"ZEEWOLDE ";"3891ZN"
I have a second file containing the following lines:
3891ZP;50;
3891ZN;53;A
3891ZN;53;B
3891ZN;54;
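Now I want to grep the first file using the patterns in the second file, where:
A) column 1 of the second file appears in column 5 of the first file; and
B) column 2 of the second file appears in column 2 of the first file.
My question: how can I do this?
Update 2013-07-07: I updated the file2 format to include a third column (the house-number suffix, e.g. A or B).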
Answer 0 (score: 3)
One way with awk:
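awk -F';' '
NR==FNR {              # first file (file2): postcode -> house number
    a[$1]=$2           # a later line with the same postcode overwrites this
    next
}
{
    line=$0            # keep the original line for printing
    gsub(/"/,"")       # strip the double quotes
    gsub(/ *; */,";")  # strip blanks around separators; awk re-splits $0
    if (a[$5]==$2) {   # postcode is field 5, house number is field 2
        print line
    }
}' file2 file1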
Output:
"ALMEREWEG ";" 50 ";" ";"ZEEWOLDE ";"3891ZP"
"ALMEREWEG ";" 53 ";" ";"ZEEWOLDE ";"3891ZN"
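The comparison `a[$5]==$2` only works because the two `gsub()` calls first turn each quoted, padded record into plain `;`-separated fields. A minimal sketch (`normalize` is a hypothetical helper name, and the sample record is copied from the question) that runs the same normalization on one line and prints the cleaned record plus fields 2 and 5:

```shell
#!/bin/sh
# Apply the two gsub() calls from Answer 0 to one record of file1 and
# show the resulting fields. normalize is an illustrative name only.
normalize() {
  awk -F';' '{
    gsub(/"/, "")          # drop every double quote
    gsub(/ *; */, ";")     # drop blanks around each separator
    # gsub() without a target edits $0, so awk re-splits the fields
    print $0 "|" $2 "|" $5
  }'
}

printf '%s\n' '"ALMEREWEG ";" 45 ";" ";"ZEEWOLDE ";"3891ZN"' | normalize
# prints: ALMEREWEG;45;;ZEEWOLDE;3891ZN|45|3891ZN
```

After normalization, field 2 holds the bare house number and field 5 the bare postcode, so they can be compared directly with the values read from file2.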
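Answer 1 (score: 2)
Borrowing from @JS, I offer the following improved solution. The problem with his code is that if several house numbers share the same postcode, it only matches the last one. By using a composite key in the associative array (if that is the right name... basically the two fields joined together), you can get around this: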
Create the file postcode.awk:
BEGIN {
FS=";"
}
# This block runs only while NR (records read so far in total)
# equals FNR (records read from the current file),
# in other words only for the first file (file2)
NR==FNR {
a[$1,$2]=1 # create one array element for each $1/$2 pair
next
}
# every remaining line comes from the second file (file1),
# since the block above ends with next for first-file lines
{
# copy the original line before modifying it
line=$0
# take out the double quotes
gsub(/\"/,"")
# take out the spaces on either side of the semicolons
gsub(/ *; */,";")
# see if the (postcode, number) key exists; "in" does not create it:
if (($5,$2) in a) {
# echo the original line that matched:
print line
}
}
Using the test file file1 below (I added a line to show the edge case):
"ALMEREWEG ";" 45 ";" ";"ZEEWOLDE ";"3891ZN"
"ALMEREWEG ";" 50 ";" ";"ZEEWOLDE ";"3891ZP"
"ALMEREWEG ";" 52 ";" ";"ZEEWOLDE ";"3891ZP"
"ALMEREWEG ";" 53 ";" ";"ZEEWOLDE ";"3891ZP"
"ALMEREWEG ";" 53 ";" ";"ZEEWOLDE ";"3891ZN"
and the key file file2 (again with one added line):
3891ZP;50
3891ZP;52
3891ZN;53
you will see that JS's code does not match the line with house number 50, but mine does:
awk -f postcode.awk file2 file1
produces:
"ALMEREWEG ";" 50 ";" ";"ZEEWOLDE ";"3891ZP"
"ALMEREWEG ";" 52 ";" ";"ZEEWOLDE ";"3891ZP"
"ALMEREWEG ";" 53 ";" ";"ZEEWOLDE ";"3891ZN"
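The difference between the two answers can be reproduced side by side. The sketch below assumes a POSIX shell with `mktemp -d`; the temporary files and the function names `single_key` and `composite_key` are illustrative only. It runs a one-value-per-postcode array (as in Answer 0) and the composite `(postcode, number)` key (as in Answer 1) on the test data above:

```shell
#!/bin/sh
# Compare a one-value-per-postcode array (Answer 0) with a composite
# (postcode, number) key (Answer 1) on Answer 1's test data.
# Temporary files and function names are illustrative only.
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT

cat > "$tmp/file1" <<'EOF'
"ALMEREWEG ";" 45 ";" ";"ZEEWOLDE ";"3891ZN"
"ALMEREWEG ";" 50 ";" ";"ZEEWOLDE ";"3891ZP"
"ALMEREWEG ";" 52 ";" ";"ZEEWOLDE ";"3891ZP"
"ALMEREWEG ";" 53 ";" ";"ZEEWOLDE ";"3891ZP"
"ALMEREWEG ";" 53 ";" ";"ZEEWOLDE ";"3891ZN"
EOF

cat > "$tmp/file2" <<'EOF'
3891ZP;50
3891ZP;52
3891ZN;53
EOF

# One value per postcode: the 3891ZP;52 line overwrites 3891ZP;50.
single_key() {
  awk -F';' 'NR==FNR { a[$1] = $2; next }
    { line = $0; gsub(/"/, ""); gsub(/ *; */, ";")
      if (a[$5] == $2) print line }' "$tmp/file2" "$tmp/file1"
}

# Composite key: a[$1, $2] joins both fields with SUBSEP, so every
# (postcode, number) pair is kept; the "in" test does not add elements.
composite_key() {
  awk -F';' 'NR==FNR { a[$1, $2] = 1; next }
    { line = $0; gsub(/"/, ""); gsub(/ *; */, ";")
      if (($5, $2) in a) print line }' "$tmp/file2" "$tmp/file1"
}

echo "single key:"
single_key
echo "composite key:"
composite_key
```

With this data `single_key` prints only the 52/3891ZP and 53/3891ZN rows, while `composite_key` also prints the 50/3891ZP row, which is exactly the case described above.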
Answer 2 (score: 0)
You can use something like sed to build the patterns for grep:
$ grep -Ef <(sed -r 's/^([^;]*);([^;]*).*/^[^;]*;[^;]*\2[^;]*;([^;]*;){2}[^;]*\1/' file2) file1
"ALMEREWEG ";" 50 ";" ";"ZEEWOLDE ";"3891ZP"
"ALMEREWEG ";" 53 ";" ";"ZEEWOLDE ";"3891ZN"
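It can help to look at the regular expressions that grep actually receives. A minimal sketch (`show_patterns` is a hypothetical name) runs only the sed step; the `^([^;]*);([^;]*).*` capture is an assumption that takes just the first two fields of file2, so the trailing `;` and the suffix column from the updated format are ignored:

```shell
#!/bin/bash
# Print the patterns built from file2 lines instead of passing them
# straight to grep -f. show_patterns is an illustrative name only.
show_patterns() {
  # \1 = postcode (file2 field 1), \2 = house number (file2 field 2);
  # the rest of the line (third column, trailing ";") is discarded
  sed -E 's/^([^;]*);([^;]*).*/^[^;]*;[^;]*\2[^;]*;([^;]*;){2}[^;]*\1/'
}

printf '%s\n' '3891ZP;50;' '3891ZN;53;A' | show_patterns
# prints:
# ^[^;]*;[^;]*50[^;]*;([^;]*;){2}[^;]*3891ZP
# ^[^;]*;[^;]*53[^;]*;([^;]*;){2}[^;]*3891ZN
```

Each pattern skips field 1, requires the number somewhere in field 2, skips fields 3 and 4, and requires the postcode in field 5. Because the number is surrounded by `[^;]*`, 50 would also match a field such as " 150 ".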
Answer 3 (score: 0)
I used bash's IFS and read to split file2 into columns, then passed the columns to grep:
# read file2 line by line
while IFS= read -r line ; do
    # split the line on ";" into the array col
    IFS=';' read -r -a col <<< "$line"
    # col[1] is the house number, col[0] the postcode;
    # the expression can be refined but works well as is
    grep -e " ${col[1]} \";\".*;.*\";\"${col[0]}" file1
done < file2
Output:
"ALMEREWEG ";" 50 ";" ";"ZEEWOLDE ";"3891ZP"
"ALMEREWEG ";" 53 ";" ";"ZEEWOLDE ";"3891ZN"
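A variant of the same idea lets `read` split each file2 line on `;` directly into named variables. With the updated file2 the keys 3891ZN;53;A and 3891ZN;53;B both select the same file1 row, so a plain loop would print it twice; this sketch removes repeats with `awk '!seen[$0]++'`, which keeps the first occurrence in order. `match_file2` and the temporary files are hypothetical names, and the sample data is the question's:

```shell
#!/bin/bash
# Read postcode and house number straight into variables and grep file1.
# match_file2 and the temp files are illustrative only.
match_file2() {
  local keys=$1 data=$2 postcode number rest
  # rest swallows the optional suffix column and any trailing ";"
  while IFS=';' read -r postcode number rest; do
    # field 2 of file1 looks like " 50 ", field 5 like "3891ZP"
    grep -e " $number \";.*\"$postcode\"\$" "$data"
  done < "$keys" | awk '!seen[$0]++'   # drop rows matched by several keys
}

tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
printf '%s\n' \
  '"ALMEREWEG ";" 45 ";" ";"ZEEWOLDE ";"3891ZN"' \
  '"ALMEREWEG ";" 50 ";" ";"ZEEWOLDE ";"3891ZP"' \
  '"ALMEREWEG ";" 51 ";" ";"ZEEWOLDE ";"3891ZN"' \
  '"ALMEREWEG ";" 52 ";" ";"ZEEWOLDE ";"3891ZP"' \
  '"ALMEREWEG ";" 53 ";" ";"ZEEWOLDE ";"3891ZN"' > "$tmp/file1"
printf '%s\n' '3891ZP;50;' '3891ZN;53;A' '3891ZN;53;B' '3891ZN;54;' > "$tmp/file2"

match_file2 "$tmp/file2" "$tmp/file1"
```

This prints the 50/3891ZP and 53/3891ZN rows once each; the 3891ZN;54 key has no matching row in file1 and produces no output.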