Get the required data from a table based on a key

Time: 2016-03-06 07:14:30

Tags: python linux bash shell

I have a dataset in a file consisting of three columns (IP address, port, domain name), as follows:

172.56.146.16 61981 r5---sn-uhvcpax0n5-x5ue.googlevideo.com
172.56.146.13 64576 r2---sn-uhvcpax0n5-x5ue.googlevideo.com
172.56.146.46 56483 ssl.gstatic.com
172.56.146.14 57054 r3---sn-uhvcpax0n5-x5ue.googlevideo.com
172.56.146.14 58157 r3---sn-uhvcpax0n5-x5ue.googlevideo.com
172.56.146.18 62666 ssl.gstatic.com
172.56.146.15 55682 r4---sn-uhvcpax0n5-x5ue.googlevideo.com
172.56.146.16 52234 r5---sn-uhvcpax0n5-x5ue.googlevideo.com
172.56.146.59 57106 ssl.gstatic.com
172.56.146.18 58897 ssl.gstatic.com
172.56.146.16 52258 r5---sn-uhvcpax0n5-x5ue.googlevideo.com
172.56.146.15 55694 r4---sn-uhvcpax0n5-x5ue.googlevideo.com
172.56.146.32 64281 ssl.gstatic.com
172.56.146.39 60581 ssl.gstatic.com
172.56.146.13 57137 r2---sn-uhvcpax0n5-x5ue.googlevideo.com
172.56.146.14 64763 r3---sn-uhvcpax0n5-x5ue.googlevideo.com
172.56.146.13 57135 r2---sn-uhvcpax0n5-x5ue.googlevideo.com
172.56.146.15 51318 r4---sn-uhvcpax0n5-x5ue.googlevideo.com

I also have a key set in a file, containing only IP addresses and ports:

172.56.146.15 49333
172.56.146.16 52233
172.56.146.46 56483
172.56.146.14 58928
172.56.146.16 61981
172.56.146.13 64576
172.56.146.14 58157
172.56.146.18 62666
172.56.146.15 55682
172.56.146.14 57054

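Now I want to take each line of the key set in turn as the input to my dataset and, in return, get the domain name from the dataset for that key (the IP address and port taken from the key set).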

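For example, for 172.56.146.15 49333 I should get the result "domain not found", and for 172.56.146.46 56483 I should get ssl.gstatic.com, and so on. Can anyone tell me how to do this with shell commands or a script, so that the output looks like the following (one line per key, in the same order as the key set):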

domain not found
ssl.gstatic.com
r5---sn-uhvcpax0n5-x5ue.googlevideo.com

3 answers:

Answer 0 (score: 2)

Two solutions; both read the data file into an array and then look up the array value for each line of the key file.

  1. "Pure" Bash (builtins only):

    #!/bin/bash
    
    # Declare associative array
    declare -A datafile
    
    # Read data file into associative array
    while read -r ip_addr port domain; do
        datafile["$ip_addr $port"]="$domain"
    done < "$1"
    
    # Look up value for each key from key file in array; reading the key
    # as two fields normalizes extra whitespace in the key file
    while read -r ip_addr port _; do
        # Use parameter expansion to print "not found" if key is not in array
        printf "%s\n" "${datafile["$ip_addr $port"]:-domain not found}"
    done < "$2"
    

    Call it like this:

    ./SO.sh data keys
    

    where SO.sh is the name of the script file, data is the data file, and keys is the file with the keys.

  2. awk:

    #!/usr/bin/awk -f
    
    # Process first file, read into array
    NR == FNR {
        datafile[$1, $2] = $3
        next
    }
    
    # Look up value for key; the "in" test does not create empty entries
    {
        if (($1, $2) in datafile)
            print datafile[$1, $2]
        else
            print "domain not found"
    }
    

    Assuming it is stored in SO.awk, call it like this:

    ./SO.awk data keys
    
For large files, the awk solution will be several orders of magnitude faster than the pure Bash one.
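The awk approach can also be run as a single command, without a separate script file. Below is a minimal sketch that wraps it in a shell function; `lookup_domains` is a hypothetical name, and the demo uses a temporary copy of a few lines from the question's files rather than the real `data` and `keys`.

```bash
#!/bin/bash

# lookup_domains DATA_FILE KEY_FILE (hypothetical helper name)
# Prints one line per key, in key-file order: the domain stored for that
# "ip port" pair, or "domain not found" when the pair is absent.
lookup_domains() {
    awk '
        # First file (data): remember the domain for each (ip, port) pair
        NR == FNR { domain[$1, $2] = $3; next }
        # Second file (keys): print the stored domain or a fallback message
        { print ((($1, $2) in domain) ? domain[$1, $2] : "domain not found") }
    ' "$1" "$2"
}

# Demo on a few lines taken from the question, in a temporary directory
tmp=$(mktemp -d)
printf '%s\n' \
    '172.56.146.46 56483 ssl.gstatic.com' \
    '172.56.146.16 61981 r5---sn-uhvcpax0n5-x5ue.googlevideo.com' > "$tmp/data"
printf '%s\n' '172.56.146.15 49333' '172.56.146.46 56483' \
    '172.56.146.16 61981' > "$tmp/keys"
lookup_domains "$tmp/data" "$tmp/keys"
rm -r "$tmp"
```

For those three keys it should print the same three lines as the expected output in the question.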

Answer 1 (score: 1)

With GNU bash:

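#!/bin/bash

# Print the dataset line for each "ip port" key, or a not-found message.
# ^ anchors the match at line start, [.] matches a literal dot, and the
# trailing space keeps port 5223 from matching 52234.
while read -r ip port _; do
  if ! grep -- "^${ip//./[.]} $port " dataset; then
    echo "$ip $port domain not found"
  fi
done < keys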

Output:

172.56.146.15 49333 domain not found
172.56.146.16 52233 domain not found
172.56.146.46 56483 ssl.gstatic.com
172.56.146.14 58928 domain not found
172.56.146.16 61981 r5---sn-uhvcpax0n5-x5ue.googlevideo.com
172.56.146.13 64576 r2---sn-uhvcpax0n5-x5ue.googlevideo.com
172.56.146.14 58157 r3---sn-uhvcpax0n5-x5ue.googlevideo.com
172.56.146.18 62666 ssl.gstatic.com
172.56.146.15 55682 r4---sn-uhvcpax0n5-x5ue.googlevideo.com
172.56.146.14 57054 r3---sn-uhvcpax0n5-x5ue.googlevideo.com
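`grep` treats its pattern as a regular expression and matches it anywhere in a line, so a bare `"$ip $port"` pattern lets each `.` match any character and lets a port such as `5223` match the line for `52234`. A minimal sketch of a stricter per-key match, assuming fields separated by single spaces as in the sample data; `find_line` is a hypothetical helper name:

```bash
#!/bin/bash

# find_line IP PORT FILE (hypothetical helper name)
# Prints lines of FILE whose first two fields are exactly IP and PORT.
# Dots in IP become [.] so they match only a literal dot, ^ anchors the
# match at the start of the line, and the trailing space stops PORT from
# matching a longer port number.
find_line() {
    grep -- "^${1//./[.]} $2 " "$3"
}

tmp=$(mktemp)
printf '%s\n' '172.56.146.16 52234 r5---sn-uhvcpax0n5-x5ue.googlevideo.com' > "$tmp"

# Unanchored pattern: 5223 is a prefix of 52234, so this counts 1 match
grep -c "172.56.146.16 5223" "$tmp"
# Exact fields: no match, so find_line fails and the fallback runs
find_line 172.56.146.16 5223 "$tmp" || echo "domain not found"
# Exact fields that do exist: prints the whole dataset line
find_line 172.56.146.16 52234 "$tmp"
rm "$tmp"
```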

Answer 2 (score: 1)

Use this:

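#!/bin/bash

# Usage: ./myscript.sh key.txt  (keys are looked up in table.txt)
while IFS='' read -r line || [[ -n "$line" ]]; do
    # First line of table.txt containing the key followed by a space
    if match=$(grep -m 1 -F -- "$line " table.txt); then
        read -r _ _ domain <<< "$match"
        echo "$domain"
    else
        echo "domain not found"
    fi
done < "$1"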

Run:

./myscript.sh key.txt

Result:

domain not found
domain not found
ssl.gstatic.com
domain not found
r5---sn-uhvcpax0n5-x5ue.googlevideo.com
r2---sn-uhvcpax0n5-x5ue.googlevideo.com
r3---sn-uhvcpax0n5-x5ue.googlevideo.com
ssl.gstatic.com
r4---sn-uhvcpax0n5-x5ue.googlevideo.com
r3---sn-uhvcpax0n5-x5ue.googlevideo.com
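The `|| [[ -n "$line" ]]` in the loop condition covers a key file whose last line has no trailing newline: `read` still fills `line` but returns a non-zero status at end of file, so a plain `while read` loop would silently skip that final key. A small sketch comparing the two loop forms; `count_plain` and `count_safe` are hypothetical names used only for illustration:

```bash
#!/bin/bash

# count_plain: plain while-read loop; drops an unterminated last line
count_plain() {
    local n=0 line
    while IFS= read -r line; do
        n=$((n + 1))
    done
    echo "$n"
}

# count_safe: same loop, but also processes a final line without a newline
count_safe() {
    local n=0 line
    while IFS= read -r line || [[ -n "$line" ]]; do
        n=$((n + 1))
    done
    echo "$n"
}

# Two keys, the second one without a trailing newline
printf '172.56.146.15 49333\n172.56.146.46 56483' | count_plain   # prints 1
printf '172.56.146.15 49333\n172.56.146.46 56483' | count_safe    # prints 2
```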