Question

From this file, using awk, I want to retrieve all IP:port pairs, except those from swaziland.

        </tr>
            <tr>
                <td>17m 19s ago</td>
                <td><script>                            document.write('93.90.232.113')</script></td>
                <td><script>                            document.write('18297')</script></td>
                <td><a href="/sockslist/country/?c=swaziland ">swaziland </a></td>
                <td></td>
                <td class="center">SOCK4/5</td>
                <td class="center"><span class=blue>7</span>/<span class=red>0</span</td>
                <td class="center fast">68ms</td>
            </tr>
            <tr>
                <td>20m 44s ago</td>
                <td><script>                            document.write('209.61.226.80')</script></td>
                <td><script>                            document.write('443')</script></td>
                <td><a href="/sockslist/country/?c=Wonderfullland">Wonderfullland</a></td>
                <td></td>
                <td class="center">SOCK4</td>
                <td class="center"><span class=blue>205</span>/<span class=red>0</span</td>
                <td class="center fast">127ms</td>

So the output here should be:

209.61.226.80:443

I can get the IPs using the following:

    #! /usr/bin/awk -f

    match ($0,/[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+/)
      { 
          ip = substr($0,RSTART,RLENGTH)
          print ip;
      }

With this code, I get the following (why is everything printed twice?):

... [...]

93.90.232.113

[... lots of whitespace between each output ...]

[...]('209.61.226.80')[...]

209.61.226.80

With grep it works fine, but I don't know how to get the corresponding port (and the country filter problem remains):

grep -E -o "([0-9]{1,3}[\.]){3}[0-9]{1,3}" <file>

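Regarding the country filter, using "/country/" is no good because that word appears elsewhere in the file. It should be something like /try/?c=/ but that does not work for me.

Any ideas?

Thanks a lot, everyone!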

Answer 1

Try the following combination of the awk, grep, and paste commands:

$ awk 'BEGIN{ RS="<tr>"} /swaziland/{next}1' file | grep -oP "(?<=document\.write\(\')[^']*" | paste -d: - -
209.61.226.80:443

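By setting RS (the record separator) to <tr>, awk splits the whole file into records at each occurrence of the <tr> tag. If a record contains swaziland, it is skipped; every other record is printed.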
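To see the record splitting on its own, here is a minimal sketch. The sample file is a hypothetical, trimmed-down version of the question's HTML, and it assumes an awk that accepts a multi-character RS (gawk and mawk do; POSIX leaves that behaviour unspecified).

```shell
# Hypothetical sample: some text before the first <tr>, then two rows.
sample=$(mktemp)
cat > "$sample" <<'EOF'
<table>
<tr>
<td>document.write('93.90.232.113')</td>
<td>document.write('18297')</td>
<td>swaziland</td>
</tr>
<tr>
<td>document.write('209.61.226.80')</td>
<td>document.write('443')</td>
<td>Wonderfullland</td>
</tr>
EOF

# For each record, print its number and whether the filter would drop it.
records=$(awk 'BEGIN { RS = "<tr>" }
  { print NR, (/swaziland/ ? "skipped" : "kept") }' "$sample")
printf '%s\n' "$records"
rm -f "$sample"
```

This prints `1 kept`, `2 skipped`, `3 kept`. Record 1 is whatever precedes the first <tr>; it is printed too, but it contains no document.write call, so the later grep stage ignores it.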

grep -oP "(?<=document\.write\(\')[^']*"

A positive lookbehind is used to match the string that follows document.write(' up to, but not including, the next ' character.
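A small sketch of that grep stage in isolation, assuming GNU grep built with PCRE support (the -P option). The two input lines are hypothetical and modelled on the question's HTML.

```shell
# Hypothetical input: one IP cell and one port cell.
html="<td><script>document.write('209.61.226.80')</script></td>
<td><script>document.write('443')</script></td>"

# -o prints only the matched text; -P enables Perl-compatible regexes,
# which is what makes the (?<=...) lookbehind available.
values=$(printf '%s\n' "$html" | grep -oP "(?<=document\.write\(')[^']*")
printf '%s\n' "$values"
```

The lookbehind itself is not part of the match, so only the quoted values come out, one per line.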

At this point the IP and the port are printed on two separate lines.

paste -d: - - merges each pair of lines into a single line, separated by a colon.
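The paste step can be tried separately. This sketch feeds it four hypothetical lines (two IP/port pairs) instead of real grep output.

```shell
# Each "-" tells paste to read one line from standard input, so two dashes
# consume the input two lines at a time; -d: joins them with a colon.
pairs=$(printf '%s\n' 93.90.232.113 18297 209.61.226.80 443 | paste -d: - -)
printf '%s\n' "$pairs"
```

This relies on the values arriving strictly in IP, port, IP, port order; an extra or missing line would shift every pair after it.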

Answer 2

Here is how I would do it using GNU awk:

awk 'NR>1 && !/swaziland/ {print $2":"$4}' FS="'" RS="<tr>" file
209.61.226.80:443

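Setting RS="<tr>" makes awk split the data into sections, one per table row.
Then NR>1 && !/swaziland/ tells awk to ignore the first section (everything before the first <tr>) and any section containing swaziland. By setting FS="'", you can easily get the data from fields 2 and 4.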
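The same idea written out as a self-contained sketch, with comments on which field is which. The sample file is a hypothetical cut-down copy of the question's HTML, and a multi-character RS is assumed to be supported (gawk, mawk).

```shell
sample=$(mktemp)
cat > "$sample" <<'EOF'
<table>
<tr>
<td><script>document.write('93.90.232.113')</script></td>
<td><script>document.write('18297')</script></td>
<td><a href="/sockslist/country/?c=swaziland ">swaziland </a></td>
</tr>
<tr>
<td><script>document.write('209.61.226.80')</script></td>
<td><script>document.write('443')</script></td>
<td><a href="/sockslist/country/?c=Wonderfullland">Wonderfullland</a></td>
</tr>
</table>
EOF

# With FS set to a single quote, each record splits as:
#   $1 = text up to the first quote, $2 = IP,
#   $3 = text between the two calls,  $4 = port.
result=$(awk -v RS='<tr>' -v FS="'" '
  NR > 1 && !/swaziland/ {   # skip the preamble and any swaziland row
    print $2 ":" $4
  }' "$sample")
printf '%s\n' "$result"
rm -f "$sample"
```

Note that this depends on the IP and port being the first two single-quoted strings in each row; a row with other quoted text before them would need different field numbers.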

Answer 3

This might work for you:

awk -F\' '/[0-9]+(\.[0-9]+){3}/{ip=$2; getline; port=$2; getline; if (!/swaziland/) print ip":"port}' file
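For readers who want to follow that one-liner step by step, here is a commented sketch of the same line-by-line approach. It assumes the layout shown in the question (IP line, then port line, then country line); the sample file is hypothetical, and the IP pattern is spelled out in full rather than with {3} because some older awk builds do not support interval expressions.

```shell
sample=$(mktemp)
cat > "$sample" <<'EOF'
<tr>
<td>17m 19s ago</td>
<td><script>document.write('93.90.232.113')</script></td>
<td><script>document.write('18297')</script></td>
<td><a href="/sockslist/country/?c=swaziland ">swaziland </a></td>
</tr>
<tr>
<td>20m 44s ago</td>
<td><script>document.write('209.61.226.80')</script></td>
<td><script>document.write('443')</script></td>
<td><a href="/sockslist/country/?c=Wonderfullland">Wonderfullland</a></td>
</tr>
EOF

result=$(awk -F"'" '
  /[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+/ {   # a line containing a dotted-quad IP
    ip = $2                            # text between the first pair of quotes
    getline; port = $2                 # the next line carries the port
    getline                            # the line after that names the country
    if (!/swaziland/) print ip ":" port
  }' "$sample")
printf '%s\n' "$result"
rm -f "$sample"
```

Unlike the RS-based answers, this works with any awk's default line records, but it breaks if a blank or extra line appears between the IP, port, and country lines.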

Get IP and port from text
