Question

I have a list of domains:

test.example.com
example.com
example.test.com
test.test.com
test.com
test.example.example.org
example.example.org

I need to remove every subdomain whose parent domain is also in the list. The output must be:

example.com
test.com
example.example.org

Is this possible?

Answer 1

The following relies on rev, which is not part of the POSIX standard but is widely available.

rev file.txt |
sort |
awk 'NR!=1&&substr($0,1,length(p))==p{next}{p=$0".";print}' |
rev
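
Why the pipeline works: once each line is reversed, the top-level domain comes first, so a parent domain sorts directly ahead of its subdomains and a single pass can drop every line that begins with the previously kept line plus a dot. The sketch below wraps the same steps in a function that reads stdin. The name remove_subdomains and the LC_ALL=C setting are assumptions added here, not part of the answer; the C locale is used so that sort compares byte by byte and does not ignore the dots.

```shell
#!/bin/sh
# Hypothetical wrapper (name is illustrative): reads domains on stdin,
# one per line, and prints only those with no parent domain in the list.
remove_subdomains() {
    rev |                 # test.example.com -> moc.elpmaxe.tset
    LC_ALL=C sort |       # byte-wise sort: a parent lands just before its children
    awk 'NR != 1 && substr($0, 1, length(p)) == p { next }  # starts with kept parent + "."
         { p = $0 "."; print }' |                           # keep line, remember it as prefix
    rev                   # restore the original spelling
}

# Sample input from the question.
printf '%s\n' test.example.com example.com example.test.com \
    test.test.com test.com test.example.example.org example.example.org |
remove_subdomains
# prints:
#   example.example.org
#   example.com
#   test.com
```

The output follows the order of the reversed, sorted lines, not the order of the input. One caveat of this prefix approach: a label containing a character that sorts before the dot, such as a hyphen, can land between a parent and its subdomains and hide the parent from the comparison.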

From man rev:

The rev command is part of the util-linux package and is available from ftp://ftp.kernel.org/pub/linux/utils/util-linux/

You can implement the reversal with awk (it does not have to go character by character; this one works segment by segment):

awk -F. '{for (i=NF; i>1; --i) printf "%s.",$i;print $1}'
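
As a quick illustration of the segment-wise reversal, the one-liner can be wrapped in a function and applied to a single name. The function name reverse_labels is an assumption made for this sketch. Because only the order of the labels changes, applying it twice gives back the original name.

```shell
#!/bin/sh
# Hypothetical helper (name is illustrative): reverses the order of the
# dot-separated labels on each input line.
reverse_labels() {
    awk -F. '{
        for (i = NF; i > 1; --i)   # last label down to the second one
            printf "%s.", $i
        print $1                   # first label goes last, followed by a newline
    }'
}

echo test.example.com | reverse_labels
# prints: com.example.test

echo test.example.com | reverse_labels | reverse_labels
# prints: test.example.com
```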

With that, the pipeline above becomes a bit longer:

awk -F. '{for (i=NF; i>1; --i) printf "%s.",$i;print $1}' file.txt |
sort |
awk -F. 'NR!=1&&substr($0,1,length(p))==p{next}
         {p=$0".";for (i=NF; i>1; --i) printf "%s.",$i;print $1}'

Answer 2

sed -n 's/.*/²&³/;H
${g
:a
   s/\²\([^³]*³\)\(.*\).²[^³]*\1/²\1\2/
   ta
:b
   s/.²[^³]*\.\([^³]*³\)\(.*\)²\1/\2²\1/;tb
   s/[²³]//g;s/^\n//
   p
   }' YourFile

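This loads the whole file into the buffer with delimiters around each line, then removes any line that ends with a dot followed by the full text of another line. Finally it removes the delimiters and prints the result.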

Answer 3

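A solution that uses memory: first load the data into a hash, then skip every line whose parent domain (the line with its first label removed) is in the hash.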

When running the script, note that the input file is passed as an argument twice:

USAGE: gawk -f remove_subdomain.awk myfile1 myfile1

Here is the script remove_subdomain.awk:

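# remove_subdomain.awk
# Requires gawk: the three-argument form of match() is a gawk extension.

# First pass (first copy of the file): remember every domain, ignoring case.
FNR == NR {
    memory[toupper($0)] = 42
    next
}

# Second pass: strip the leftmost label; if the remainder (the immediate
# parent) was seen in the first pass, blank the line so it is not printed.
match($0, /^[^.]+\.(.+)$/, mdata) {
    if (toupper(mdata[1]) in memory)
        $0 = ""
}

# Print every line that is still non-empty.
$0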
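
The script above looks up only the immediate parent, so a.b.example.com would survive if example.com is listed but b.example.com is not. A possible variant, sketched below under the assumption that the whole list fits in memory, reads the input once from stdin, uses only POSIX awk features, and checks every ancestor by stripping one label at a time. The function name remove_subdomains_hash is illustrative and not part of the answer.

```shell
#!/bin/sh
# Hypothetical variant (name is illustrative): reads domains on stdin and
# prints, in input order, those with no ancestor domain in the list.
remove_subdomains_hash() {
    awk '
        # Remember every line in order, plus a lowercase lookup table.
        { name[NR] = $0; seen[tolower($0)] = 1 }
        END {
            for (i = 1; i <= NR; i++) {
                d = tolower(name[i])
                drop = 0
                # Strip the leftmost label repeatedly and test each ancestor.
                while ((pos = index(d, ".")) > 0) {
                    d = substr(d, pos + 1)
                    if (d in seen) { drop = 1; break }
                }
                if (!drop) print name[i]
            }
        }'
}

printf '%s\n' test.example.com example.com example.test.com \
    test.test.com test.com test.example.example.org example.example.org |
remove_subdomains_hash
# prints:
#   example.com
#   test.com
#   example.example.org
```

Because the lookup is by exact name and does not rely on sorting, the input order is preserved and the result does not depend on the locale.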

Sed, awk, grep, or something else: remove subdomains from a list if the parent domain exists