Question

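I have a file with five columns, and the second column contains delimited text. I want to split that delimited text, remove duplicates, and print each item on its own line. I can do it with the command below, but I would like to turn it into a single awk script. Can anyone help?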

awk -F"\t" 'NR>1{print $2}' <input file> | awk -F\| '{for (i = 0; ++i <= NF;) print $i}' | awk '!x[$0]++'

Input file:

test    hello|good|this|will|be    23421    test    4543
test2    good|would|may|can    43234    test2    3421

Output:

hello
good
this
will
be
would
may
can
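The pipeline's NR>1 and -F"\t" suggest that the real file has a header line and tab-separated columns, although the sample above shows neither. Under that assumption, the three stages can be folded into one awk program. This is a minimal sketch, and the function name uniq_field2 is hypothetical:

```shell
# Hypothetical helper: one awk program equivalent to the three-stage pipeline.
# Assumes tab-separated columns and a header line (hence NR > 1), as in the
# original command.
uniq_field2() {
  awk -F'\t' 'NR > 1 {
    n = split($2, parts, "|")      # split column 2 on a literal "|"
    for (i = 1; i <= n; i++)
      if (!seen[parts[i]]++)       # true only the first time a word appears
        print parts[i]
  }' "$@"
}

# Usage: pass a file name, or pipe data in on standard input
printf 'id\twords\tnum\nx\thello|good|this\t1\ny\tgood|may\t2\n' | uniq_field2
```

With that input the header line is skipped, and the words come out as hello, good, this, may. Because the seen array persists across input lines, the second "good" is dropped.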

Answer 1

You can use this single awk one-liner:

$ awk '{split($2,a,"|");for(i in a)if(!seen[a[i]]++)print a[i]}' file
will
be
hello
good
this
can
would
may

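The second field is split on the "|" character into the array a. Each element of a is printed only if it is not yet in seen, which is true only the first time that element occurs.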
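The "only the first time" part comes from post-increment: seen[x]++ evaluates to the count before incrementing. That count is 0 for an unset element, so !seen[x]++ is true exactly once per distinct value. A small demonstration (show_seen_counts is a hypothetical name):

```shell
# Hypothetical demo function: prints each line, the value that seen[$0]++
# evaluates to (the count *before* incrementing), and its negation.
show_seen_counts() {
  awk '{
    before = seen[$0]++   # 0 (unset) on first occurrence, then 1, 2, ...
    print $0, before, !before
  }'
}

printf 'good\nmay\ngood\n' | show_seen_counts
```

This prints "good 0 1", "may 0 1", "good 1 0". The last column is the condition the one-liner tests.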

Note that the order in which a for (i in a) loop visits the keys is undefined.

To preserve the original order, you can use:

$ awk '{n=split($2,a,"|");for(i=1;i<=n;++i)if(!seen[a[i]]++)print a[i]}' file

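split returns the number of elements it stored in the array a, so you can use that count to loop over them in the order in which they appeared.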
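To see exactly what split hands back, the sketch below prints the count and each indexed piece. The function name show_split is hypothetical. The second input line shows that two adjacent separators yield an empty piece, which still counts toward n:

```shell
# Hypothetical demo function: shows what split() returns and how it fills
# the array, including an empty piece between two adjacent "|".
show_split() {
  awk '{
    n = split($0, parts, "|")   # parts[1..n] hold the pieces, in order
    print "n=" n
    for (i = 1; i <= n; i++)
      print i ":[" parts[i] "]"
  }'
}

printf 'hello|good|this\na||b\n' | show_split
```

For "a||b" this reports n=3 with an empty parts[2]. The one-liners above would print such an empty piece as a blank line.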

Answer 2

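I had written the same thing as Tom's answer before I saw it. If you want to keep the words in their original order, it takes a bit more: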

awk '
    {
        n = split($2, a, "|")
        for (i=1; i<=n; i++) 
            if (!(a[i] in seen)) {
                # the hash to store the unique keys
                seen[a[i]] = 1
                # the array to store the keys in order
                words[++count] = a[i]
            }
    }
    END {for (i=1; i<=count; i++) print words[i]}
' file
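This version checks !(a[i] in seen) and then assigns seen[a[i]] = 1 explicitly. The in operator is a pure membership test, but merely referencing seen[key] creates the element as a side effect. A small demonstration (demo_in_vs_ref is a hypothetical name):

```shell
# Hypothetical demo function: a membership test with "in" does not create
# an array element, but merely referencing seen["y"] does.
demo_in_vs_ref() {
  awk 'BEGIN {
    if ("x" in seen) print "x present"   # false; seen["x"] is NOT created
    if (seen["y"] == "") print "y empty" # true; seen["y"] IS created (empty)
    count = 0
    for (k in seen) count++
    print "keys:", count                 # only "y" exists
  }'
}

demo_in_vs_ref
```

This prints "y empty" and then "keys: 1". In the !seen[a[i]]++ idiom that side effect is exactly what records the word. Here, the in test leaves seen untouched until the explicit assignment.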

hello
good
this
will
be
would
may
can

Answer 3

This is how I would do it:

awk '{n=split($2,a,"|");for (i=1;i<=n;i++) print a[i]}' file
hello
good
this
will
be
good
would
may
can

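Or like this (for (i in a) visits the elements in an unspecified order, so this may change the order of the output, although it happens to work fine here):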

awk '{split($2,a,"|");for(i in a) print a[i]}' file
hello
good
this
will
be
good
would
may
can
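One way to see the difference is to print the same split result through both loop styles. Both loops produce the same set of words, but only the 1..n loop is guaranteed to follow the input order. This sketch uses the hypothetical name compare_loops:

```shell
# Hypothetical demo function: prints the pieces of field 2 twice, once via
# "for (i in a)" (implementation-defined order) and once via 1..n (split order).
compare_loops() {
  awk '{
    n = split($2, a, "|")
    for (i in a) print "in:", a[i]
    for (i = 1; i <= n; i++) print "seq:", a[i]
  }'
}

printf 'test\thello|good|this|will|be\n' | compare_loops
```

The "seq:" lines are always hello, good, this, will, be. The "in:" lines contain the same words, but an awk implementation is free to emit them in any order.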

Or, if you don't want duplicates in the output:

awk '{split($2,a,"|");for(i in a) if (!f[a[i]]++) print a[i]}' file
hello
good
this
will
be
would
may
can

awk: split the delimited text in a row into separate lines
