Question

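I want to read fileIn.txt (comma-separated) and write fileOut.txt containing only the rows whose value in a given column matches one of the first 3 distinct values in that column. For example, my input file looks like this: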

fileIn.txt
#location,day,time
home,mon,01:00
office,mon,06:00
home,mon,10:00
office,tues,03:00
home,wed,08:00
home,wed,11:00
home,thurs,02:00
home,fri,01:00
diner,fri,07:00
party,fri,09:00
home,sat,02:00
mall,sat,06:00
home,sat,09:00
beach,sun,01:00

I want to select only the rows for the first 3 distinct days, so that my output file looks like this:

fileOut.txt
#location,day,time
home,mon,01:00
office,mon,06:00
home,mon,10:00
office,tues,03:00
home,wed,08:00
home,wed,11:00

Answer 1

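Your question is a bit confusing. But if I understand correctly, you want to print every line whose day matches one of the first 3 distinct day values the script finds in the file. You can do that with awk like this: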

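BEGIN { FS = "," }

# Pass the "#" header through so its "day" field is not counted as a value.
/^#/ { print; next }

{
    # Record the day while fewer than 3 distinct days have been seen.
    if (dayCount < 3 && !($2 in days)) { days[$2] = 1; ++dayCount }
    # Print the line if its day is one of the recorded days.
    if ($2 in days) { print }
}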

Answer 2

awk to the rescue! A more idiomatic form that also includes the header.

$ awk -F, 'NR==1{c[$2]} length(c)<4{c[$2]} $2 in c' file

#location,day,time
home,mon,01:00
office,mon,06:00
home,mon,10:00
office,tues,03:00
home,wed,08:00
home,wed,11:00

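Explanation: the first block initializes the array with the value from the first line, because the length of the array cannot be checked before the array is initialized. The array c holds the distinct $2 fields, and the second block keeps adding to it until its size reaches 4 (that is, 4 distinct values once the header's "day" is counted). The last block checks whether the line's $2 is one of those distinct values and prints the line (the default action).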

I didn't want to make it more cryptic, but you can merge the first two blocks, since their actions are the same:

$ awk -F, 'NR==1 || length(c)<4 {c[$2]} $2 in c' file

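It relies on short-circuit evaluation of the logical OR: length(c) is not evaluated until after the array has been initialized on the NR==1 line.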
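As far as I know, calling length() on an array works in gawk and several other implementations, but some older awks reject it. If that is a concern, an explicit counter does the same job. The following is only a sketch under that assumption: the counter name n and the temporary file are my own choices, not part of the answer above, and the header is printed separately instead of being counted as a fourth value.

```shell
# Sample input from the question, written to a temporary file.
tmp=$(mktemp)
cat > "$tmp" <<'EOF'
#location,day,time
home,mon,01:00
office,mon,06:00
home,mon,10:00
office,tues,03:00
home,wed,08:00
home,wed,11:00
home,thurs,02:00
home,fri,01:00
diner,fri,07:00
party,fri,09:00
home,sat,02:00
mall,sat,06:00
home,sat,09:00
beach,sun,01:00
EOF

out=$(awk -F, '
    NR == 1 { print; next }                # header: print it, do not count it
    !($2 in c) && n < 3 { c[$2]; n++ }     # record a new day while fewer than 3 are known
    $2 in c                                # print rows whose day was recorded
' "$tmp")

printf '%s\n' "$out"
rm -f "$tmp"
```

Because the header is handled on its own, the threshold is 3 rather than 4.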

Answer 3

awk -F, '
    /^#/            {print; next}   # keep comments
    ++seen[$2] == 1 {count++}       # incr counter the first time value is seen
    count > 3       {exit}          # quit if we have seen 4 values
                    {print}         # otherwise print this line
' file
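One thing to be aware of: because of the exit, this version stops reading at the first row that carries a 4th distinct value. That is fast and gives the expected result when the file is grouped by day, as in the question. If rows for an earlier day can appear later in the file, they will be missed. Below is a small sketch of the difference, using a made-up ungrouped input and a hypothetical rank array (my own naming) that remembers the order in which each value first appeared.

```shell
# Made-up input where a "mon" row appears again after "thurs".
tmp=$(mktemp)
cat > "$tmp" <<'EOF'
#location,day,time
home,mon,01:00
office,tues,03:00
home,wed,08:00
home,thurs,02:00
home,mon,10:00
EOF

# The answer as written: exits at the first row with a 4th distinct value.
early=$(awk -F, '
    /^#/            {print; next}
    ++seen[$2] == 1 {count++}
    count > 3       {exit}
                    {print}
' "$tmp")

# Variant: keep reading, and print rows whose value ranked in the first 3.
full=$(awk -F, '
    /^#/          {print; next}
    !($2 in rank) {rank[$2] = ++count}   # order of first appearance
    rank[$2] <= 3                        # default action prints the row
' "$tmp")

printf 'with exit:\n%s\n\nwithout exit:\n%s\n' "$early" "$full"
rm -f "$tmp"
```

The variant reads the whole file, so on large grouped inputs the original exit version is the cheaper choice.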

awk: select rows based on the first 3 distinct values of a given column