Question

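I am trying to write code that does the following: where $7 is less than or equal to $i (0 to 1, in increments of 0.05), print the line and pass it to a line count. This is how I tried to do it: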

for i in $(seq 0 0.05 1); do awk '{if ($7 <= $i) print $0}' file.txt | wc -l ; done

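This ends up returning the line count of the full file (~40 million lines) for every value of $i, whereas with $7 <= 0.00, for example, it should return ~67K.

I feel there may be a way to do this within awk, but I have not seen any suggestions that allow for non-integers.

Thanks in advance.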

Answer 1

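Use -v to pass $i to awk as a variable. Inside single quotes the shell never expands $i, so awk reads it as a field reference through its own unset variable i, which is $0 (the whole line). With -v, the awk program refers to the variable by its bare name, without the $.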
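
A minimal sketch of the -v approach, assuming the loop from the question and the sample rows from Answer 2. The helper name count_le, the awk variable name limit, and the temporary file are illustrative only and do not come from the original answer:

```shell
# Sample rows borrowed from Answer 2; field 7 holds the value being tested.
data=$(mktemp)
cat > "$data" <<'EOF'
1 2 3 4 5 6 7    a b c d e f
1 2 3 4 5 6 0.6  a b c
1 2 3 4 5 6 0.57 a b c d e f g h i j
1 2 3 4 5 6 1    a b c d e f g
1 2 3 4 5 6 0.21 a b
1 2 3 4 5 6 0.02 x y z
1 2 3 4 5 6 0.00 x y z l j k
EOF

# count_le FILE THRESHOLD (hypothetical helper): count lines with $7 <= THRESHOLD.
# -v copies the shell value into the awk variable "limit", which awk reads
# by its bare name (no $ in front).
count_le() {
    awk -v limit="$2" '{ if ($7 <= limit) print $0 }' "$1" | wc -l
}

# Same loop as in the question, one awk run per threshold.
for i in $(seq 0 0.05 1); do
    printf '%s: %s\n' "$i" "$(count_le "$data" "$i")"
done

rm -f "$data"
```

Note that this still reads the input file once per threshold, so on a 40-million-line file it is the slow option.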

Answer 2

Some made-up data:

$ cat file.txt
1 2 3 4 5 6 7    a b c d e f
1 2 3 4 5 6 0.6  a b c
1 2 3 4 5 6 0.57 a b c d e f g h i j
1 2 3 4 5 6 1    a b c d e f g
1 2 3 4 5 6 0.21 a b
1 2 3 4 5 6 0.02 x y z
1 2 3 4 5 6 0.00 x y z l j k

One possible 100% awk solution:

awk '
BEGIN { line_count=0 }

{ printf "================= %s\n",$0

  for (i=0; i<=20; i++)
     { if ($7 <= i/20)
         { printf "matching seq : %1.2f\n",i/20
           line_count++
           seq_count[i]++
           next
          }
     }
}

END { printf "=================\n\n"

      for (i=0; i<=20; i++)
         { if (seq_count[i] > 0)
             { printf "seq = %1.2f : %8s (count)\n",i/20,seq_count[i] }
         }

      printf "\nseq =  all : %8s (count)\n",line_count
    }
' file.txt


# the output:
================= 1 2 3 4 5 6 7    a b c d e f
================= 1 2 3 4 5 6 0.6  a b c
matching seq : 0.60
================= 1 2 3 4 5 6 0.57 a b c d e f g h i j
matching seq : 0.60
================= 1 2 3 4 5 6 1    a b c d e f g
matching seq : 1.00
================= 1 2 3 4 5 6 0.21 a b
matching seq : 0.25
================= 1 2 3 4 5 6 0.02 x y z
matching seq : 0.05
================= 1 2 3 4 5 6 0.00 x y z l j k
matching seq : 0.00
=================

seq = 0.00 :        1 (count)
seq = 0.05 :        1 (count)
seq = 0.25 :        1 (count)
seq = 0.60 :        2 (count)
seq = 1.00 :        1 (count)

seq =  all :        6 (count)

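BEGIN { line_count=0 }: initialize the total line counter
print statements: for debugging purposes only; once processing is finished, will …
for (i=0; i<=20; i++): depending on the implementation, some versions of awk may have rounding/accuracy problems with non-integer numbers in a sequence (e.g., increments of 0.05), so we use whole integers for our sequence and divide by 20 (for this particular case) to get our 0.05 increments in the follow-on test
$7 <= i/20: if field #7 is less than or equal to (i/20) ...
printf "matching seq ...: print the sequence value we just matched on (i/20)
line_count++: add '1' to our total line counter
seq_count[i]++: add '1' to our sequence counter array
next: break out of our sequence loop (since we found our matching sequence value (i/20)) and process the next line in the file
END ...: print out our line counts
for (i=0; ...) / if / printf: loop through our sequence array, printing the line count for each sequence value (i/20)
printf "\nseq = all...: print out our total line count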
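
To illustrate the rounding concern behind the integer-driven loop, here is a small sketch that counts how many thresholds each style of loop visits. It assumes an awk that stores numbers as IEEE-754 doubles (the usual case); the function name loop_steps and the variable names are illustrative. The integer loop always visits 21 thresholds, while the accumulated value in the float loop may end up slightly off from 1, which can change how many times that loop runs:

```shell
# loop_steps (hypothetical helper): compare a float-driven and an
# integer-driven way of stepping from 0 to 1 in increments of 0.05.
loop_steps() {
    awk 'BEGIN {
        # Float-driven loop: 0.05 has no exact binary representation, so the
        # running value x can drift away from the intended multiples of 0.05.
        for (x = 0; x <= 1; x += 0.05) float_steps++

        # Integer-driven loop: i is always exact and i/20 is computed fresh
        # on every iteration, so no error accumulates.
        for (i = 0; i <= 20; i++) { int_steps++; last = i/20 }

        printf "float-driven thresholds visited: %d\n", float_steps
        printf "integer-driven thresholds visited: %d (last = %1.2f)\n", int_steps, last
    }'
}

loop_steps
```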

Note: some of the awk code could be condensed further, but I am leaving it as-is since it is easier to follow if you are new to awk.

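One (obvious?) benefit of a 100% awk solution is that our sequence/looping construct is internal to awk, which lets us limit ourselves to one pass through the input file (file.txt); when the sequence/looping construct is outside of awk, we have to process the input file once for each pass through the sequence/loop (e.g., for this exercise we would have to process the input file 21 times!!!).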
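
As a rough idea of what a condensed version might look like, here is a sketch of the same bucketing with the per-line debugging output dropped. The function name bucket_counts and the temporary file are illustrative; the bucketing logic and the summary format follow the script in this answer:

```shell
# bucket_counts FILE (hypothetical helper): one pass over FILE, no debug output.
bucket_counts() {
    awk '
    {
        # Credit the line to the smallest threshold i/20 that $7 fits under,
        # then move straight on to the next input line.
        for (i = 0; i <= 20; i++)
            if ($7 <= i/20) { seq_count[i]++; line_count++; next }
    }
    END {
        # Only thresholds that received at least one line are printed.
        for (i = 0; i <= 20; i++)
            if (i in seq_count)
                printf "seq = %1.2f : %8s (count)\n", i/20, seq_count[i]
        printf "\nseq =  all : %8s (count)\n", line_count
    }' "$1"
}

# Try it on the made-up data from this answer.
data=$(mktemp)
cat > "$data" <<'EOF'
1 2 3 4 5 6 7    a b c d e f
1 2 3 4 5 6 0.6  a b c
1 2 3 4 5 6 0.57 a b c d e f g h i j
1 2 3 4 5 6 1    a b c d e f g
1 2 3 4 5 6 0.21 a b
1 2 3 4 5 6 0.02 x y z
1 2 3 4 5 6 0.00 x y z l j k
EOF
bucket_counts "$data"
rm -f "$data"
```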

Answer 3

Taking some guesses at what you are actually trying to accomplish, I came up with this:

awk '{ for (i=20; 20*$7<=i && i>0; i--) bucket[i]++ }
    END { for (i=1; i<=20; i++) print bucket[i] " lines where $7 <= " i/20 }' file.txt

Using the mock data from mark's second answer, I get this output:

2 lines where $7 <= 0.05
2 lines where $7 <= 0.1
2 lines where $7 <= 0.15
2 lines where $7 <= 0.2
3 lines where $7 <= 0.25
3 lines where $7 <= 0.3
3 lines where $7 <= 0.35
3 lines where $7 <= 0.4
3 lines where $7 <= 0.45
3 lines where $7 <= 0.5
3 lines where $7 <= 0.55
5 lines where $7 <= 0.6
5 lines where $7 <= 0.65
5 lines where $7 <= 0.7
5 lines where $7 <= 0.75
5 lines where $7 <= 0.8
5 lines where $7 <= 0.85
5 lines where $7 <= 0.9
5 lines where $7 <= 0.95
6 lines where $7 <= 1
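
The question also asks about $7 <= 0.00, which the loop above skips because it stops at i>0. A possible variation, offered as a sketch rather than as part of the original answer: it keeps the single-pass cumulative counting, includes the 0.00 threshold, compares against i/20 directly, and prints 0 for empty buckets. The function name cumulative_counts and the temporary file are illustrative:

```shell
# cumulative_counts FILE (hypothetical helper): for each threshold
# 0.00, 0.05, ..., 1.00 print how many lines have $7 <= threshold.
cumulative_counts() {
    awk '
    {
        # Walk down from the largest threshold and stop at the first one
        # that $7 exceeds; every threshold passed on the way gets the line.
        for (i = 20; i >= 0 && $7 <= i/20; i--) bucket[i]++
    }
    END {
        # %d turns a never-incremented bucket into 0.
        for (i = 0; i <= 20; i++)
            printf "%d lines where $7 <= %.2f\n", bucket[i], i/20
    }' "$1"
}

# Try it on the same mock data.
data=$(mktemp)
cat > "$data" <<'EOF'
1 2 3 4 5 6 7    a b c d e f
1 2 3 4 5 6 0.6  a b c
1 2 3 4 5 6 0.57 a b c d e f g h i j
1 2 3 4 5 6 1    a b c d e f g
1 2 3 4 5 6 0.21 a b
1 2 3 4 5 6 0.02 x y z
1 2 3 4 5 6 0.00 x y z l j k
EOF
cumulative_counts "$data"
rm -f "$data"
```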

Passing a for loop with non-integers to awk