Question

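I am trying to write an awk script that tells the user how many lines are in the file before it does anything else. I know how to do this in the END section, but I cannot do it in the BEGIN section. I have searched SE and Google, but only found half a dozen ways to do it in the END section or as part of a bash script, not how to do it before any processing takes place. I was hoping for something like the following: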

#!/usr/bin/awk -f

BEGIN{
        print "There are a total of " **TOTAL LINES** " lines in this file.\n"
     }
{

        if($0==4587){print "Found record on line number "NR; exit 0;}
}

However, I have been unable to work out how to do this, if it is possible at all. Thanks.

Answer 1

You can read the file twice.

awk 'NR!=1 && FNR==1 {print NR-1} <some more code here>' file{,}

In your example:

awk 'NR!=1 && FNR==1 {print "There are a total of "NR-1" lines in this file.\n"} $0==4587 {print "Found record on line number "NR; exit 0;}' file{,}

You can use file file instead of file{,} (the brace expansion just makes the file name appear twice).
NR!=1 && FNR==1 is true only on the first line of the second file.
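To see why that condition singles out the start of the second pass, it can help to print both counters side by side. The following is a small sketch, assuming a throwaway three-line file created with mktemp; the function name show_counters is made up for illustration.

```shell
# Print NR (lines read so far across all operands) and FNR (lines read in
# the current file) for every line, passing the same file twice.
show_counters() {
    awk '{ print "NR=" NR, "FNR=" FNR, $0 }' "$1" "$1"
}

tmp=$(mktemp)
printf 'a\nb\nc\n' > "$tmp"
show_counters "$tmp"
rm -f "$tmp"
```

On the fourth output line FNR is back to 1 while NR has reached 4, so NR-1 at that point is the number of lines in the file.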

Using an awk script:

#!/usr/bin/awk -f
NR!=1 && FNR==1 {
    print "There are a total of "NR-1" lines in this file.\n"
    } 
$0==4587 {
    print "Found record on line number "NR; exit 0
    }

awk -f myscript file{,}

Answer 2

To do this robustly and for multiple files, you need something like the following:

$ cat tst.awk
BEGINFILE {
    numLines = 0
    while ( (getline line < FILENAME) > 0 ) {
        numLines++
    }
    print "----\nThere are a total of", numLines, "lines in", FILENAME
}
$0==4587 { print "Found record on line number", FNR, "of", FILENAME; nextfile }
$
$ cat file1
a
4587
c
$
$ cat file2
$
$ cat file3
d
e
f
4587
$
$ awk -f tst.awk file1 file2 file3
----
There are a total of 3 lines in file1
Found record on line number 2 of file1
----
There are a total of 0 lines in file2
----
There are a total of 4 lines in file3
Found record on line number 4 of file3

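The above uses GNU awk for BEGINFILE. Any other solution is difficult to implement in a way that handles empty files (you would need an array to keep track of which files are being parsed, and you would have to print the information in both the FNR==1 and END sections after an empty file had been skipped).

Using getline has some caveats and it should not be used lightly, see http://awk.info/?tip/getline, but this is one of its appropriate and robust uses. You can also test for unreadable files in BEGINFILE by testing ERRNO and skipping the file (see the gawk manual) - a situation that would cause other scripts to abort.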
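The ERRNO check mentioned above might look like the following. This is only a sketch, assuming GNU awk is installed as gawk; the wrapper name count_with_skip and the wording of the skip message are made up for illustration.

```shell
# Sketch only: needs GNU awk (gawk), because BEGINFILE and ERRNO are gawk
# extensions. count_with_skip is a hypothetical wrapper name.
count_with_skip() {
    gawk '
    BEGINFILE {
        # In BEGINFILE, a non-empty ERRNO means FILENAME could not be opened.
        # Calling nextfile here skips the file instead of aborting.
        if (ERRNO != "") {
            print "skipping " FILENAME ": " ERRNO > "/dev/stderr"
            nextfile
        }
        numLines = 0
        while ((getline line < FILENAME) > 0) numLines++
        close(FILENAME)   # release the extra read handle opened by getline
        print "There are a total of", numLines, "lines in", FILENAME
    }
    $0 == 4587 { print "Found record on line number", FNR, "of", FILENAME; nextfile }
    ' "$@"
}
```

It would be called as count_with_skip file1 file2 file3; an unreadable operand produces a message on standard error and processing continues with the next file.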

Answer 3

BEGIN {
s="cat your_file.txt|wc -l"; 
s | getline file_size;
close(s);
print file_size 
}

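This puts the size (the line count reported by wc -l) of the file named your_file.txt into the awk variable file_size and prints it.

If your file name is dynamic, you can pass the file name on the command line and change the script to use that variable.

E.g., my.awk: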

BEGIN {
s="cat "VAR"|wc -l"; 
s | getline file_size;
close(s);
print file_size 
}

Then you can invoke it like this: awk -v VAR="your_file.txt" -f my.awk

Answer 4

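If you use GNU awk and need a robust, generic solution that accommodates multiple, possibly empty input files, use Ed Morton's solution.

This answer uses portable (POSIX-compliant) code. Within the constraints noted, it is robust, but Ed's GNU awk solution is both simpler and more robust. A tip of the hat to Ed Morton for his help.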

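With a single input file, it is simpler to handle the line counting with a shell command in the BEGIN block, which has the following advantages:

- On invocation, the file name does not have to be specified twice, unlike in the accepted answer.
  - Also note that the accepted answer does not work as intended (as of this writing); the correct form is (see the comments on that answer for an explanation):
    - awk 'NR==FNR {next} FNR==1 {print NR-1} $0==4587 {print "Found record on line number "NR; exit 0}' file{,}
- The solution also works with an empty input file.

In terms of performance, this approach is either only slightly slower than reading the file twice in awk, or even a little faster, depending on the awk implementation used:

awk '
  BEGIN {
    # Execute a shell command to count the lines and read
    # result into an awk variable via <cmd> | getline <varname>.
    # If the file cannot be read, abort. (The shell has already printed an error msg.)
    cmd="wc -l < \"" ARGV[1] "\""; if ((cmd | getline count) < 1) exit 1; close(cmd)
    printf "There are a total of %s lines in this file.\n\n", count
  }
  $0==4587 { print "Found record on line number " NR; exit 0 }
' file

Assumptions:

- The file name is passed as the first operand (non-option argument) on the command line, accessed as ARGV[1].
- The file name contains no embedded " characters.

The following solutions deal with multiple files and make analogous assumptions:

- All operands passed are file names. That is, all arguments after the program must be file names, not variable assignments such as var=value.
- No file name contains embedded " characters.
- No processing takes place if any of the input files does not exist or cannot be read.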
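The restriction on embedded " characters comes from building the shell command with double quotes. One possible way around it, offered only as a sketch and not as part of the original answer, is to single-quote the file name inside awk before handing it to the shell. The names count_lines_quoted and shell_quote are hypothetical.

```shell
# Sketch: count the lines of one file whose name may contain quote characters.
count_lines_quoted() {
    awk '
    # Hypothetical helper: wrap s in single quotes for the shell.
    # \047 is the octal escape for a single quote; it is used because this
    # awk program is itself enclosed in single quotes.
    function shell_quote(s,    n, i, parts, out) {
        n = split(s, parts, "\047")
        out = "\047" parts[1]
        # Rejoin the pieces, replacing each embedded single quote with the
        # four-character sequence: quote, backslash, quote, quote.
        for (i = 2; i <= n; i++)
            out = out "\047\\\047\047" parts[i]
        return out "\047"
    }
    BEGIN {
        cmd = "wc -l < " shell_quote(ARGV[1])
        if ((cmd | getline count) < 1) exit 1
        close(cmd)
        print count + 0   # adding 0 drops any padding that wc may emit
    }
    ' "$1"
}
```

The helper splits on single quotes and rejoins the pieces rather than using gsub, which sidesteps the differences between awk implementations in how backslashes in a replacement string are treated.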

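It is not hard to generalize this to handle multiple files. Things get a little trickier if you also want the line count printed for empty files, because the FNR==1 rule never runs for a file with no lines; the following solution does not print the line count for empty files: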

awk '
  BEGIN {
    # Loop over all input files and store their line counts in an array.
    for (i=1; i<ARGC; ++i) {
      cmd="wc -l < \"" ARGV[i] "\""; if ((cmd | getline count) < 1) exit 1; close(cmd)
      counts[ARGV[i]] = count
    }
  }
  # At the beginning of every (non-empty) file, print the line count.
  FNR==1 { printf "There are a total of %s lines in file %s.\n\n", counts[FILENAME], FILENAME }
  # $0==4587 { printf "%s: Found record on line number %d\n", FILENAME, FNR; exit 0 }
' file1 file2 # ...
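One way to report empty files as well, in portable awk, is to remember which operand is due next and catch up whenever awk has silently skipped one. The following is a sketch under the same assumptions as the solution above (all operands are file names without embedded " characters). The names count_all, report, next_arg and failed are illustrative and not taken from the original answer.

```shell
# Sketch: print a line count for every operand, including empty files.
count_all() {
    awk '
    function report(name) {
        printf "There are a total of %d lines in file %s.\n", counts[name], name
    }
    BEGIN {
        # Collect all line counts up front with wc.
        for (i = 1; i < ARGC; i++) {
            cmd = "wc -l < \"" ARGV[i] "\""
            if ((cmd | getline count) < 1) { failed = 1; exit 1 }
            close(cmd)
            counts[ARGV[i]] = count + 0
        }
        next_arg = 1   # index of the next operand not yet reported
    }
    FNR == 1 {
        # Report empty files that were skipped before reaching FILENAME.
        while (next_arg < ARGC && ARGV[next_arg] != FILENAME)
            report(ARGV[next_arg++])
        report(FILENAME)
        next_arg++
    }
    END {
        # exit in BEGIN still runs END, so bail out here after a failure.
        if (failed) exit 1
        # Report empty files at the end of the operand list.
        while (next_arg < ARGC)
            report(ARGV[next_arg++])
    }
    ' "$@"
}
```

Invoked as count_all file1 file2 file3, this prints one line per operand in command-line order. A search rule such as the $0==4587 pattern could be added between the FNR==1 and END rules.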

Printing the number of lines in a file in the BEGIN section with `awk`