Question

awk  'BEGIN{OFS=","} FNR == 1
            {if (NR > 1) {print fn,fnr,nl}
                        fn=FILENAME; fnr = 1; nl = 0}
                        {fnr = FNR}
                        /ERROR/ && FILENAME ~ /\.gz$/ {nl++}
                        {
                            cmd="gunzip -cd " FILENAME
                            cmd; close(cmd)
                         }
            END                    {print fn,fnr,nl}
        ' /tmp/appscraps/* > /tmp/test.txt

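The above scans all files in a given directory. It prints the file name, the number of lines in each file, and the number of lines containing 'ERROR'.

I'm now trying to make the script execute a command whenever a file it reads isn't a regular file, i.e. if the file is a gzip file, run a particular command.

The above is my own attempt to include the gunzip command in there. Unfortunately, it isn't working. Also, I cannot gunzip all the files in the directory beforehand, because not all of them are gzip files. Some are regular files.

So I need the script to treat any .gz file it finds differently, so that it can read it, count and print the number of lines in it, and count the lines matching the supplied pattern (just as it would if the file were a regular file).

Any help?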

Answer 1

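I think it could be simpler than that.

With shell expansion you already have the file name (so you can print it). You can therefore loop over all the files and do the following for each one:

print the file name
zgrep -c ERROR "$file" (outputs the number of lines containing 'ERROR')
zcat -f "$file" | wc -l (outputs the number of lines)

zgrep works on both plain-text and gzipped files. With GNU gzip, zcat needs the -f option to pass plain-text files through unchanged; without it, zcat rejects them as "not in gzip format".

Assuming there are no spaces in the paths or file names: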

for f in /tmp/appscraps/* 
do
   # -f lets GNU zcat pass plain (non-gzip) files through unchanged
   n_lines=$(zcat -f "$f" | wc -l)
   n_errors=$(zgrep -c ERROR "$f")
   echo "$f $n_lines $n_errors"
done

This is untested, but it should work.
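As a variation on the loop above, here is a minimal sketch that emits comma-separated output like the original awk script. The function name count_file is hypothetical, and the sketch assumes GNU gzip, where -cdf decompresses gzip input and copies anything else to standard output unchanged.

```shell
#!/bin/sh
# Hypothetical helper: print "file,lines,errors" for one file.
# Assumes GNU gzip: with -c -d -f, non-gzip input is copied through as-is.
count_file() {
    f=$1
    n_lines=$(gzip -cdf "$f" | wc -l)
    n_errors=$(gzip -cdf "$f" | grep -c ERROR)
    # $((...)) strips the padding some wc implementations add
    printf '%s,%s,%s\n' "$f" "$((n_lines))" "$n_errors"
}
```

It could then be called as: for f in /tmp/appscraps/*; do count_file "$f"; done > /tmp/test.txt. Each file is read twice, which keeps the sketch simple at the cost of some speed.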

Answer 2

This part of your script makes no sense:

        {if (NR > 1) {print fn,fnr,nl}
                    fn=FILENAME; fnr = 1; nl = 0}
                    {fnr = FNR}
                    /ERROR/ && FILENAME ~ /\.gz$/ {nl++}

Let me restructure it a bit and comment it so that it's clearer what it does:

{ # for every line of every input file, do the following:

    # If this is the 2nd or subsequent line, print the values of these variables:
    if (NR > 1) {
         print fn,fnr,nl
    } 

    fn = FILENAME    # set fn to FILENAME. Since this will occur for the first line of
                     # every file, this is the value fn will have when printed above,
                     # so why not just get rid of fn and print FILENAME?

    fnr = 1          # set fnr to 1. This is immediately over-written below by
                     # setting it to FNR so this is pointless.

    nl = 0

}
{ # for every line of every input file, also do the following
  # (note the unnecessary "}" then "{" above):

    fnr = FNR        # set fnr to FNR. Since this will occur for the first line of
                     # every file, this is the value fnr will have when printed above,
                     # so why not just get rid of fnr and print FNR-1?
} 

/ERROR/ && FILENAME ~ /\.gz$/ {

    nl++             # increment the value of nl. Since nl is always set to zero above,
                     # this will only ever set it to 1, so why not just set it to 1?
                     # I suspect the real intent is to NOT set it to zero above.

}

You also have the code above testing for a file name that ends in ".gz", but then you run gunzip on every file in the next block.
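For comparison, here is a minimal sketch of what the first block was probably meant to do, assuming the intent was to flush the previous file's counts whenever a new file starts. In the original, FNR == 1 sits alone on its line, so awk treats it as a pattern with the default print action and runs the following block for every line. The function name count_plain is hypothetical, and this handles plain-text files only.

```shell
#!/bin/sh
# Hypothetical helper: per-file line and ERROR counts for plain-text files.
count_plain() {
    awk -v OFS=',' '
        FNR == 1 {                          # pattern and "{" on the same line
            if (NR > 1) print fn, fnr, nl   # flush the previous file
            fn = FILENAME; nl = 0           # reset counters for the new file
        }
        { fnr = FNR }                       # remember the last line number seen
        /ERROR/ { nl++ }
        END { if (NR > 0) print fn, fnr, nl }   # flush the last file
    ' "$@"
}
```

Note that an empty file never triggers FNR == 1, so it would not appear in the output at all.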

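Beyond that, just call gunzip from the shell, as everyone else has suggested. awk is a tool for parsing text; it is not an environment from which to call other tools. That's what a shell is for.

For example, assuming your comment ("prints the file name, number of lines in each file and number of lines found containing 'ERROR'") accurately describes what you want your awk script to do, and assuming it makes sense to test for the word "ERROR" directly in a ".gz" file using awk: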

for file in /tmp/appscraps/*.gz
do
    awk -v OFS=',' '/ERROR/{nl++} END{print FILENAME, NR+0, nl+0}' "$file"
    gunzip -cd "$file"
done > /tmp/test.txt

Much clearer and simpler, isn't it?

If it doesn't make sense to test for the word ERROR directly in a ".gz" file, then you can do this instead:

for file in /tmp/appscraps/*.gz
do
    zcat "$file" | awk -v file="$file" -v OFS=',' '/ERROR/{nl++} END{print file, NR+0, nl+0}'
    gunzip -cd "$file"
done > /tmp/test.txt

To handle both gz and non-gz files, as you describe in your comment below:

for file in /tmp/appscraps/*
do
    case $file in
        *.gz ) cmd="zcat" ;;
        * )    cmd="cat"  ;;
    esac

    "$cmd" "$file" |
        awk -v file="$file" -v OFS=',' '/ERROR/{nl++} END{print file, NR+0, nl+0}'

done > /tmp/test.txt

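I left out the gunzip since, as far as I can tell from your stated requirements, you don't need it. If I'm wrong, please explain what you need it for.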

Answer 3

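You could run the following command on each file:

gunzip -t FILENAME; echo $?

It prints the exit status: 0 for a valid gzip file, or 1 for a corrupt file or a file that is not in gzip format. You can then test that status in an if statement to choose the required processing.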
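A minimal sketch of that idea, assuming GNU gzip (whose -t option tests a file even without a .gz suffix). The function name summarize is hypothetical. It picks the reader from the exit status of gzip -t rather than from the file name, then feeds the text to awk for counting.

```shell
#!/bin/sh
# Hypothetical helper: print "file,lines,errors" for one file, choosing the
# reader from the exit status of `gzip -t` instead of the file extension.
summarize() {
    file=$1
    if gzip -t "$file" 2>/dev/null; then
        gzip -cd "$file"      # valid gzip data: decompress to stdout
    else
        cat "$file"           # anything else: read it as plain text
    fi | awk -v file="$file" -v OFS=',' '/ERROR/{nl++} END{print file, NR+0, nl+0}'
}
```

One caveat: a corrupt .gz file also fails the test, so this sketch would count its raw bytes as text. Checking the extension as well would let you report such files separately.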
