Question

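I have an awk script that runs through a file and counts every occurrence of a given date. The dates in the original file are in the standard date format, like this: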

Thu Mar 5 16:46:15 EST 2009

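I use awk to throw away the weekday, time, and time zone, and then do the counting by pumping the dates into an associative array indexed by date.

To get the output sorted by date, I converted the dates to a different format that I could sort with the sort command in bash.

Right now, my output looks like this: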

Date    Count
03/05/2009   2
03/06/2009   1
05/13/2009   7
05/22/2009  14
05/23/2009   7
05/25/2009   7
05/29/2009  11
06/02/2009  12
06/03/2009  16

I'd really like the output to have more human-readable dates, like this:

Mar  5, 2009
Mar  6, 2009
May 13, 2009
May 22, 2009
May 23, 2009
May 25, 2009
May 29, 2009
Jun  2, 2009
Jun  3, 2009

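Any suggestions for a way to do this? It would be best if I could do it on the fly, at the point where I output the count values.

Update: here is my solution, incorporating ghostdog74's example code: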

grep -i "E[DS]T 2009" original.txt | awk '{printf "%s %2.d, %s\r\n",$2,$3,$6}' >dates.txt #outputs dates for counting
date -f dates.txt +'%Y %m %d' | awk ' #reformat dates as YYYYMMDD for future sort
  {++total[$0]} #pump dates into associative array
  END { 
    for (item in total) printf "%s\t%s\r\n", item, total[item] #output dates as yyyy mm dd with counts
  }' | sort -t \t | awk ' #send to sort, then to cleanup
  BEGIN {printf "%s\t%s\r\n","Date","Count"}
  {t=$1" "$2" "$3" 0 0 0" #cleanup using example by ghostdog74
   printf "%s\t%2.d\r\n",strftime("%b %d, %Y",mktime(t)),$4
  }'
rm dates.txt

Sorry this looks so messy. I've tried to put in clarifying comments.

Answer 1

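I get testy when I see someone using grep and awk (and sed, cut, ...) in a pipeline. Awk can fully handle the work of many of those utilities.

Here's a way to clean up your updated code so that it runs in a single awk instance (well, gawk), using sort as a coprocess: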

gawk '
    BEGIN {
        IGNORECASE = 1
    }
    function mon2num(mon) {
        return(((index("JanFebMarAprMayJunJulAugSepOctNovDec", mon)-1)/3)+1)
    }
    / E[DS]T [[:digit:]][[:digit:]][[:digit:]][[:digit:]]/ {
        month=$2
        day=$3
        year=$6
        date=sprintf("%4d%02d%02d", year, mon2num(month), day)
        total[date]++
        human[date] = sprintf("%3s %2d, %4d", month, day, year)
    }
    END {
        sort_coprocess = "sort"
        for (date in total) {
            print date |& sort_coprocess
        }
        close(sort_coprocess, "to")
        print "Date\tCount"
        while ((sort_coprocess |& getline date) > 0) {
            print human[date] "\t" total[date]
        }
        close(sort_coprocess)
    }
' original.txt

Answer 2

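Use awk's sort and date's stdin to greatly simplify the script

date will accept input from stdin, so you can eliminate one pipe to awk and the temporary file. You can also eliminate the pipe to sort by using awk's array sort (asort, a gawk extension), which in turn eliminates another pipe to awk. There is also no need for a coprocess.

This script uses date for the month-name conversion, which would presumably keep working in other languages (ignoring the time zone and month/day order issues, though).

The end result looks like "grep | date | awk". I have broken it into separate lines for readability (it would be about half as long with the comments removed):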

grep -i "E[DS]T 2009" original.txt | 
date -f - +'%Y %m %d' | #reformat dates as YYYYMMDD for future sort
awk ' 
BEGIN { printf "%s\t%s\r\n","Date","Count" }

{ ++total[$0] } # pump dates into associative array

END {
    idx=1
    for (item in total) {
        d[idx]=item;idx++ # copy the array indices into the contents of a new array
    }
    c=asort(d) # sort the contents of the copy
    for (i=1;i<=c;i++) { # use the contents of the copy to index into the original
        printf "%s\t%2.d\r\n",strftime("%b %e, %Y",mktime(d[i]" 0 0 0")),total[d[i]]
    }
}'

Answer 3

If you are using gawk:

awk 'BEGIN{
    s="03/05/2009"
    m=split(s,date,"/")    # date[1]=month, date[2]=day, date[3]=year
    t=date[3]" "date[1]" "date[2]" 0 0 0"    # mktime wants "YYYY MM DD HH MM SS"
    print strftime("%b %d",mktime(t))
}'

The above is just an example; since you did not show your actual code, I can't incorporate it into your code.
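As a hedged sketch of how the same mktime/strftime idea might be applied to the Date/Count table from the question: this assumes the first line is a header, the first column is MM/DD/YYYY, and that awk is gawk or another awk that provides mktime and strftime. The function name humanize_table is made up for illustration.

```shell
# Hypothetical helper (not from the original answer): reads the
# "Date Count" table on stdin and rewrites the first column as "Mar  5, 2009".
# Assumes an awk with mktime() and strftime(), e.g. gawk.
humanize_table() {
    awk '
        NR == 1 { print; next }              # pass the header through unchanged
        {
            split($1, d, "/")                # d[1]=month, d[2]=day, d[3]=year
            # use noon rather than midnight to stay clear of DST edge cases
            t = mktime(d[3] " " d[1] " " d[2] " 12 0 0")
            printf "%s\t%2d\n", strftime("%b %e, %Y", t), $2
        }'
}

# Example run on two rows of the table from the question
printf 'Date\tCount\n03/05/2009\t2\n05/13/2009\t7\n' | humanize_table
```

In an English or C locale the two data rows should come out as "Mar  5, 2009" and "May 13, 2009", each followed by a tab and the count. Because the input is still in sorted order when it reaches this step, it can simply be appended to the end of an existing pipeline.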

Answer 4

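Why not prepend your awk-formatted date to the original date? That gives you a key that sorts correctly while the rest of the line stays human-readable.

(Note: to sort correctly, the key should be in yyyymmdd order.)

If needed, cut can remove the prepended column.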
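A rough sketch of this idea, assuming input lines in the date format shown in the question ("Thu Mar 5 16:46:15 EST 2009"). The function name count_by_day and the month lookup string are illustrative choices, not part of the original answer; only POSIX awk, sort, and cut are used.

```shell
# Hypothetical helper: reads raw date lines on stdin, counts them per day,
# sorts on a prepended yyyymmdd key, then cuts the key off again.
count_by_day() {
    awk '
        {
            # month name -> number from its position in a lookup string
            m = (index("JanFebMarAprMayJunJulAugSepOctNovDec", $2) + 2) / 3
            key = sprintf("%04d%02d%02d", $6, m, $3)      # sortable key
            label[key] = sprintf("%s %2d, %d", $2, $3, $6) # readable form
            count[key]++
        }
        END {
            for (key in count)
                printf "%s\t%s\t%d\n", key, label[key], count[key]
        }' |
    sort |       # yyyymmdd keys sort correctly as plain text
    cut -f 2-    # drop the prepended key column
}

# Example run: two entries for Mar 5 and one for May 13, given out of order
printf '%s\n' 'Wed May 13 10:00:00 EDT 2009' \
              'Thu Mar 5 16:46:15 EST 2009' \
              'Thu Mar 5 17:00:00 EST 2009' | count_by_day
```

The key never reaches the final output, so the human-readable column can use any layout without affecting the sort order.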

Answer 5

Gawk has strftime(). You can also call the date command to format the dates (see its man page). Linux Forums has some examples.
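For the second option, here is a minimal sketch of calling date from inside awk through a pipe and getline. It assumes a date command that accepts -d with a YYYY-MM-DD argument (GNU date does), and input in the Date/Count layout from the question; the function name humanize_with_date is hypothetical.

```shell
# Hypothetical helper: converts the MM/DD/YYYY column of the table on stdin
# by running date once per row. Assumes a date that understands -d (GNU date).
# The date string is pasted into a shell command, so use trusted input only.
humanize_with_date() {
    awk '
        NR == 1 { print; next }                  # keep the header line
        {
            split($1, d, "/")                    # d[1]=month, d[2]=day, d[3]=year
            cmd = "date -d " d[3] "-" d[1] "-" d[2] " +\"%b %e, %Y\""
            if ((cmd | getline pretty) <= 0)
                pretty = $1                      # fall back to the raw date
            close(cmd)                           # do not leave one pipe open per row
            printf "%s\t%2d\n", pretty, $2
        }'
}

# Example run on two rows of the table from the question
printf 'Date\tCount\n03/05/2009\t2\n05/13/2009\t7\n' | humanize_with_date
```

This starts one date process per row, so for large inputs strftime() inside gawk is likely to be noticeably faster; the pipe approach is mainly useful when gawk's time functions are not available.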

Humanizing dates with awk?
