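AWK - Split a file by the values in a specific column

Time: 2013-07-28 08:38:50

Tags: regex awk split

I have the following AWK script (provided by Armali on this site), which basically splits a tab-delimited file by date (month/year) and saves each part as yyyymmm. I now have an additional condition for splitting the file: it should be split by month/year and also by the unique values in column 3, with each file saved as yyyymmm_Col3Uniquevalue.

The current script is: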

awk "NR>1{split($2,date,\"/\");print>date[3]strftime(\"%%b.txt\",(date[2]-1)*31*24*60*60)}" input.txt 

Data format:

Country Date    Type
HongKong    31/01/2012  Television
Japan   14/01/2012  Press
Japan   05/01/2012  Television
Japan   16/02/2013  Press
Japan   15/02/2013  Television

The output would be 4 txt files:

2012Jan_Press - Containing record 2
2012Jan_Television - Containing record 1,3
2013Feb_Press - Containing record 4
2013Feb_Television - Containing record 5

2 Answers:

Answer 0: (score: 3)

Play around with this a bit to make sure you understand it:

$ cat file
Country Date    Type
HongKong    31/01/2012  Television
Japan   14/01/2012  Press
Japan   05/01/2012  Television
Japan   16/02/2013  Press
Japan   15/02/2013  Television

$ cat tst.awk
NR>1 {
   split($2,a,"/")
   secs = mktime(a[3]" "a[2]" "a[1]" 0 0 0")
   mth  = strftime("%b", secs)
   file = a[3] mth "_" $3
   print file
}

$ awk -f tst.awk file
2012Jan_Television
2012Jan_Press
2012Jan_Television
2013Feb_Press
2013Feb_Television

Look up mktime() and strftime() in the GNU awk manual.

Once you have finished testing, just change print file to print > file.
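
For readers whose awk lacks the GNU time functions, the same split can be sketched with a plain month-name lookup table instead of mktime() and strftime(). This is a swapped-in technique, not the method from the answer above. The scratch directory split_demo, the variable outdir, the array names mon and d, and the .txt suffix are all assumptions made for illustration.

```shell
#!/bin/sh
# Recreate the sample data from the question as a tab-separated file.
mkdir -p split_demo
{
    printf 'Country\tDate\tType\n'
    printf 'HongKong\t31/01/2012\tTelevision\n'
    printf 'Japan\t14/01/2012\tPress\n'
    printf 'Japan\t05/01/2012\tTelevision\n'
    printf 'Japan\t16/02/2013\tPress\n'
    printf 'Japan\t15/02/2013\tTelevision\n'
} > split_demo/input.txt

awk -F '\t' -v outdir="split_demo" '
BEGIN {
    # Hypothetical lookup table: mon[1] = "Jan" ... mon[12] = "Dec".
    split("Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec", mon, " ")
}
NR > 1 {
    split($2, d, "/")            # d[1] = day, d[2] = month, d[3] = year
    # d[2] + 0 turns "01" into 1 so it matches the table index.
    file = outdir "/" d[3] mon[d[2] + 0] "_" $3 ".txt"
    print > file                 # one output file per year, month and type
}' split_demo/input.txt

ls split_demo
```

With only a handful of distinct year/month/type combinations, keeping every output file open is fine. For inputs that produce many different file names, some awk implementations may hit an open-file limit, in which case appending with >> and calling close(file) after each print is one possible workaround.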

Answer 1: (score: 0)

With TAB-separated fields...:

awk -F\t "NR>1{split($2,date,\"/\");print>date[3]strftime(\"%%b_\"$3\".txt\",(date[2]-1)*31*24*60*60)}" input.txt

$3 must be kept outside the quoted format string.

If the date field $2 also contains a time, separated by a space, split on both space and "/" so that the year stays in date[3]:

awk -F\t "NR>1{split($2,date,\"[/ ]\");print>date[3]strftime(\"%%b_\"$3\".txt\",(date[2]-1)*31*24*60*60)}" input.txt
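
To see what the "[/ ]" separator does, here is a small sketch that only prints the resulting file name for a single record. The record itself (with the time 10:30), the shell variable name, and the month table mon are made up for illustration; the table again stands in for strftime().

```shell
#!/bin/sh
# Hypothetical record: the Date field carries a time after a space.
name=$(printf 'Japan\t14/01/2012 10:30\tPress\n' |
awk -F '\t' '
BEGIN { split("Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec", mon, " ") }
{
    # Splitting on "/" or space gives d[1]=day, d[2]=month, d[3]=year, d[4]=time.
    split($2, d, "[/ ]")
    print d[3] mon[d[2] + 0] "_" $3 ".txt"
}')
echo "$name"
```

Splitting on "/" alone would leave the year and the time together in the third piece, which is why the space is added to the separator.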