I have hundreds of CSV files with dates in the format %d/%m/%y %H:%M:%S, but I want to change them to the format %Y-%m-%d %H:%M:%S.
INPUT_FILE.csv (date format == %d/%m/%y %H:%M:%S )
13/05/87 4:00:00,1.27470,1.27530,1.27460,1.27480,101926,356
14/05/87 4:01:00,1.27490,1.27520,1.27310,1.27490,102419,493
15/05/87 4:02:00,1.27490,1.27540,1.27440,1.27530,102846,427
16/05/87 4:03:00,1.27520,1.27570,1.27490,1.27550,103202,356
17/05/87 4:04:00,1.27550,1.27640,1.27510,1.27590,103528,326
......
......
......
24/02/09 4:00:00,1.27470,1.27530,1.27460,1.27480,101926,356
25/02/09 4:01:00,1.27490,1.27520,1.27310,1.27490,102419,493
26/02/09 4:02:00,1.27490,1.27540,1.27440,1.27530,102846,427
27/02/09 4:03:00,1.27520,1.27570,1.27490,1.27550,103202,356
28/02/09 4:04:00,1.27550,1.27640,1.27510,1.27590,103528,326
REQUIRED_OUTPUT.csv (date format == %Y-%m-%d %H:%M:%S )
1987-05-13 4:00:00,1.27470,1.27530,1.27460,1.27480,101926,356
1987-05-14 4:01:00,1.27490,1.27520,1.27310,1.27490,102419,493
1987-05-15 4:02:00,1.27490,1.27540,1.27440,1.27530,102846,427
1987-05-16 4:03:00,1.27520,1.27570,1.27490,1.27550,103202,356
1987-05-17 4:04:00,1.27550,1.27640,1.27510,1.27590,103528,326
......
......
......
2009-02-24 4:00:00,1.27470,1.27530,1.27460,1.27480,101926,356
2009-02-25 4:01:00,1.27490,1.27520,1.27310,1.27490,102419,493
2009-02-26 4:02:00,1.27490,1.27540,1.27440,1.27530,102846,427
2009-02-27 4:03:00,1.27520,1.27570,1.27490,1.27550,103202,356
2009-02-28 4:04:00,1.27550,1.27640,1.27510,1.27590,103528,326
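I have tried several awk variants, but I can't get any of them to work. Any help would be appreciated.
Update: my mistake, I should have mentioned that the years range from 1981 to 2016.
Here is what I have tried so far: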
awk -F, '{ gsub("/","-"); split($1, f, " "); print > ("my_data_" f[1]"v" ".csv")}' INPUT_FILE.csv
This splits the file into, for example:
my_data_13-05-87v.csv
my_data_14-05-87v.csv
my_data_15-05-87v.csv
The files contain the following:
# for my_data_13-05-87v.csv
13-05-87 4:00:00,1.27470,1.27530,1.27460,1.27480,101926,356
# for my_data_14-05-87v.csv
14-05-87 4:01:00,1.27490,1.27520,1.27310,1.27490,102419,493
# for my_data_15-05-87v.csv
15-05-87 4:02:00,1.27490,1.27540,1.27440,1.27530,102846,427
Note: the years range from 1981 to 2016.
What I want is to split the file into, for example:
my_data_1987-05-13v.csv
my_data_1987-05-14v.csv
my_data_1987-05-15v.csv
The files should contain the following:
# for my_data_1987-05-13v.csv
1987-05-13 4:00:00,1.27470,1.27530,1.27460,1.27480,101926,356
# for my_data_1987-05-14v.csv
1987-05-14 4:01:00,1.27490,1.27520,1.27310,1.27490,102419,493
# for my_data_1987-05-15v.csv
1987-05-15 4:02:00,1.27490,1.27540,1.27440,1.27530,102846,427
Second format question: I also have files in a different format
INPUT_FILE.csv (date format == %d.%m.%Y %H:%M:%S )
13.05.1987 4:00:00.000,1.27470,1.27530,1.27460,1.27480,101926,356
14.05.1987 4:01:00.000,1.27490,1.27520,1.27310,1.27490,102419,493
15.05.1987 4:02:00.000,1.27490,1.27540,1.27440,1.27530,102846,427
16.05.1987 4:03:00.000,1.27520,1.27570,1.27490,1.27550,103202,356
17.05.1987 4:04:00.000,1.27550,1.27640,1.27510,1.27590,103528,326
REQUIRED_OUTPUT.csv (date format == %Y-%m-%d %H:%M:%S )
1987-05-13 4:00:00.000,1.27470,1.27530,1.27460,1.27480,101926,356
1987-05-14 4:01:00.000,1.27490,1.27520,1.27310,1.27490,102419,493
1987-05-15 4:02:00.000,1.27490,1.27540,1.27440,1.27530,102846,427
1987-05-16 4:03:00.000,1.27520,1.27570,1.27490,1.27550,103202,356
1987-05-17 4:04:00.000,1.27550,1.27640,1.27510,1.27590,103528,326
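Answer 0 (score: 1)
You just need to redefine the input field separator as / and space, then reorder the first three fields. In addition, if the value of the year field is > 16, the 20th century is assumed, otherwise the 21st. Along the way, the script writes each line to a file named after its date: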
$ cat script.awk
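{
    # Years above 16 are taken as 19xx, the rest as 20xx (the data spans 1981-2016).
    date = ($3 > 16 ? "19" : "20") $3 "-" $2 "-" $1
    # The file name must be a quoted string, and the whole expression parenthesized.
    print date, $4 > ("my_data_" date "v.csv")
}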
Run it:
$ awk -F'[/ ]' -f script.awk INPUT_FILE.csv
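Data spanning 1981 to 2016 can produce thousands of per-day files, and some awk implementations limit how many files may be open at once. The following is a minimal sketch of one way around that, not part of the original answer: it closes the previous output file whenever the date changes. The function name split_by_day is hypothetical, and the file naming follows the my_data_YYYY-MM-DDv.csv pattern requested in the question.

```shell
# Hypothetical helper: write each row to a per-day file, closing files as we go.
split_by_day() {
    awk -F'[/ ]' '
    {
        # Same century rule as above: years above 16 are 19xx, the rest 20xx.
        date = ($3 > 16 ? "19" : "20") $3 "-" $2 "-" $1
        file = "my_data_" date "v.csv"
        # Close the previous file when the date changes, so that only one
        # output file is open at a time.
        if (file != prev) {
            if (prev != "") close(prev)
            prev = file
        }
        # Append, so that a file reopened later is not truncated.
        print date, $4 >> file
    }' "$@"
}
```

It would be called as split_by_day INPUT_FILE.csv. Because it appends, running it twice in the same directory duplicates rows, so the output files from an earlier run should be removed first.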
Answer 1 (score: 0)
A naive approach is to use substr:
$ awk '{ # Rebuild the date as 20YY-MM-DD; the century is hardcoded, so this only suits years from 2000 on.
         $1 = sprintf("20%s-%s-%s",
                      substr($1, 7, 2),
                      substr($1, 4, 2),
                      substr($1, 1, 2))
       } 1' input.csv
2009-02-24 4:00:00,1.27470,1.27530,1.27460,1.27480,101926,356
2009-02-25 4:01:00,1.27490,1.27520,1.27310,1.27490,102419,493
2009-02-26 4:02:00,1.27490,1.27540,1.27440,1.27530,102846,427
2009-02-27 4:03:00,1.27520,1.27570,1.27490,1.27550,103202,356
2009-02-28 4:04:00,1.27550,1.27640,1.27510,1.27590,103528,326
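Since the question's data spans 1981 to 2016 and also comes in a dotted variant with four-digit years, the hardcoded "20" does not cover every case. The following is a rough sketch, not part of the original answer, that splits the date with split instead of substr and picks the century only for two-digit years. The function name fix_dates is hypothetical, and the cutoff of 16 is borrowed from the other answer.

```shell
# Hypothetical helper: rewrite the leading date of each row as YYYY-MM-DD.
fix_dates() {
    awk '
    {
        # Split the date on either "/" or "." into day, month and year.
        if (split($1, d, "[/.]") == 3) {
            year = d[3]
            # Two-digit years: above 16 means 19xx, otherwise 20xx.
            if (length(year) == 2)
                year = (year + 0 > 16 ? "19" : "20") year
            $1 = year "-" d[2] "-" d[1]
        }
        # Rows whose first field is not a three-part date pass through unchanged.
        print
    }' "$@"
}
```

It reads the named files, or standard input when none are given, so one file can be converted with fix_dates INPUT_FILE.csv > REQUIRED_OUTPUT.csv. For hundreds of files, a loop along the lines of mkdir -p out; for f in *.csv; do fix_dates "$f" > "out/$f"; done keeps the originals untouched.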