Formatting and converting dates and times

Time: 2016-11-01 03:46:18

Tags: bash unix awk gawk

I have a very large (13 GiB) CSV file (3856321 rows and 1698 columns) and, as expected, some of the dates are formatted differently. The file looks like this:

2013/01/08 2:11:30 AM,abdc,good time ...
2015/12/28 8:19:30 PM,abdc,good time ...
2/15/2016 10:46:30 AM,kdafh,almost as good ...
12/13/2014 10:46:00 PM,asjhdk,not that good ...
02-Jan-2014,bad time,good time ...
1/1/2015,nomiss time,boy ...
10/15/2016 17:08:30,bad,boy ...

I want to convert them all to the same time format. The desired output is:

1/8/2013 2:11:30,abdc,good time
12/28/2015 20:19:30,abdc,good time
2/15/2016 10:46:30,kdafh,almost as good
12/13/2014 22:46:00,asjhdk,not that good
1/2/2014 00:00:00,bad time,good time
1/1/2015 00:00:00,nomiss time,boy
10/15/2016 17:08:30,bad,boy

I managed to format the time with the following script:

 awk -F ',' 'BEGIN{FS=OFS=","}{split($1,a," "); 
 if(a[3]=="PM") 
 {  split(a[2],b,":"); 
    b[1]=b[1]+12    
    a[2]=b[1]":"b[2]":"b[3]
 };
 if(a[2]=="")
 {
        a[2]="00:00:00"
 }
tmp=a[1];
# tmp2=system("date -d `tmp` +%m/%d/%Y");
# print tmp2
$1=tmp" "a[2]
 }1' time_input.csv

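I borrowed the idea for formatting the date from https://unix.stackexchange.com/questions/177888/how-to-convert-date-format-in-file (the part that is commented out near the end of the script). However, it does not work in my case. I get the error: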

date: invalid date ‘+%m/%d/%Y’

Is there a simpler, better way? Thanks in advance.

3 answers:

Answer 0 (score: 1)

Awk is certainly a good way to go, but since it is really early in the morning here and I don't want to think about all those ifs, here is one in PHP, since it has a very nice strtotime function:

$ cat program.php
<?php
  $handle = fopen("file", "r");
  if ($handle) {
    while (($line = fgets($handle)) !== false) {
      // process the line read.

      $arr = explode(",", $line, 2);
      echo date("m/d/Y H:i:s", strtotime($arr[0])), ",", $arr[1];

    }
    fclose($handle);
  } else {
    // error opening the file.
  }

Run it:

$ php -f program.php
01/08/2013 02:11:30,abdc,good time
12/28/2015 20:19:30,abdc,good time
02/15/2016 10:46:30,kdafh,almost as good
12/13/2014 22:46:00,asjhdk,not that good
01/02/2014 00:00:00,bad time,good time
01/01/2015 00:00:00,nomiss time,boy
10/15/2016 17:08:30,bad,boy

The line-by-line reading comes from here: How to read a file line by line in php. I only added the lines with explode and strtotime.

explode splits $line at the first comma into two pieces and stores them in the array $arr; strtotime is applied to the first element, $arr[0], and $arr[1] is later output as-is.
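
The split-at-the-first-comma step can also be sketched in Python, the language used in the next answer. This is only an illustration: rewrite_first_field and its convert callback are hypothetical names that do not appear in the PHP program, and str.upper is a stand-in converter used to make the effect visible.

```python
from typing import Callable


def rewrite_first_field(line: str, convert: Callable[[str], str]) -> str:
    """Apply `convert` to the text before the first comma; keep the rest verbatim."""
    # partition() splits at the first comma only, much like explode(",", $line, 2)
    # in PHP, so commas in the later columns are left untouched.
    first, sep, rest = line.partition(",")
    return convert(first) + sep + rest


if __name__ == "__main__":
    # str.upper is a placeholder; a real converter would reformat the date.
    print(rewrite_first_field("02-Jan-2014,bad time,good time\n", str.upper), end="")
```

Because only the first field is touched, the remaining columns of each row pass through unchanged, whatever they contain.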

Answer 1 (score: 1)

Using Python, with the dateutil and csv modules:

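import csv

import dateutil.parser as parser  # third-party package: python-dateutil

# newline='' lets the csv module handle line endings itself (Python 3)
with open('time_input.csv', newline='') as inputfile, \
     open('time_output.csv', 'w', newline='') as outputfile:

  reader = csv.reader(inputfile, delimiter=',')
  writer = csv.writer(outputfile)

  for row in reader:
    # parse the first column, whatever its format, and rewrite it uniformly
    row[0] = parser.parse(row[0]).strftime('%m/%d/%Y %H:%M:%S')
    writer.writerow(row)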

The result is written to the file time_output.csv.
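
If installing dateutil is not an option, a similar result can probably be had with the standard library alone by trying a fixed list of formats. The sketch below is an assumption-laden illustration, not part of the answer: normalize and KNOWN_FORMATS are hypothetical names, the format list covers only the sample rows shown in the question, and it chooses to print month and day without leading zeros while keeping the time zero-padded.

```python
from datetime import datetime

# Formats taken from the sample rows in the question; a real file may need more.
# %p (AM/PM) and %b (Jan, Feb, ...) are matched using the current locale's names.
KNOWN_FORMATS = (
    "%Y/%m/%d %I:%M:%S %p",  # 2013/01/08 2:11:30 AM
    "%m/%d/%Y %I:%M:%S %p",  # 2/15/2016 10:46:30 AM
    "%m/%d/%Y %H:%M:%S",     # 10/15/2016 17:08:30
    "%d-%b-%Y",              # 02-Jan-2014
    "%m/%d/%Y",              # 1/1/2015
)


def normalize(stamp: str) -> str:
    """Return `stamp` as M/D/YYYY HH:MM:SS, trying each known format in turn."""
    for fmt in KNOWN_FORMATS:
        try:
            dt = datetime.strptime(stamp.strip(), fmt)
        except ValueError:
            continue  # this format did not match; try the next one
        # Month and day without leading zeros, 24-hour time zero-padded.
        return f"{dt.month}/{dt.day}/{dt.year} {dt:%H:%M:%S}"
    raise ValueError(f"unrecognized date format: {stamp!r}")


if __name__ == "__main__":
    for sample in ("2015/12/28 8:19:30 PM", "02-Jan-2014", "10/15/2016 17:08:30"):
        print(normalize(sample))
```

Raising on an unknown format, rather than guessing, makes unexpected rows in a very large file visible instead of silently mis-parsed.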

Answer 2 (score: 1)

You can try the awk command below.

  

Input:

vipin@kali:~$ cat kk.txt
2013/01/08 2:11:30 AM,abdc,good time
2015/12/28 8:19:30 PM,abdc,good time
2/15/2016 10:46:30 AM,kdafh,almost as good
12/13/2014 10:46:00 PM,asjhdk,not that good
02-Jan-2014,bad time,good time
1/1/2015,nomiss time,boy
10/15/2016 17:08:30,bad,boy
  

Filter:

vipin@kali:~$  awk -F"," '{split($1,a," "); printf ("%s,%s,%s",$2,$3,",");system("date -d \""a[1]" "a[2]"\" +\"%m/%d/%Y %H:%M:%S\"")}'  kk.txt
abdc,good time,,01/08/2013 02:11:30
abdc,good time,,12/28/2015 08:19:30
kdafh,almost as good,,02/15/2016 10:46:30
asjhdk,not that good,,12/13/2014 10:46:00
bad time,good time,,01/02/2014 00:00:00
nomiss time,boy,,01/01/2015 00:00:00
bad,boy,,10/15/2016 17:08:30
  

Redirect the filtered output to the file kk.txt2:

vipin@kali:~$  awk -F"," '{split($1,a," "); printf ("%s,%s,%s",$2,$3,",");system("date -d \""a[1]" "a[2]"\" +\"%m/%d/%Y %H:%M:%S\"")}'  kk.txt > kk.txt2
  

Output:

vipin@kali:~$ awk -F"," '{print $NF,$1,$2}' OFS="," kk.txt2
01/08/2013 02:11:30,abdc,good time
12/28/2015 08:19:30,abdc,good time
02/15/2016 10:46:30,kdafh,almost as good
12/13/2014 10:46:00,asjhdk,not that good
01/02/2014 00:00:00,bad time,good time
01/01/2015 00:00:00,nomiss time,boy
10/15/2016 17:08:30,bad,boy
  

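Explanation:

The split function is applied to column 1 and the pieces are stored in the array a; awk's system function is then used to format the date as required.

I could have printed the output in order, but then it printed a leading zero, so I printed the formatted date in the last column instead; that is why I moved the data to another file. Finally, you can print the columns in the order you want.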
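
Note that in the listing above the PM rows keep their 12-hour value (8:19:30 PM is printed as 08:19:30): only a[1] and a[2] are passed to date, so the AM/PM marker in a[3] is dropped. The 12-hour to 24-hour rule, including the 12 AM and 12 PM edge cases, can be sketched as follows. This is a minimal Python illustration; to_24h is a hypothetical helper, not something used in this answer.

```python
def to_24h(time_text: str, meridiem: str = "") -> str:
    """Convert 'H:MM:SS' plus an optional AM/PM marker to zero-padded 24-hour time."""
    hour_text, minute, second = time_text.split(":")
    hour = int(hour_text)
    marker = meridiem.strip().upper()
    if marker == "PM" and hour != 12:
        hour += 12   # 1 PM .. 11 PM become 13 .. 23; 12 PM stays 12
    elif marker == "AM" and hour == 12:
        hour = 0     # 12 AM is midnight
    # With no marker the time is assumed to be in 24-hour form already.
    return f"{hour:02d}:{minute}:{second}"


if __name__ == "__main__":
    print(to_24h("8:19:30", "PM"))
    print(to_24h("12:05:00", "AM"))
    print(to_24h("17:08:30"))
```

Adding 12 to every PM hour without the two special cases would turn 12:05:00 PM into 24:05:00 and leave 12:05:00 AM at 12:05:00.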