Question

I need to split a TSV file by date using any of the standard CLI tools that ship with OS X 10.10, e.g. sed, awk, etc. FYI, the shell is Bash.

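The input file has a header row followed by tab-delimited data (the date and time are in the first column). I'm adding "\t" to show the tabs, and "…" indicates that the row has more columns: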

Transaction Date\t Account Number\t…
9/16/2004 12:00:00 AM\t ABC00147223\t…
9/17/2004 12:00:00 AM\t ABC00147223\t…
10/05/2004 12:00:00 AM\t ABC00147223\t…

The output should be:

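A separate file for each unique year and month (based on the example above, I would get 2 output files: 9/2004 and 10/2004)
The first line (header row) of the original file kept in each output file
File names in the format YYYYMM.txt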

Thanks for your help.

Answer 1

If you want to do this purely in the bash shell, here is one way...

#!/bin/bash

datafile=inputdatafile.dat
ctr=0

# IFS= and -r keep leading whitespace and backslashes in each line intact
while IFS= read -r line
do

  # counter to keep track of line number
  ctr=$((ctr + 1))

  # skip header line for processing
  if [[ $ctr -gt 1 ]]
  then
      # create filename using date field present in record
      vdate=${line%% *}                          # date part, e.g. 9/16/2004
      vmonth1=${vdate%%/*}                       # month (text before the first slash)
      vmonth=$(printf "%02d" "$((10#$vmonth1))") # month padded with 0; 10# stops 08/09 being read as octal
      vyear=${vdate##*/}                         # year (text after the last slash)
      vfilename="${vyear}${vmonth}.txt"          # filename in YYYYMM.txt format

      # if the file does not exist yet, start it with the header record
      if [ ! -f "$vfilename" ]; then
        head -n 1 "$datafile" > "$vfilename"
      fi

      # put the record in that file
      printf '%s\n' "$line" >> "$vfilename"
  fi

done < "$datafile"
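
One detail worth checking in a script like this is zero-padded months. Bash's printf and arithmetic treat a leading 0 as octal, so values such as 08 and 09 are rejected unless base 10 is forced. The following is a minimal sketch of the date-to-filename step on its own, assuming the date has the M/D/YYYY form shown in the question; `to_yyyymm` is a hypothetical helper name, not part of the original script.

```shell
#!/bin/bash
# to_yyyymm (hypothetical helper): "M/D/YYYY hh:mm:ss AM" -> "YYYYMM"
to_yyyymm() {
  local month day year rest
  # split on "/" and space: month, day, year, then the remaining time part
  IFS='/ ' read -r month day year rest <<< "$1"
  # 10# forces base 10, so "09" is not parsed as an invalid octal number
  printf '%04d%02d\n' "$((10#$year))" "$((10#$month))"
}

to_yyyymm '9/16/2004 12:00:00 AM'
to_yyyymm '09/16/2004 12:00:00 AM'
to_yyyymm '10/05/2004 12:00:00 AM'
```

Both spellings of September should map to the same name, so rows written as 9/... and 09/... would end up in one file.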

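I'm not sure how big your data file is, but parsing large files line by line in a shell script is not a good idea; use other tools such as awk, sed, or grep instead.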

For large files, use a nawk / gawk one-liner like the following... it will do everything you need.

# use nawk or gawk if you don't get the expected results using awk
nawk '{if(NR==1)h=$0;} {if(NR>1){ split($1,a,"/"); fn=sprintf("%04d%02d.txt",a[3],a[1]); if(system( "[ ! -f  " fn " ] ")==0)print h >> fn; print >> fn;} }' inputdatafile.dat
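
The one-liner above calls system() for every data row, which starts a shell each time just to test whether the output file exists. A possible variant, sketched below, tracks the files it has already started in an awk array and keeps only one output file open at a time. It assumes tab-separated input with an M/D/YYYY date at the start of column 1; `split_by_month` and `sample.tsv` are hypothetical names used only for this illustration.

```shell
#!/bin/bash
# Sketch only: split_by_month and sample.tsv are hypothetical names.
split_by_month() {
  awk -F'\t' '
    NR == 1 { header = $0; next }        # remember the header line
    {
      split($1, d, "[/ ]")               # d[1]=month, d[2]=day, d[3]=year
      fn = sprintf("%04d%02d.txt", d[3], d[1])
      if (!(fn in seen)) {               # first row seen for this month
        print header > fn                # start the file with the header
        close(fn)
        seen[fn] = 1
      }
      if (fn != cur) {                   # month changed: close the old file
        if (cur != "") close(cur)
        cur = fn
      }
      print >> fn                        # append the data row
    }
  ' "$1"
}

# Build a tiny sample in the shape shown in the question, then split it.
printf '%s\t%s\n' \
  'Transaction Date'       'Account Number' \
  '9/16/2004 12:00:00 AM'  'ABC00147223' \
  '9/17/2004 12:00:00 AM'  'ABC00147223' \
  '10/05/2004 12:00:00 AM' 'ABC00147223' > sample.tsv

split_by_month sample.tsv
```

Because the header is written with `>`, output files left over from an earlier run are overwritten rather than appended to. Closing each file when the month changes keeps the number of simultaneously open files at one, which may matter for awk builds with a low open-file limit.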

Split a file by date and keep the header in Bash
