Sort a CSV file and add a field using bash

Time: 2016-04-14 21:22:55

Tags: bash csv

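I have a bunch of CSV files that need to be "sorted". Specifically, I need to sort on column 2 and add a new field as the first column, whose value is computed from a formula. Starting from file_1, I want to obtain the files shown below. Thanks in advance.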

file_1:

2222,21,44444,55555  
2223,24,66666,33333  
2222,23,77777,99999  

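I want to sort file_1 on field 2 and add a new column at the beginning of result_file (becoming the first field) using the formula: new field = 4000 - first field of file_1. The output should be two files: result_file, as shown below, and report_file, which reports missing records, as shown below: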

result_file:

1778,2222,21,44444,55555  
1778,2222,23,77777,99999  
1777,2223,24,66666,33333  

report_file, based on field 3 of result_file:

**Error missing record 22 for the range 2222  
**Total records for the range 1778 is 2  
**Total records for the range 1777 is 1  

If nothing is missing, the report should simply read:

**No missing records  
**Total records for the range 1778 is 2  
**Total records for the range 1777 is 1  

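Thanks

1 answer:

Answer 0: (score: 0)

While generating report_file I ran into cases that the question does not specify, so some arbitrary choices were made.

Perhaps you could provide a file_1 with 20 or more lines, together with the corresponding report_file?

Try the awk script below: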

#!/usr/bin/awk -f
BEGIN{
  FS=",";
  OFS=",";
  sortedsize=0;
}
{
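  # New first field is 4000 minus the original first field; everything is keyed by field 2,
  # so two lines with the same field 2 overwrite each other.
  newfields[$2]=4000-$1;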
  lines[$2]=newfields[$2] ","  $0;
  count[newfields[$2]]++;
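  # Insertion sort: keep sorted[1..sortedsize] in ascending order of field 2.
  if (sortedsize==0) {
    sorted[++sortedsize]=$2;
  } else {
    for (i=1;i<=sortedsize;i++) {
      if (sorted[i]>$2) {
        # Insert $2 at position i and shift the remaining entries one slot to the right.
        previous=sorted[i];
        sorted[i]=$2;
        for(j=i+1;j<=sortedsize;j++){
          save=sorted[j];
          sorted[j]=previous;
          previous=save;
        }
        sorted[++sortedsize]=previous;
        break;
      }
    }
    if (i>sortedsize) {
      # Larger than everything seen so far: append at the end.
      sorted[++sortedsize]=$2;
    }
  }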
}
END {
  # awk has no true/false keywords: both names are unset variables that evaluate as false, so use 1 and 0.
  missingrecords=0;
  for (i=1;i<=sortedsize;i++) {
    print lines[sorted[i]] > "result_file";
    if (i>1) {
      if (sorted[i] != (sorted[i-1] + 1) && newfields[sorted[i]] == newfields[sorted[i-1]]) {
        # Gap inside the same range: only the first missing value of the gap is reported.
        missingrecords=1;
        print "**Error missing record " (sorted[i-1] + 1) " for the range " newfields[sorted[i-1]] > "report_file";
      } else if (newfields[sorted[i]] != newfields[sorted[i-1]]) {
        # The range changed: close the report for the previous range.
        if (!missingrecords) {
          print "**No missing records" > "report_file";
        }
        print "**Total records for the range " newfields[sorted[i-1]] " is " count[newfields[sorted[i-1]]] > "report_file";
        missingrecords=0;
      }
    }
  }
  # Close the report for the last range (after i--, i indexes the last sorted entry).
  i--;
  if (sorted[i] == (sorted[i-1] + 1) && newfields[sorted[i]] == newfields[sorted[i-1]] ) {
    if (!missingrecords) {
      print "**No missing records" > "report_file";
    }
    print "**Total records for the range " newfields[sorted[i]] " is " count[newfields[sorted[i]]] > "report_file";
  } else if (missingrecords) {
    print "**Total records for the range " newfields[sorted[i]] " is " count[newfields[sorted[i]]] > "report_file";
  } else if (newfields[sorted[i]] != newfields[sorted[i-1]]) {
    print "**No missing records" > "report_file";
    print "**Total records for the range " newfields[sorted[i]] " is " count[newfields[sorted[i]]] > "report_file";
  }
}
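
If only result_file is needed, the hand-written insertion sort can probably be delegated to the sort utility. The following is a minimal sketch rather than a replacement for the script above. It assumes that field 2 is numeric, that the file has no header line and no quoted commas, and that it is run in a scratch directory, because it recreates the sample file_1 from the question:

```shell
# Recreate the sample input from the question (assumption: no header, no quoted fields).
printf '%s\n' '2222,21,44444,55555' '2223,24,66666,33333' '2222,23,77777,99999' > file_1

# Sort numerically on field 2, then prepend 4000 - field 1 as the new first column.
sort -t, -k2,2n file_1 | awk -F, -v OFS=, '{ print 4000 - $1, $0 }' > result_file

cat result_file
```

Unlike the script above, this keeps lines that share the same value in field 2 instead of overwriting them, and it does not produce report_file.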

Test:

$ ls
file_1  script.awk
$ chmod +x script.awk
$ ./script.awk  file_1
$ ls
file_1  report_file  result_file  script.awk
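
Because the report rules are not fully specified, here is one possible reading of them, written as a separate pass over result_file. It is only a sketch under assumptions: it copies the report layout from the question (the error line quotes the original first field, the totals quote the new first field), it treats every skipped value of field 3 inside a range as a missing record, and it expects result_file to be already sorted on field 3. It recreates the sample result_file, so run it in a scratch directory:

```shell
# Sample result_file from the question (assumption: already sorted on field 3).
printf '%s\n' '1778,2222,21,44444,55555' '1778,2222,23,77777,99999' '1777,2223,24,66666,33333' > result_file

awk -F, '
{
  range = $1
  if (range in count) {
    # Same range seen before: report every value of field 3 that was skipped.
    for (m = last[range] + 1; m < $3; m++) {
      print "**Error missing record " m " for the range " $2
      gaps++
    }
  } else {
    # New range: remember the order in which ranges first appear.
    order[++n] = range
  }
  last[range] = $3
  count[range]++
}
END {
  if (!gaps) print "**No missing records"
  for (k = 1; k <= n; k++)
    print "**Total records for the range " order[k] " is " count[order[k]]
}' result_file > report_file

cat report_file
```

For the sample data this reports record 22 as missing and then prints one total per range. Whether a gap should be counted per range or across the whole file is exactly the kind of detail the question leaves open.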