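I have a bunch of CSV files that need "sorting". Specifically, I need to sort on the second column and add a new first field whose value is computed from a formula. Starting from file_1, I want to obtain the output files shown below. Thanks in advance.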
file_1:
2222,21,44444,55555
2223,24,66666,33333
2222,23,77777,99999
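I want to sort file_1 on field 2 and add a new column at the start of result_file (so that it becomes the first field), using the formula: new field = 4000 - first field of file_1. The output should be two files: result_file, as shown below, and report_file, which reports missing records, also shown below: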
result_file:
1778,2222,21,44444,55555
1778,2222,23,77777,99999
1777,2223,24,66666,33333
report_file, based on field 3 of result_file:
**Error missing record 22 for the range 2222
**Total records for the range 1778 is 2
**Total records for the range 1777 is 1
If nothing is missing, the report should simply read:
**No missing records
**Total records for the range 1778 is 2
**Total records for the range 1777 is 1
Thanks
Answer 0: (score: 0)
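While generating report_file I ran into cases that the question does not specify, so some arbitrary choices were made.
Perhaps you could provide a file_1 with 20 or more lines, together with the matching report_file example?
Try the awk script below: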
#!/usr/bin/awk -f
BEGIN {
    FS = ",";
    OFS = ",";
    sortedsize = 0;
}
{
    # New leading field: 4000 minus the first field of the input line.
    newfields[$2] = 4000 - $1;
    # Keyed by field 2: a repeated field 2 value overwrites the earlier line.
    lines[$2] = newfields[$2] "," $0;
    count[newfields[$2]]++;
    # Insertion sort: keep sorted[] in ascending numeric order of field 2.
    if (sortedsize == 0) {
        sorted[++sortedsize] = $2;
    } else {
        for (i = 1; i <= sortedsize; i++) {
            if (sorted[i] + 0 > $2 + 0) {
                # Insert here and shift the remaining entries up by one.
                previous = sorted[i];
                sorted[i] = $2;
                for (j = i + 1; j <= sortedsize; j++) {
                    save = sorted[j];
                    sorted[j] = previous;
                    previous = save;
                }
                sorted[++sortedsize] = previous;
                break;
            }
        }
        # No larger key was found: append at the end.
        if (i > sortedsize) {
            sorted[++sortedsize] = $2;
        }
    }
}
END {
    # awk has no true/false keywords, so the flag is 0 or 1.
    missingrecords = 0;
    for (i = 1; i <= sortedsize; i++) {
        print lines[sorted[i]] > "result_file";
        if (i > 1) {
            if (sorted[i] != (sorted[i-1] + 1) && newfields[sorted[i]] == newfields[sorted[i-1]]) {
                # Gap in field 2 inside the same range.
                missingrecords = 1;
                print "**Error missing record " (sorted[i-1] + 1) " for the range " newfields[sorted[i-1]] > "report_file";
            } else if (newfields[sorted[i]] != newfields[sorted[i-1]]) {
                # The range changed: close the report for the previous range.
                if (!missingrecords) {
                    print "**No missing records" > "report_file";
                }
                print "**Total records for the range " newfields[sorted[i-1]] " is " count[newfields[sorted[i-1]]] > "report_file";
                missingrecords = 0;
            }
        }
    }
    # Close the report for the last range.
    i--;
    if (sorted[i] == (sorted[i-1] + 1) && newfields[sorted[i]] == newfields[sorted[i-1]]) {
        if (!missingrecords) {
            print "**No missing records" > "report_file";
        }
        print "**Total records for the range " newfields[sorted[i]] " is " count[newfields[sorted[i]]] > "report_file";
    } else if (missingrecords) {
        print "**Total records for the range " newfields[sorted[i]] " is " count[newfields[sorted[i]]] > "report_file";
    } else if (newfields[sorted[i]] != newfields[sorted[i-1]]) {
        print "**No missing records" > "report_file";
        print "**Total records for the range " newfields[sorted[i]] " is " count[newfields[sorted[i]]] > "report_file";
    }
}
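As an aside, the hand-written insertion sort in the script above could probably be left to sort(1), with awk only prepending the computed column. This is a minimal sketch, not a replacement for the script: it only produces result_file, it assumes field 2 should be compared numerically, and the scratch directory created with mktemp is there purely so the demo does not overwrite anything.

```shell
# Work in a scratch directory so existing files are left alone (assumption).
cd "$(mktemp -d)" || exit 1

# Sample input copied from the question.
cat > file_1 <<'EOF'
2222,21,44444,55555
2223,24,66666,33333
2222,23,77777,99999
EOF

# Sort numerically on field 2, then prepend 4000 - (original field 1).
sort -t, -k2,2n file_1 |
    awk -F, -v OFS=, '{ print 4000 - $1, $0 }' > result_file

cat result_file
```

For the sample input this gives the three result_file lines shown in the question. Unlike the script above, lines are not keyed by field 2, so repeated field 2 values are all kept.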
Test:
$ ls
file_1 script.awk
$ chmod +x script.awk
$ ./script.awk file_1
$ ls
file_1 report_file result_file script.awk
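Because the report rules are only partly specified, it may help to compare against a simpler one-pass reading of them. The sketch below is one possible interpretation, not the asker's confirmed rules. It assumes that a record is "missing" when field 3 of result_file skips a value between two consecutive lines of the same range. It also takes the sample report literally: the error line names field 2 of result_file (2222), while the totals name the new first field (1778). Finally, it assumes "**No missing records" is printed once, and only when there are no errors at all.

```shell
# Work in a scratch directory so existing files are left alone (assumption).
cd "$(mktemp -d)" || exit 1

# Sample result_file copied from the question (already sorted on field 3).
cat > result_file <<'EOF'
1778,2222,21,44444,55555
1778,2222,23,77777,99999
1777,2223,24,66666,33333
EOF

awk -F, '
# Same range as the previous record: each skipped value of field 3 is missing.
NR > 1 && $1 == prev_range {
    for (n = prev_id + 1; n < $3; n++) {
        errors[++nerrors] = "**Error missing record " n " for the range " $2
    }
}
# Remember ranges in order of first appearance.
!($1 in count) { order[++nranges] = $1 }
# Count records per range and remember this record for the next comparison.
{ count[$1]++; prev_range = $1; prev_id = $3 }
END {
    if (nerrors == 0) print "**No missing records"
    for (i = 1; i <= nerrors; i++) print errors[i]
    for (i = 1; i <= nranges; i++)
        print "**Total records for the range " order[i] " is " count[order[i]]
}' result_file > report_file

cat report_file
```

For the sample data this prints the three report lines given in the question. Under this reading the errors come first and the totals last, which differs from the per-range layout the script above produces.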