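I just need to combine a whole bunch of files and strip the header (row 1) from all but the first file.
Below are three of those files, showing the header (row 1) and the last few rows of each: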
"START_DATE","END_DATE","UNITS","COST","COST_CURRENCY","AMOUNT"
"20170101","20170131","1","5.49","EUR","5.49"
"20170101","20170131","1","4.27","EUR","4.27"
"","","","","9.76",""
"START_DATE","END_DATE","UNITS","COST","COST_CURRENCY","AMOUNT"
"20170201","20170228","1","5.49","EUR","5.49"
"20170201","20170228","1","4.88","EUR","4.88"
"20170201","20170228","1","0.61","EUR","0.61"
"20170201","20170228","1","0.61","EUR","0.61"
"","","","","11.59",""
"START_DATE","END_DATE","UNITS","COST","COST_CURRENCY","AMOUNT"
"20170301","20170331","1","4.88","EUR","4.88"
"20170301","20170331","1","4.27","EUR","4.27"
"","","","","9.15",""
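As you can see, the last row has a number in column 5 (it is a column total). Of course, I don't want that last row. But it is (obviously) on a different line number in each file.
(G)awk is clearly the solution, but I don't know (g)awk.
I've tried a lot of combinations, but the one I'm most surprised doesn't work is: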
gawk '
{ if (!$1 ) nextfile }
NR == 1 {$0 = "Filename" "StartDate" OFS $0; print}
FNR > 1 {$0 = FILENAME StartDate OFS $0; print}
' OFS=',' */*.csv > ../path/file.csv
The output I want is:
"START_DATE","END_DATE","UNITS","COST","COST_CURRENCY","AMOUNT"
"20170101","20170131","1","5.49","EUR","5.49"
"20170101","20170131","1","4.27","EUR","4.27"
"20170201","20170228","1","5.49","EUR","5.49"
"20170201","20170228","1","4.88","EUR","4.88"
"20170201","20170228","1","0.61","EUR","0.61"
"20170201","20170228","1","0.61","EUR","0.61"
"20170301","20170331","1","4.88","EUR","4.88"
"20170301","20170331","1","4.27","EUR","4.27"
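Of course, I've tried searching Google and SO. Most of the answers I see require more knowledge than I have just to understand them. (I'm not a data wrangler, but I have a data wrangling task.)
Thanks for your help!
Answer 0: (score: 2)
This should do it: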
awk 'NR==1; FNR==1{next} FNR>2{print p} {p=$0}' file{1..3}
It prints the first header, then skips every other header and the last line of each file.
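The one-liner is dense, so here is the same logic spread out with comments and run on throwaway copies of the sample data. This is only a sketch: the temporary directory and the file1/file2 names are made up for the demonstration, and it assumes every file ends with exactly one totals line.

```shell
# Hypothetical scratch files holding two of the sample inputs
tmp=$(mktemp -d)
cat > "$tmp/file1" <<'EOF'
"START_DATE","END_DATE","UNITS","COST","COST_CURRENCY","AMOUNT"
"20170101","20170131","1","5.49","EUR","5.49"
"20170101","20170131","1","4.27","EUR","4.27"
"","","","","9.76",""
EOF
cat > "$tmp/file2" <<'EOF'
"START_DATE","END_DATE","UNITS","COST","COST_CURRENCY","AMOUNT"
"20170301","20170331","1","4.88","EUR","4.88"
"20170301","20170331","1","4.27","EUR","4.27"
"","","","","9.15",""
EOF

result=$(awk '
    NR == 1   { print }        # very first line overall: the header we keep
    FNR == 1  { next }         # first line of every file: nothing more to do
    FNR > 2   { print prev }   # emit the line held back on the previous step
              { prev = $0 }    # hold the current line back
' "$tmp/file1" "$tmp/file2")
printf '%s\n' "$result"
rm -rf "$tmp"
```

Each row is printed one step late, only once the next row of the same file has arrived, so the final row of every file is held back and never printed.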
Answer 1: (score: 1)
The following should do the trick:
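awk -F"," 'NR==1{header=$0; print $0} $0!=header && $1!="\"\""{print $0}' */*.csv > ../path/file.csv
Here awk will:
Split each line on commas: -F","
If this is the first record awk processes, set the variable header to the entire contents of that line, then print it: NR==1{header=$0; print $0}
If the line is not identical to the header and its first field is not an empty pair of double quotes, print the line: $0!=header && $1!="\"\""{print $0}
(Because the fields keep their double quotes, the first field of a totals line is the two-character string "", which is why the test compares against "\"\"".)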
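One detail worth checking when splitting on commas: awk keeps the double quotes as part of each field, so the "empty" first field of a totals line is not actually empty. A minimal check, feeding one totals line from the question through printf:

```shell
# Report the length of the first field of a totals line when splitting on commas
len=$(printf '%s\n' '"","","","","9.76",""' | awk -F',' '{ print length($1) }')
echo "$len"   # prints 2: the field is the two quote characters
```

Since the length is 2 rather than 0, a test such as $1 != "" stays true for totals lines; comparing against "\"\"" is what actually matches them.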
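As I mentioned in the comments below, if the first field of your records always starts with an 8-digit date, then you can simplify (this is less generic than the code above):
awk -F"," 'NR == 1 || $1 ~ /"[0-9]{8}"/ {print $0}' */*.csv > outfile.csv
Basically: if this is the first record being processed, print it (it is the header); or (||) if the first field is an 8-digit number surrounded by double quotes, print it.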
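Some older awk builds do not understand the {8} repetition count, so here is a hedged variant of the date test with the eight digit classes written out and the pattern anchored to the whole field. It is a sketch run on three representative lines held in a shell variable (the variable names are made up for the demonstration), not on the real files.

```shell
# Three representative lines: header, data row, totals row
input='"START_DATE","END_DATE","UNITS","COST","COST_CURRENCY","AMOUNT"
"20170101","20170131","1","5.49","EUR","5.49"
"","","","","9.76",""'

# Eight [0-9] classes instead of {8}; ^ and $ make the whole first field match
kept=$(printf '%s\n' "$input" | awk -F',' '
    NR == 1 || $1 ~ /^"[0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]"$/
')
printf '%s\n' "$kept"
```

The header survives through NR == 1, the data row through the date pattern, and the totals row matches neither test. Headers of later files fail the date pattern as well, which is why no explicit header comparison is needed with this approach.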
Answer 2: (score: 1)
Another awk approach:
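awk -F, '
    # First line of the first file: remember it, print it once, move on
    NR == 1 {
        header = $0
        print
        next
    }
    # Skip the first line of every file and any line whose first field is ""
    FNR > 1 && $1 != "\"\""
' *.csv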