I have a file like the following:
C_DocType_ID,SOReference,DocumentNo,ProductValue,Quantity,LineDescription,C_Tax_ID,TaxAmt
1000000,1904093563U,1904093563U,5210-1,1,0,1000000,0
1000000,1904093563U,1904093563U,6511,2,0,1000000,0
1000000,1904093563U,1904093563U,5001,1,0,1000000,0
1000000,1904083291U,1904083291U,5310,4,0,1000000,0
1000000,1904083291U,1904083291U,5311,3,0,1000000,0
1000000,1904083291U,1904083291U,6101,6,0,1000000,0
1000000,1904083291U,1904083291U,6102,1,0,1000000,0
1000000,1904083291U,1904083291U,6106,6,0,1000000,0
I need to convert it into a text file that looks like this:
WOH~1.0~~1904093563Utest~~~ORD~~~~
WOL~~~5210-1~~~~~~~~1~~~~~~~~~~~~~~~~~~~~~
WOL~~~6511~~~~~~~~2~~~~~~~~~~~~~~~~~~~~~
WOL~~~5001~~~~~~~~1~~~~~~~~~~~~~~~~~~~~~
WOH~1.0~~1904083291Utest~~~ORD~~~~~~
WOL~~~5310~~~~~~~~4~~~~~~~~~~~~~~~~~~~~~
WOL~~~5311~~~~~~~~3~~~~~~~~~~~~~~~~~~~~~
WOL~~~6101~~~~~~~~6~~~~~~~~~~~~~~~~~~~~~
WOL~~~6102~~~~~~~~1~~~~~~~~~~~~~~~~~~~~~
WOL~~~6106~~~~~~~~6~~~~~~~~~~~~~~~~~~~~~
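The output file has header records and line item records. Each header record (WOH) contains the SOReference plus some hard-coded fields, while each line item record (WOL) contains the ProductValue and Quantity associated with that SOReference. The input file has 2 unique SOReferences, which is why the output file contains 2 header records, each followed by its associated line item records.
Is there a way to do this from the command line (awk / sed)? I have a whole series of files like this that need to be converted to text.
Answer 0 (score: 1)
With AWK, try the following: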
awk -F, '
FNR == 1 { next }                      # skip the CSV header line
{
    if ($2 != prevcol2) {              # SOReference changed: start a new group
        nl = (FNR <= 2) ? "" : "\n"    # blank line between groups, none before the first
        printf("%sWOH~1.0~~%stest~~~ORD~~~~\n", nl, $2)
    }
    printf("WOL~~~%s~~~~~~~~%s~~~~~~~~~~~~~~~~~~~~~\n", $4, $5)
    prevcol2 = $2                      # remember the SOReference of this row
}' file.csv
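Since the question mentions a whole series of files, here is a minimal sketch of wrapping the same awk program in a shell loop, one awk run per file so the group tracking starts fresh each time. The function name convert_csv, the temporary directory, the sample file name orders1.csv, and the choice of writing each result next to its input with a .txt extension are all assumptions made for illustration, not part of the original answer.

```shell
#!/bin/sh
# Sketch only: convert_csv, the temp directory and orders1.csv are
# hypothetical names used to make the example self-contained.

convert_csv() {
    awk -F, '
    FNR == 1 { next }                      # skip the CSV header line
    {
        if ($2 != prevcol2) {              # SOReference changed: start a new group
            nl = (FNR <= 2) ? "" : "\n"    # blank line between groups, none before the first
            printf("%sWOH~1.0~~%stest~~~ORD~~~~\n", nl, $2)
        }
        printf("WOL~~~%s~~~~~~~~%s~~~~~~~~~~~~~~~~~~~~~\n", $4, $5)
        prevcol2 = $2
    }' "$1"
}

# Build a small sample input so the sketch can be run as-is.
workdir=$(mktemp -d)
cat > "$workdir/orders1.csv" <<'EOF'
C_DocType_ID,SOReference,DocumentNo,ProductValue,Quantity,LineDescription,C_Tax_ID,TaxAmt
1000000,1904093563U,1904093563U,5210-1,1,0,1000000,0
1000000,1904093563U,1904093563U,6511,2,0,1000000,0
1000000,1904083291U,1904083291U,5310,4,0,1000000,0
EOF

# Convert every CSV in the directory; foo.csv becomes foo.txt.
for f in "$workdir"/*.csv; do
    convert_csv "$f" > "${f%.csv}.txt"
done

cat "$workdir/orders1.txt"
```

To use it on real data, point the loop at the directory that holds the CSV files instead of the temporary one. Like the answer's program, this assumes that rows sharing an SOReference are adjacent in the input; if they are not, sorting on the second column first would be needed.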