awk: producing formatted output from unformatted input

Date: 2014-11-29 04:30:17

Tags: unix awk

I would like to know how to turn unformatted input into formatted output. I can get there with the steps below... The actual input file contains 18 fields.

$ cat st_Input.txt

Total No. of Records Displayed: 4
---------------------------------------------------------------------------------------
|   Circle Desc.|Serial From        |Serial To          |        Quantity|Plant Desc.  |
---------------------------------------------------------------------------------------
|   CCCC        |20282783701        |20282788700        |      5,000.000 |2220         |
|   CCCC        |5991421000742062451|5991421000742062477|         27.000 |2310         |
|   CCCC        |41700000906        |41700011005        |     10,100.000 |2210         |
|   CCCC        |5988888000742062478|5988888000742062564|         10.000 |2210         |
----------------------------------------------------------------------------------------
|  *            |                   |                   |      15,724.000|             |
----------------------------------------------------------------------------------------

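Step 1: Change the field separator from "|" to ",". The commas inside the data are removed first so that field positions do not shift; for example, the quantity 5,000.000 becomes 5000.000 instead of being split into 5 and 000.000.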

Command #1:

awk -F '|' '{ gsub(/,/,""); $1=$1 }1' OFS="," st_Input.txt >Format_st_Input.txt

Output #1:

Total No. of Records Displayed: 4
---------------------------------------------------------------------------------------
,   Circle Desc.,Serial From        ,Serial To          ,        Quantity,Plant Desc.  , 
---------------------------------------------------------------------------------------
,   CCCC        ,20282783701        ,20282788700        ,      5000.000 ,2220         ,
,   CCCC        ,5991421000742062451,5991421000742062477,         27.000 ,2310         ,
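To make the effect of Step 1 easier to see, here is a minimal sketch that applies the same two operations to a single data row from the sample input. The shell variable names `row` and `out` are only illustrative and are not part of the original commands.

```shell
row='|   CCCC        |20282783701        |20282788700        |      5,000.000 |2220         |'

out=$(printf '%s\n' "$row" | awk -F '|' -v OFS=',' '{
    gsub(/,/, "")   # remove thousands separators first: "5,000.000" -> "5000.000"
    $1 = $1         # assigning to a field makes awk rebuild $0 using OFS
    print
}')

printf '%s\n' "$out"
```

The order matters: if the "|" characters were turned into commas before the gsub call, the comma inside 5,000.000 could no longer be told apart from a field separator.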

Step 2: I tried the command below.

If field $3 matches "5991421000", print the line to "Op22_st_Input.txt"; if $3 matches "[0-9]", print it to "Op33_st_Input.txt"; send all other lines (junk characters) to "Op44_st_Input.txt".

Command #2:

awk -F"," '{OFS=","; if ($3~"5991421000") {print $0,FILENAME > "Op22_st_Input.txt";next} \
else if ($3~"[0-9]"){print $0,FILENAME > "Op33_st_Input.txt";next} \
else {print $0,FILENAME > "Op44_st_Input.txt";next}}' Format_st_Input.txt

Is there a simpler way to change the field positions (delete $1, $2==$3, $3==$4, then print all remaining fields) without typing print $2,$3,$4,... up to $18, and without going through so many steps?

Desired output for Op22_st_Input.txt:

5991421000742062451,5991421000742062477,   CCCC        ,         27.000 ,2310         ,,Format_st_Input.txt

Desired output for Op33_st_Input.txt:

20282783701        ,20282788700        ,   CCCC        ,      5000.000 ,2220         ,,Format_st_Input.txt
41700000906        ,41700011005        ,   CCCC        ,     10100.000 ,2210         ,,Format_st_Input.txt
5988888000742062478,5988888000742062564,   CCCC        ,         10.000 ,2210         ,,Format_st_Input.txt

Desired output for Op44_st_Input.txt:

Total No. of Records Displayed: 4,Format_st_Input.txt
---------------------------------------------------------------------------------------,Format_st_Input.txt
,   Circle Desc.,Serial From        ,Serial To          ,        Quantity,Plant Desc.  , ,Format_st_Input.txt
---------------------------------------------------------------------------------------,Format_st_Input.txt
----------------------------------------------------------------------------------------,Format_st_Input.txt
,  *            ,                   ,                   ,      15724.000,             , ,Format_st_Input.txt
----------------------------------------------------------------------------------------,Format_st_Input.txt

2 Answers:

Answer 0 (score: 1)

You can use this awk:

awk 'BEGIN{FS=OFS=","}
    {of=""}
    $3~/[0-9]/{of="Op33_st_Input.txt"} 
    $3~/5991421000/{of="Op22_st_Input.txt"}
    of{s=$2;$2=$3;$3=$4;$4=s;$1=""; print substr($0,2),FILENAME > of; next}
    {print $0, FILENAME > "Op44_st_Input.txt"}' Format_st_Input.txt

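Explanation:

  • The BEGIN block sets both the field separator and the output field separator to a comma.
  • The variable of is reset to an empty string for every line.
  • If $3 matches the regex [0-9], of is set to Op33_st_Input.txt.
  • If $3 matches the regex 5991421000, of is set to Op22_st_Input.txt. This test comes second, so it overrides the previous one.
  • If of is set, the fields are rearranged ($1 is emptied and the old $2 moves behind $3 and $4), the leading separator is dropped with substr($0,2), and the line plus FILENAME is written to the file named in of.
  • Otherwise the line and FILENAME are written to Op44_st_Input.txt.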
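Since the question also asks about avoiding multiple steps, the same logic can probably be folded into one pass over the original "|"-delimited file. The following is only a sketch under assumptions: it uses a shortened copy of the sample input in a temporary directory (`workdir` is an illustrative name), and because there is no intermediate file, FILENAME comes out as st_Input.txt rather than Format_st_Input.txt.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Shortened sample input (assumption: the real file has 18 fields).
cat > st_Input.txt <<'EOF'
Total No. of Records Displayed: 4
|   Circle Desc.|Serial From        |Serial To          |        Quantity|Plant Desc.  |
|   CCCC        |20282783701        |20282788700        |      5,000.000 |2220         |
|   CCCC        |5991421000742062451|5991421000742062477|         27.000 |2310         |
EOF

awk 'BEGIN { FS = "|"; OFS = "," }
{
    gsub(/,/, "")        # step 1: strip thousands separators; $0 is re-split on "|"
    $1 = $1              # rebuild $0 with OFS so every output line is comma-separated
    of = ""
}
$3 ~ /[0-9]/      { of = "Op33_st_Input.txt" }
$3 ~ /5991421000/ { of = "Op22_st_Input.txt" }   # more specific test last, so it wins
of {
    s = $2; $2 = $3; $3 = $4; $4 = s; $1 = ""    # swap in place; works for any field count
    print substr($0, 2), FILENAME > of
    next
}
{ print $0, FILENAME > "Op44_st_Input.txt" }' st_Input.txt
```

Because the swap assigns to fields and lets awk rebuild $0, nothing has to be listed up to $18; all fields after $4 are carried along unchanged.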

Answer 1 (score: 0)

Don't. Use cut:

cut -d , -f 11,12,2-10,13-17
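One point worth checking before relying on cut for this: POSIX cut writes the selected fields in the order they appear in the input, regardless of the order given in the -f list, so it can drop fields but not move them. The sketch below shows the difference on a made-up five-field line (the sample data and the variable names are assumptions for illustration) next to an awk loop that does reorder without naming every field.

```shell
line='a,b,c,d,e'

# cut selects fields but always emits them in input order,
# so listing 3,4 before 2 does not move them.
cut_out=$(printf '%s\n' "$line" | cut -d , -f 3,4,2,5)

# awk can reorder: put $3 and $4 first, then $2, then loop over the rest.
awk_out=$(printf '%s\n' "$line" | awk 'BEGIN { FS = OFS = "," }
{
    out = $3 OFS $4 OFS $2
    for (i = 5; i <= NF; i++) out = out OFS $i   # remaining fields, however many
    print out
}')

printf '%s\n%s\n' "$cut_out" "$awk_out"
# prints:
# b,c,d,e
# c,d,b,e
```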