将文件拆分为好的和坏的数据

时间:2016-10-25 01:24:45

标签: awk

我有一个文件file1.txt,数据如下所示

HDR|2016-10-24
DTL|10000|SRC_ORD_ID|SRC_ORD_TYPE_CD|SRC_ORD_STAT_CD|SRC_ACCT_ID|SRC_DISC_RSN_CD|1858-11-17|1858-11-18|1858-11-19|1858-11-20|1858-11-21|1858-11-22|ORD_STATUS_CD|ORDER_CREA_USER_ID|REGION_NM|STATE_CD|ORDER_TYPE|BILL_NAME|FEED_TYPE_CD|101|CREA_APPLN_NAME|BILL_TELE_NUM|CUST_CD|DIGITAL_LIFE_FLAG|CUSTOMER_TYPE_CD|VENDOR_NAME|SITE_NAME|DNIS_CODE
DTL|10000|SRC_ORD_ID|SRC_ORD_TYPE_CD|SRC_ORD_STAT_CD|SRC_ACCT_ID|SRC_DISC_RSN_CD|1858-11-17|1858-11-18|1858-11-19|1858-11-20|1858-11-21|1858-11-22|ORD_STATUS_CD|ORDER_CREA_USER_ID|REGION_NM|STATE_CD|ORDER_TYPE|BILL_NAME|FEED_TYPE_CD|101|CREA_APPLN_NAME|BILL_TELE_NUM|CUST_CD|DIGITAL_LIFE_FLAG|CUSTOMER_TYPE_CD|VENDOR_NAME|SITE_NAME|DNIS_CODE
DTL|10000|SRC_ORD_ID|SRC_ORD_TYPE_CD|SRC_ORD_STAT_CD|SRC_ACCT_ID|SRC_DISC_RSN_CD|1858-11-17|1858-11-18|1858-11-19|1858-11-20|1858-11-21|1858-11-22|ORD_STATUS_CD|ORDER_CREA_USER_ID|REGION_NM|STATE_CD|ORDER_TYPE|BILL_NAME|FEED_TYPE_CD|101|CREA_APPLN_NAME|BILL_TELE_NUM|CUST_CD|DIGITAL_LIFE_FLAG|CUSTOMER_TYPE_CD|VENDOR_NAME|SITE_NAME|DNIS_CODE
DTL|10000|SRC_ORD_ID|SRC_ORD_TYPE_CD|SRC_ORD_STAT_CD|SRC_ACCT_ID|SRC_DISC_RSN_CD|1858-11-17|1858-11-18|1858-11-19|1858-11-20|1858-11-21|1858-11-22|ORD_STATUS_CD|ORDER_CREA_USER_ID|REGION_NM|STATE_CD|ORDER_TYPE|BILL_NAME|FEED_TYPE_CD|101|CREA_APPLN_NAME|BILL_TELE_NUM|CUST_CD|DIGITAL_LIFE_FLAG|CUSTOMER_TYPE_CD|VENDOR_NAME|SITE_NAME|DNIS_CODE
DTL|10000|SRC_ORD_ID|SRC_ORD_TYPE_CD|SRC_ORD_STAT_CD|SRC_ACCT_ID|SRC_DISC_RSN_CD|1858-11-17|1858-11-18|1858-11-19|1858-11-20|1858-11-21|1858-11-22|ORD_STATUS_CD|ORDER_CREA_USER_ID|REGION_NM|STATE_CD|ORDER_TYPE|BILL_NAME|FEED_TYPE_CD|101|CREA_APPLN_NAME|BILL_TELE_NUM|CUST_CD|DIGITAL_LIFE_FLAG|CUSTOMER_TYPE_CD|VENDOR_NAME|SITE_NAME|DNIS_CODE
DTL|10000|SRC_ORD_ID|SRC_ORD_TYPE_CD|SRC_ORD_STAT_CD|SRC_ACCT_ID|SRC_DISC_RSN_CD|1858-11-17|1858-11-18|1858-11-19|1858-11-20|1858-11-21|1858-11-22|ORD_STATUS_CD|ORDER_CREA_USER_ID|REGION_NM|STATE_CD|ORDER_TYPE|BILL_NAME|FEED_TYPE_CD|101|CREA_APPLN_NAME|BILL_TELE_NUM|CUST_CD|DIGITAL_LIFE_FLAG|CUSTOMER_TYPE_CD|VENDOR_NAME|SITE_NAME|DNIS_CODE
DTL|10000|SRC_ORD_ID|SRC_ORD_TYPE_CD|SRC_ORD_STAT_CD|SRC_ACCT_ID|SRC_DISC_RSN_CD|1858-11-17|1858-11-18|1858-11-19|1858-11-20|1858-11-21|1858-11-22|ORD_STATUS_CD|ORDER_CREA_USER_ID|REGION_NM|STATE_CD|ORDER_TYPE|BILL_NAME|FEED_TYPE_CD|101|CREA_APPLN_NAME|BILL_TELE_NUM|CUST_CD|DIGITAL_LIFE_FLAG|CUSTOMER_TYPE_CD|VENDOR_NAME|SITE_NAME|DNIS_CODE
DTL|10000|SRC_ORD_ID|SRC_ORD_TYPE_CD|SRC_ORD_STAT_CD|SRC_ACCT_ID|SRC_DISC_RSN_CD|1858-11-17|1858-11-18|1858-11-19|1858-11-20|1858-11-21|1858-11-22|ORD_STATUS_CD|ORDER_CREA_USER_ID|REGION_NM|STATE_CD|ORDER_TYPE|BILL_NAME|FEED_TYPE_CD|101|CREA_APPLN_NAME|BILL_TELE_NUM|CUST_CD|DIGITAL_LIFE_FLAG|CUSTOMER_TYPE_CD|VENDOR_NAME|SITE_NAME|DNIS_CODE
DTL|10000|SRC_ORD_ID|SRC_ORD_TYPE_CD|SRC_ORD_STAT_CD|SRC_ACCT_ID|SRC_DISC_RSN_CD|1858-11-17|1858-11-18|1858-11-19|1858-11-20|1858-11-21|1858-11-22|ORD_STATUS_CD|ORDER_CREA_USER_ID|REGION_NM|STATE_CD|ORDER_TYPE|BILL_NAME|FEED_TYPE_CD|101|CREA_APPLN_NAME|BILL_TELE_NUM|CUST_CD|DIGITAL_LIFE_FLAG|CUSTOMER_TYPE_CD|VENDOR_NAME|SITE_NAME|
DTL|10000|SRC_ORD_ID|SRC_ORD_TYPE_CD|SRC_ORD_STAT_CD|SRC_ACCT_ID|SRC_DISC_RSN_CD|1858-11-17|1858-11-18|1858-11-19|1858-11-20|1858-11-21|1858-11-22|ORD_STATUS_CD|ORDER_CREA_USER_ID|REGION_NM|STATE_CD|ORDER_TYPE|BILL_NAME|FEED_TYPE_CD|101|CREA_APPLN_NAME|BILL_TELE_NUM|CUST_CD|DIGITAL_LIFE_FLAG|CUSTOMER_TYPE_CD|VENDOR_NAME|SITE_NAME|DNIS_CODE|1
DTL|10000|SRC_ORD_ID|SRC_ORD_TYPE_CD|SRC_ORD_STAT_CD|SRC_ACCT_ID|SRC_DISC_RSN_CD|1858-11-17|1858-11-18|1858-11-19|1858-11-20|1858-11-21|1858-11-22|ORD_STATUS_CD|ORDER_CREA_USER_ID|REGION_NM|STATE_CD|ORDER_TYPE|BILL_NAME|FEED_TYPE_CD|101|CREA_APPLN_NAME|BILL_TELE_NUM|CUST_CD|DIGITAL_LIFE_FLAG|CUSTOMER_TYPE_CD|VENDOR_NAME|SITE_NAME
TRL|11

现在我想创建两组好坏的文件。好的应该是所有29个分离器都存在的地方。如果它小于或大于29分隔符(管道),它应该进入坏文件。

IN_FILE=$1
FNAME=`echo $IN_FILE | cut -d"." -f1 | awk '{$1 = substr($1, 1, 26)} 1'`
DFNAME=$FNAME"_Data.txt"
DGFNAME=$FNAME"_Good.txt"
DBFNAME=$FNAME"_Bad.txt"
TFNAME=$FNAME"_Trl.txt" 

cat $IN_FILE | awk -v DGFNM="$DGFNAME" -v DBFNM="$DBFNAME" '
{ {FS="|"}
    split($0, chars, "|")
    if(chars[1]=="DTL")
    {  
       NSEP=`awk -F\| '{print NF}'`
        if [ "$NSEP" = "29" ]
        then
           print substr($0,5) >> DGFNM
        else
           print $0 >> DBFNM
        fi
    }
}'

但我在这方面遇到了一些错误。

awk: cmd. line:5:    NSEP=`awk -F\| {print
awk: cmd. line:5:         ^ invalid char '`' in expression

2 个答案:

答案 0 :(得分:2)

看起来你想要:

Server Error in '/' Application.

The view 'SELECT 
   [Extent1].[ProductId] AS [ProductId], 
   [Extent1].[Prod_Name] AS [Prod_Name], 
   [Extent1].[Unit_Price] AS [Unit_Price], 
   [Extent1].[ListPrice] AS [ListPrice], 
   [Extent1].[Qty_on_Stock] AS [Qty_on_Stock], 
   [Extent1].[Prod_Description] AS [Prod_Description]
   FROM [dbo].[Products] AS [Extent1]' or its master was not found or no view engine supports the searched locations. The following locations were searched:
~/Views/Product/SELECT 
   [Extent1].[ProductId] AS [ProductId], 
   [Extent1].[Prod_Name] AS [Prod_Name], 
   [Extent1].[Unit_Price] AS [Unit_Price], `enter code here
   [Extent1].[ListPrice] AS [ListPrice], 
   [Extent1].[Qty_on_Stock] AS [Qty_on_Stock], 
   [Extent1].[Prod_Description] AS [Prod_Description]
   FROM [dbo].[Products] AS [Extent1].aspx

您的代码有两个主要问题:

  • 它在awk -F'|' -v DGFNM="$DGFNAME" -v DBFNM="$DBFNAME" ' $1 == "DTL" { if (NF == 29) { print substr($0, 5) > DGFNM } else { print > DBFNM } } ' "$IN_FILE" 脚本中使用shell语法(例如`....`[ ... ]),但不支持。

  • 它默认执行awk隐式执行的操作。

此外:

  • 最好避免使用全大写变量名 - 包括shell和awk脚本 - 因为它们可能与保留变量冲突。

  • 正如@tripleee在评论中指出的那样,你可以直接将文件名传递给Awk(如上面的代码所示) - 不需要awk和一个pipelin。

答案 1 :(得分:1)

本质上:

$ awk -F\| 'NF==30 {print > "good.txt"; next}{print > "bad.txt"}' file1.txt

29个分隔符表示30个字段,只需检查NF