bash: adding two commas to preserve CSV formatting

Time: 2017-12-20 23:47:10

Tags: csv awk sed tr

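I need to load the following into a CSV file, and I want to preserve the column layout in the spreadsheet by adding two commas, rather than one, after each "-----".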

CASPER_CD_UNIAPP1_NETPROBE_PS      -----                07/12/2017 01:54:31  OI 45976571/4 -655
CASPER_CD_REFD_RESTRICTED_SYM_PS   -----                -----                OI 0/0
CASPER_CD_OPT_BILL_GEN_FEED_PS     12/12/2017 04:01:22  12/12/2017 04:01:22  OI 88970489/1 0
CASPER_CD_EOD_S3FTP_PS             07/20/2017 22:30:45  07/20/2017 22:32:27  OI 71030819/1 0
CASPER_CD_RPTS_SEND_PANAGORA_PS    11/28/2017 16:47:20  11/28/2017 16:47:22  OI 87295557/1 0
CASPER_BD_USDM_MAAS_PS             06/06/2016 21:00:39  06/06/2016 21:07:24  OI 24884239/1 1
CASPER_CD_USDM_MAAS_EXTR_LOAD_PS   06/06/2016 21:40:50  06/06/2016 21:45:57  OI 24884239/2 1

I have been using this one-liner:

$ grep _PS totalAutosysjobs.20171219 | grep OI | awk '{ print $1", " $2 ", " $3 ", " $4 ", " $5 ", "$6 ", "$7 ", " $8 } ' | tr "\-\-\-\-\-\," "\-\-\-\-\-\, \,"

The tr that normally works for me,

 | tr "-----," "-----, ,"
tr: unrecognized option `-----,'
 | tr  "'-----,'" "'-----, ,'"

Escaping the arguments to tr did not work either, so I used

 sed -e s/'-----,'/'-----, ,'/g

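Is there a way, in awk or pure bash, to add the two commas after "-----" inside the script itself, instead of post-processing the output of the first command?

Something like: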

    "if field $1 =~ "-----" please add two commas to end of "-----, ,"
     else " just add one comma". 

2 answers:

Answer 0: (score: 3)

Using awk

Try:

awk '{$1=$1; gsub(/-----/,"-----,")} 1' OFS=, inputfile

Example

$ awk '{$1=$1; gsub(/-----/,"-----,")} 1' OFS=, inputfile
CASPER_CD_UNIAPP1_NETPROBE_PS,-----,,07/12/2017,01:54:31,OI,45976571/4,-655
CASPER_CD_REFD_RESTRICTED_SYM_PS,-----,,-----,,OI,0/0
CASPER_CD_OPT_BILL_GEN_FEED_PS,12/12/2017,04:01:22,12/12/2017,04:01:22,OI,88970489/1,0
CASPER_CD_EOD_S3FTP_PS,07/20/2017,22:30:45,07/20/2017,22:32:27,OI,71030819/1,0
CASPER_CD_RPTS_SEND_PANAGORA_PS,11/28/2017,16:47:20,11/28/2017,16:47:22,OI,87295557/1,0
CASPER_BD_USDM_MAAS_PS,06/06/2016,21:00:39,06/06/2016,21:07:24,OI,24884239/1,1
CASPER_CD_USDM_MAAS_EXTR_LOAD_PS,06/06/2016,21:40:50,06/06/2016,21:45:57,OI,24884239/2,1

How it works

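  1. $1=$1

    This fools awk into thinking that every line has been changed. As a result, awk applies the new output field separator to each line.

  2. gsub(/-----/,"-----,")

    This adds a comma after each occurrence of -----.

  3. 1

    This prints the line.

  4. OFS=,

    This sets the output field separator to a comma.

Alternative awk

As suggested by karakfa, a variation on the above is:

awk '{gsub(/-----/,"&"OFS); $1=$1} 1' OFS=, inputfile

Here, "&" stands for the matched text, and "&"OFS stands for the matched text followed by the output field separator.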
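
For readers more at home in Python, the same two-step idea (append a comma to every ----- match, then turn each run of whitespace into a single comma) can be sketched with the re module. This is only an illustration, not part of the original answer: the function name to_csv_line is made up here, and \g<0> plays the role that & plays in awk and sed.

```python
import re


def to_csv_line(line: str) -> str:
    """Turn one whitespace-aligned line into a comma-separated line.

    Hypothetical helper for illustration only.
    """
    # Step 1: append a comma to every "-----"; \g<0> is the whole match,
    # like & in the awk/sed replacement text.
    marked = re.sub(r"-----", r"\g<0>,", line.strip())
    # Step 2: collapse every run of whitespace into one comma.
    return re.sub(r"\s+", ",", marked)


if __name__ == "__main__":
    sample = "JOB_A_PS      -----                07/12/2017 01:54:31  OI 1/4 -655"
    print(to_csv_line(sample))
```

Because the pattern requires five consecutive dashes, a value such as -655 is left alone.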

Using sed

    $ sed -E 's/-----/&,/g; s/[[:space:]]+/,/g' inputfile
    CASPER_CD_UNIAPP1_NETPROBE_PS,-----,,07/12/2017,01:54:31,OI,45976571/4,-655
    CASPER_CD_REFD_RESTRICTED_SYM_PS,-----,,-----,,OI,0/0
    CASPER_CD_OPT_BILL_GEN_FEED_PS,12/12/2017,04:01:22,12/12/2017,04:01:22,OI,88970489/1,0
    CASPER_CD_EOD_S3FTP_PS,07/20/2017,22:30:45,07/20/2017,22:32:27,OI,71030819/1,0
    CASPER_CD_RPTS_SEND_PANAGORA_PS,11/28/2017,16:47:20,11/28/2017,16:47:22,OI,87295557/1,0
    CASPER_BD_USDM_MAAS_PS,06/06/2016,21:00:39,06/06/2016,21:07:24,OI,24884239/1,1
    CASPER_CD_USDM_MAAS_EXTR_LOAD_PS,06/06/2016,21:40:50,06/06/2016,21:45:57,OI,24884239/2,1
    

Using bash

    $ while read -a line; do (IFS=,; printf "%s\n" "${line[*]//-----/-----,}"); done <inputfile
    CASPER_CD_UNIAPP1_NETPROBE_PS,-----,,07/12/2017,01:54:31,OI,45976571/4,-655
    CASPER_CD_REFD_RESTRICTED_SYM_PS,-----,,-----,,OI,0/0
    CASPER_CD_OPT_BILL_GEN_FEED_PS,12/12/2017,04:01:22,12/12/2017,04:01:22,OI,88970489/1,0
    CASPER_CD_EOD_S3FTP_PS,07/20/2017,22:30:45,07/20/2017,22:32:27,OI,71030819/1,0
    CASPER_CD_RPTS_SEND_PANAGORA_PS,11/28/2017,16:47:20,11/28/2017,16:47:22,OI,87295557/1,0
    CASPER_BD_USDM_MAAS_PS,06/06/2016,21:00:39,06/06/2016,21:07:24,OI,24884239/1,1
    CASPER_CD_USDM_MAAS_EXTR_LOAD_PS,06/06/2016,21:40:50,06/06/2016,21:45:57,OI,24884239/2,1
    

Using python

Consider this python script:

    #!/usr/bin/python3
    with open('inputfile') as fhandle:
        for line in fhandle:
            print(','.join(word for word in line.replace("-----","-----,").split()))
    

Applying it to our input data:

    $ python3 a.py
    CASPER_CD_UNIAPP1_NETPROBE_PS,-----,,07/12/2017,01:54:31,OI,45976571/4,-655
    CASPER_CD_REFD_RESTRICTED_SYM_PS,-----,,-----,,OI,0/0
    CASPER_CD_OPT_BILL_GEN_FEED_PS,12/12/2017,04:01:22,12/12/2017,04:01:22,OI,88970489/1,0
    CASPER_CD_EOD_S3FTP_PS,07/20/2017,22:30:45,07/20/2017,22:32:27,OI,71030819/1,0
    CASPER_CD_RPTS_SEND_PANAGORA_PS,11/28/2017,16:47:20,11/28/2017,16:47:22,OI,87295557/1,0
    CASPER_BD_USDM_MAAS_PS,06/06/2016,21:00:39,06/06/2016,21:07:24,OI,24884239/1,1
    CASPER_CD_USDM_MAAS_EXTR_LOAD_PS,06/06/2016,21:40:50,06/06/2016,21:45:57,OI,24884239/2,1
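
A field-by-field variant, closer to the "if the field is -----, add two commas, else add one" wording of the question, might look like the sketch below. It is an assumption-laden illustration rather than the answerer's method: expand_fields and convert are hypothetical names, and the standard csv module is used so that any quoting is left to the library.

```python
import csv
import io

PLACEHOLDER = "-----"  # marks a missing date+time pair in the input


def expand_fields(line):
    """Split on whitespace and add an empty field after each placeholder."""
    fields = []
    for token in line.split():
        fields.append(token)
        if token == PLACEHOLDER:
            # A real timestamp occupies two columns (date, time),
            # so pad the placeholder with one empty column.
            fields.append("")
    return fields


def convert(lines, out):
    """Write every non-blank input line to `out` as one CSV row."""
    writer = csv.writer(out, lineterminator="\n")
    for line in lines:
        if line.strip():
            writer.writerow(expand_fields(line))


if __name__ == "__main__":
    buf = io.StringIO()
    convert(["JOB_A_PS   -----     -----    OI 0/0\n"], buf)
    print(buf.getvalue(), end="")
```

Comparing whole tokens to the placeholder means a field that merely contains five dashes is not padded, which differs slightly from the substring replacement used in the script above.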
    

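Further reading on awk

A good introduction to awk is the dated but well-written Grymoire tutorial. There is also an awk wikibook. The authoritative guide to GNU awk features is the GNU awk manual. For advanced study, see Effective AWK Programming: A User's Guide for GNU Awk by Arnold D. Robbins (PDF).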

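Answer 1: (score: 1)

You are misreading your input - it is not made of space-separated fields but of fixed-width fields.

Handle the fixed-width fields explicitly with GNU awk: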

$ cat tst.awk
BEGIN { FIELDWIDTHS="35 11 10 11 10 3 11 999"; OFS="," }
{ $1=$1; gsub(/ /,""); print }

$ awk -f tst.awk file
CASPER_CD_UNIAPP1_NETPROBE_PS,-----,,07/12/2017,01:54:31,OI,45976571/4,-655
CASPER_CD_REFD_RESTRICTED_SYM_PS,-----,,-----,,OI,0/0
CASPER_CD_OPT_BILL_GEN_FEED_PS,12/12/2017,04:01:22,12/12/2017,04:01:22,OI,88970489/1,0
CASPER_CD_EOD_S3FTP_PS,07/20/2017,22:30:45,07/20/2017,22:32:27,OI,71030819/1,0
CASPER_CD_RPTS_SEND_PANAGORA_PS,11/28/2017,16:47:20,11/28/2017,16:47:22,OI,87295557/1,0
CASPER_BD_USDM_MAAS_PS,06/06/2016,21:00:39,06/06/2016,21:07:24,OI,24884239/1,1
CASPER_CD_USDM_MAAS_EXTR_LOAD_PS,06/06/2016,21:40:50,06/06/2016,21:45:57,OI,24884239/2,1

The above will work regardless of the values in your data (so you do not need to test for ----- or any other explicit value).
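
The fixed-width idea can also be sketched in Python by slicing each line at the column widths taken from the FIELDWIDTHS string above. This is a rough equivalent under stated assumptions, not a drop-in replacement: split_fixed and to_csv are hypothetical names, fields are trimmed with strip() (which removes only leading and trailing spaces, whereas the gsub(/ /,"") call removes every space), and slicing stops once the line runs out so that short lines do not gain trailing empty fields.

```python
# Column widths copied from the FIELDWIDTHS setting in the awk script.
WIDTHS = [35, 11, 10, 11, 10, 3, 11, 999]


def split_fixed(line, widths=WIDTHS):
    """Slice a line into fixed-width fields and trim each one."""
    line = line.rstrip("\n")
    fields = []
    pos = 0
    for width in widths:
        if pos >= len(line):
            break  # the line is shorter than the full layout
        fields.append(line[pos:pos + width].strip())
        pos += width
    return fields


def to_csv(line):
    """Join the fixed-width fields of one line with commas."""
    return ",".join(split_fixed(line))


if __name__ == "__main__":
    # Build a sample line with explicit padding instead of hand-typed spaces.
    sample = ("CASPER_CD_REFD_RESTRICTED_SYM_PS".ljust(35)
              + "-----".ljust(21) + "-----".ljust(21) + "OI 0/0")
    print(to_csv(sample))
```

An empty time column simply becomes an empty field, so no special case for ----- is needed.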