Bash: reorganize a CSV file

Time: 2017-05-25 15:57:09

Tags: bash parsing awk sed grep

I have a series of CSV files containing different data elements. They are structured like this:

datetime,var1,val1,var2,val2,...,varx,valx

Unfortunately, some rows have no var1 at all, and in other rows var1 appears further along the line.

Example CSV (trimmed to a few rows and variables):

11/20/2011 3:05:00 AM,HR,115,ST-V,1.2,ST-AVF,-0.1,ST-AVL,0.1,
11/20/2011 3:05:02 AM,HR,119,ST-II,0.1,ST-AVF,-0.1,ST-AVL,0.1,
11/20/2011 3:05:04 AM,HR,122,ST-II,0.1,ST-I,0,ST-V,1.2,ST-AVR,-0.1,
11/20/2011 3:05:06 AM,HR,123,ST-II,0.1,ST-I,0,ST-V,1.2,ST-III,-0.1,
11/20/2011 3:05:08 AM,HR,122,ST-II,0.1,ST-I,0,ST-V,1.2,ST-AVL,0.1,
11/20/2011 3:05:10 AM,ST-V,1.1,ST-III,-0.4,ST-AVR,0,ST-AVL,0.2,
11/20/2011 3:05:12 AM,PVC,0,ST-II,0,ST-I,0,ST-V,1.1,ST-III,-0.4,
11/20/2011 3:05:14 AM,PVC,0,ST-II,0,ST-I,0,APNEA,0,

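Ultimately, I want to do the following:

  1. Read through the file
  2. Copy each row's datetime stamp
  3. Look for var1
  4. Copy val1
  5. If var1 is not present, create var1 and insert NaN as val1
  6. Repeat for all variables
  7. Save to a new CSV file

Desired output (limited to two sample variables; it will be expanded to include all variables):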

    11/20/2011 3:05:00 AM,HR,115,PVC,NaN,
    11/20/2011 3:05:02 AM,HR,119,PVC,NaN,
    11/20/2011 3:05:04 AM,HR,122,PVC,NaN,
    11/20/2011 3:05:06 AM,HR,123,PVC,NaN,
    11/20/2011 3:05:08 AM,HR,122,PVC,NaN,
    11/20/2011 3:05:10 AM,HR,NaN,PVC,NaN,
    11/20/2011 3:05:12 AM,HR,NaN,PVC,0,
    11/20/2011 3:05:14 AM,HR,NaN,PVC,0,
    

So far, my progress is limited to the following:

    cut -d',' -f1 file.csv   # pulls the datetime nicely
    grep -n -o 'HR,.*' file.csv | cut -f2 -d','    # works on nearly all variables and pulls the value from the field following the grep term, but skips every row where the term is absent
    

Any suggestions on how to proceed?
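The reason the `grep`/`cut` pipeline above drifts out of step with the datetime column can be seen on three of the sample rows: `grep -o` prints nothing for a row without a match, so the two outputs end up with different line counts and can no longer be lined up row by row. A small illustrative check (the three-row `sample` string is simply copied from the data above):

```shell
#!/usr/bin/env bash
# Three rows copied from the sample data; only the first one contains HR.
sample='11/20/2011 3:05:08 AM,HR,122,ST-II,0.1,
11/20/2011 3:05:10 AM,ST-V,1.1,ST-III,-0.4,
11/20/2011 3:05:12 AM,PVC,0,ST-II,0,'

# cut emits exactly one datetime per input row.
dates=$(printf '%s\n' "$sample" | cut -d',' -f1 | wc -l)

# grep -o emits output only for rows that contain a match, so rows
# without HR produce nothing at all rather than an empty placeholder.
hr_vals=$(printf '%s\n' "$sample" | grep -n -o 'HR,.*' | cut -f2 -d',' | wc -l)

# $(( )) strips the leading spaces some wc implementations print.
echo "datetime rows: $((dates))"
echo "HR values:     $((hr_vals))"
```

Three datetimes but only one HR value come out, which is why the values need to be looked up inside each row (falling back to NaN) instead of being extracted column by column.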

1 answer:

Answer 0 (score: 2)

Your question is very confusing, but I think this is what you're trying to do:

$ cat tst.awk
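BEGIN {
    FS=OFS=","
    # Split the list passed via -v tags=... on FS (a comma) into
    # tagOrder[1..numTags] (output order) and tagSet (membership test).
    numTags = split(tags,tagOrder)
    for (tagNr in tagOrder) {
        tagName = tagOrder[tagNr]
        tagSet[tagName]
    }
}
{
    # For this row, map each wanted tag to the field that follows it.
    delete tag2val
    for (fldNr=2; fldNr<=NF; fldNr++) {
        if ($fldNr in tagSet) {
            tag2val[$fldNr] = $(fldNr+1)
        }
    }

    # Print the datetime, then every wanted tag in order,
    # using NaN for any tag that is missing from this row.
    printf "%s%s", $1, OFS
    for (tagNr=1; tagNr<=numTags; tagNr++) {
        tagName = tagOrder[tagNr]
        printf "%s%s%s%s", tagName, OFS, (tagName in tag2val ? tag2val[tagName] : "NaN"), (tagNr<numTags?OFS:ORS)
    }
}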

$ awk -v tags='HR,PVC' -f tst.awk file
11/20/2011 3:05:00 AM,HR,115,PVC,NaN
11/20/2011 3:05:02 AM,HR,119,PVC,NaN
11/20/2011 3:05:04 AM,HR,122,PVC,NaN
11/20/2011 3:05:06 AM,HR,123,PVC,NaN
11/20/2011 3:05:08 AM,HR,122,PVC,NaN
11/20/2011 3:05:10 AM,HR,NaN,PVC,NaN
11/20/2011 3:05:12 AM,HR,NaN,PVC,0
11/20/2011 3:05:14 AM,HR,NaN,PVC,0

If not, please edit your question to clarify.
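The desired output in the question covers only two variables but is meant to be extended to all of them. Rather than typing the full `tags` list by hand, one option is to collect it from the data itself. This is a minimal sketch, assuming (as in the samples) that every variable name sits in an even-numbered field ($2, $4, ...) followed by its value; the temporary file and its contents are only for illustration:

```shell
#!/usr/bin/env bash
# Illustrative input file built from a few of the sample rows.
file=$(mktemp)
cat > "$file" <<'EOF'
11/20/2011 3:05:00 AM,HR,115,ST-V,1.2,ST-AVF,-0.1,ST-AVL,0.1,
11/20/2011 3:05:12 AM,PVC,0,ST-II,0,ST-I,0,ST-V,1.1,ST-III,-0.4,
11/20/2011 3:05:14 AM,PVC,0,ST-II,0,ST-I,0,APNEA,0,
EOF

# Assumption: names are in even-numbered fields, each followed by a value.
# Record every name the first time it is seen, in order of appearance,
# and print the names as a single comma-separated list.
tags=$(awk -F',' '
    {
        for (i = 2; i < NF; i += 2)
            if ($i != "" && !($i in seen)) {
                seen[$i]
                order[++n] = $i
            }
    }
    END {
        for (j = 1; j <= n; j++)
            printf "%s%s", order[j], (j < n ? "," : "\n")
    }
' "$file")

echo "$tags"
rm -f "$file"
```

The resulting list could then be handed to the script above with `awk -v tags="$tags" -f tst.awk file`, so every row gets one name/value pair per variable, with NaN where a row lacks that variable. Ordering the names by first appearance is an arbitrary choice; sorting them would work just as well.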