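I have a series of CSV files containing different data elements. They are structured as follows:
datetime,var1,val1,var2,val2,...,varx,valx
Unfortunately, in some rows var1 is missing entirely, and in others var1 appears later in the line.
Sample CSV (trimmed to a few lines and variables):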
11/20/2011 3:05:00 AM,HR,115,ST-V,1.2,ST-AVF,-0.1,ST-AVL,0.1,
11/20/2011 3:05:02 AM,HR,119,ST-II,0.1,ST-AVF,-0.1,ST-AVL,0.1,
11/20/2011 3:05:04 AM,HR,122,ST-II,0.1,ST-I,0,ST-V,1.2,ST-AVR,-0.1,
11/20/2011 3:05:06 AM,HR,123,ST-II,0.1,ST-I,0,ST-V,1.2,ST-III,-0.1,
11/20/2011 3:05:08 AM,HR,122,ST-II,0.1,ST-I,0,ST-V,1.2,ST-AVL,0.1,
11/20/2011 3:05:10 AM,ST-V,1.1,ST-III,-0.4,ST-AVR,0,ST-AVL,0.2,
11/20/2011 3:05:12 AM,PVC,0,ST-II,0,ST-I,0,ST-V,1.1,ST-III,-0.4,
11/20/2011 3:05:14 AM,PVC,0,ST-II,0,ST-I,0,APNEA,0,
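Ultimately, I would like to produce the following.
Desired output (limited to two sample variables here; it will be expanded to include all variables):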
11/20/2011 3:05:00 AM,HR,115,PVC,NaN,
11/20/2011 3:05:02 AM,HR,119,PVC,NaN,
11/20/2011 3:05:04 AM,HR,122,PVC,NaN,
11/20/2011 3:05:06 AM,HR,123,PVC,NaN,
11/20/2011 3:05:08 AM,HR,122,PVC,NaN,
11/20/2011 3:05:10 AM,HR,NaN,PVC,NaN,
11/20/2011 3:05:12 AM,HR,NaN,PVC,0,
11/20/2011 3:05:14 AM,HR,NaN,PVC,0,
So far, my progress is limited to the following:
cut -d',' -f1 # pulls the datetime nicely
grep -n -o 'HR,.*' file.csv | cut -f2 -d',' # works on nearly all variables and pulls the variable from the field following the grep term, but skips all empty lines
Any suggestions on how to proceed?
Answer 0 (score: 2)
Your question is very confusing, but I think this is what you are trying to do:
$ cat tst.awk
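BEGIN {
    FS = OFS = ","
    # tags is passed in with -v (e.g. tags='HR,PVC') and split on FS
    numTags = split(tags, tagOrder)
    # build a lookup set of the requested tag names
    for (tagNr in tagOrder) {
        tagName = tagOrder[tagNr]
        tagSet[tagName]
    }
}
{
    # map each requested tag found on this line to the field that follows it
    delete tag2val
    for (fldNr=2; fldNr<=NF; fldNr++) {
        if ($fldNr in tagSet) {
            tag2val[$fldNr] = $(fldNr+1)
        }
    }
    # print the timestamp, then every requested tag in order, NaN when absent
    printf "%s%s", $1, OFS
    for (tagNr=1; tagNr<=numTags; tagNr++) {
        tagName = tagOrder[tagNr]
        printf "%s%s%s%s", tagName, OFS, (tagName in tag2val ? tag2val[tagName] : "NaN"), (tagNr<numTags?OFS:ORS)
    }
}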
$ awk -v tags='HR,PVC' -f tst.awk file
11/20/2011 3:05:00 AM,HR,115,PVC,NaN
11/20/2011 3:05:02 AM,HR,119,PVC,NaN
11/20/2011 3:05:04 AM,HR,122,PVC,NaN
11/20/2011 3:05:06 AM,HR,123,PVC,NaN
11/20/2011 3:05:08 AM,HR,122,PVC,NaN
11/20/2011 3:05:10 AM,HR,NaN,PVC,NaN
11/20/2011 3:05:12 AM,HR,NaN,PVC,0
11/20/2011 3:05:14 AM,HR,NaN,PVC,0
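Since the question mentions eventually covering all variables, the tag list could probably be built from the data itself instead of being typed by hand. The following is only a rough sketch: it assumes tag names always sit in the even-numbered fields (2, 4, 6, ...), and the two sample rows fed in through printf are copied from the question purely for illustration.

```shell
# Collect every distinct tag name, in order of first appearance.
# Assumption: tags occupy even-numbered fields; the trailing empty
# field produced by the final comma is never visited (i < NF).
tags=$(printf '%s\n' \
    '11/20/2011 3:05:00 AM,HR,115,ST-V,1.2,ST-AVF,-0.1,ST-AVL,0.1,' \
    '11/20/2011 3:05:14 AM,PVC,0,ST-II,0,ST-I,0,APNEA,0,' |
awk -F, '
    {
        for (i = 2; i < NF; i += 2)
            if ($i != "" && !seen[$i]++)
                list = (list == "" ? "" : list ",") $i
    }
    END { print list }
')

echo "$tags"
```

With a real file, the printf pipeline would be replaced by the file name as awk's last argument, and the result could then be passed on as awk -v tags="$tags" -f tst.awk file.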
If not, edit your question to clarify.