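I have a text file that contains daily data for various stocks, each identified by a PERMNO. What I want to do is extract all the PERMNOs that appear before each block of market data, merge them, and show them alongside that market data by adding a new PERMNO column next to the existing ones. The text file looks like this: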
PERMNO = 1234
PERMNO = 2134
Market data:
Date | Price | Return | Volume
--------------------------------
2019-01-01| 120 | 100 | 100
PERMNO = 3456
Market data:
Date | Price | Return | Volume
--------------------------------
2019-01-01| 200 | 150 | 130
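I am trying to use awk for this. I can extract the PERMNOs with it, but I cannot work out how to combine them with the rest of the market data as a new column. Any alternative to awk, such as sed, would also be fine. I am still new to shell scripting, so I don't know the full capabilities of these tools. Can anyone suggest how I should approach this?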
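Answer 0 (score: 2)
You can get what you asked for with the script below. Its patterns assume that the header and data rows in the file are indented by at least one space: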
$ cat tst.awk
BEGIN { OFS=" | " }
/^PERMNO/ {
    permnos = ( permnos == "" ? "" : permnos ",") $NF
}
/^ +[[:alpha:]]/ && !doneHdr++ {
    indent = text = $0
    sub(/[^ ].*/,"",indent)
    sub(/^ +/,"",text)
    hdr = text OFS "PERMNO"
    sep = sprintf("%*s",length(hdr)+2,"")
    gsub(/ /,"-",sep)
    print "Market data:" ORS ORS indent hdr ORS indent sep
}
/^ +[0-9]/ {
    print $0, permnos
    permnos = ""
}
$ awk -f tst.awk file
Market data:
Date | Price | Return | Volume | PERMNO
-----------------------------------------
2019-01-01| 120 | 100 | 100 | 1234,2134
2019-01-01| 200 | 150 | 130 | 3456
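The accumulation idiom in the script above (`permnos = ( permnos == "" ? "" : permnos ",") $NF`) can be tried on its own. The following is only a minimal sketch: the variable names `ids` and `joined` and the two sample lines are chosen for illustration. It appends each line's last field to a running string, inserting a comma only when the string is already non-empty.

```shell
# Feed two sample PERMNO lines to awk and join their last fields with commas.
joined=$(printf '%s\n' 'PERMNO = 1234' 'PERMNO = 2134' |
    awk '{ ids = (ids == "" ? "" : ids ",") $NF }   # add a comma only after the first ID
         END { print ids }')
printf '%s\n' "$joined"
```

Because the separator is added before each new value rather than after it, the result never carries a leading or trailing comma and needs no cleanup afterwards.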
But I would strongly recommend that you just generate a CSV instead, so that further analysis and manipulation are easy:
$ cat tst.awk
BEGIN { FS="[ |]+"; OFS="," }
/^PERMNO/ {
    permnos = ( permnos == "" ? "" : permnos " ") $NF
}
sub(/^ +/,"") {
    $1 = $1
    if ( /^[[:alpha:]]/ && !doneHdr++ ) {
        print $0, "PERMNO"
    }
    else if ( /^[0-9]/ ) {
        print $0, permnos
        permnos = ""
    }
}
$ awk -f tst.awk file
Date,Price,Return,Volume,PERMNO
2019-01-01,120,100,100,1234 2134
2019-01-01,200,150,130,3456
If you like, you can then turn that into a visual table with various tools, e.g. with column:
$ awk -f tst.awk file | column -s, -o' | ' -t
Date | Price | Return | Volume | PERMNO
2019-01-01 | 120 | 100 | 100 | 1234 2134
2019-01-01 | 200 | 150 | 130 | 3456
And if you would like an underline beneath the header:
$ awk -f tst.awk file | column -s, -o' | ' -t | awk '1;NR==1{gsub(/./,"-");print}'
Date | Price | Return | Volume | PERMNO
---------------------------------------------
2019-01-01 | 120 | 100 | 100 | 1234 2134
2019-01-01 | 200 | 150 | 130 | 3456
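As a small illustration of why the CSV form is convenient for further manipulation, the sketch below expands each row into one row per PERMNO. It is not part of the original answer: the temporary file, the variable names `csv` and `expanded`, and the one-row-per-PERMNO layout are all assumptions made for this example. The data is copied from the CSV output shown earlier.

```shell
# Hypothetical temporary file holding the CSV output shown earlier.
csv=$(mktemp)
cat > "$csv" <<'EOF'
Date,Price,Return,Volume,PERMNO
2019-01-01,120,100,100,1234 2134
2019-01-01,200,150,130,3456
EOF

# Emit one row per PERMNO: split the last field on spaces and
# repeat the other fields for each ID.
expanded=$(awk 'BEGIN { FS=OFS="," }
NR == 1 { print; next }            # pass the header through unchanged
{
    n = split($NF, ids, " ")       # ids[1..n] holds the PERMNOs of this row
    for (i = 1; i <= n; i++) {
        $NF = ids[i]               # overwrite the last field with a single ID
        print
    }
}' "$csv")
printf '%s\n' "$expanded"
rm -f "$csv"
```

With a single delimiter and no decorative separators, the second awk pass only has to set `FS` and `OFS`; the pipe-delimited table would need extra trimming before the same step.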
Answer 1 (score: 0)
This seems to produce the desired output (using gawk 4.14):
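#!/usr/bin/gawk -f
@include "join"
BEGIN { OFS="\t" }
# Collect PERMNO IDs; start a fresh list once a market-data block has been seen.
/PERMNO/ { if (marketseen==1) { p=$3; marketseen=0 } else p = (p!="" ? p "," $3 : $3) }
/Market/ { marketseen=1 }
{
    split($0,a,"|")
    lc = ""
    if (a[1] ~ /Date/) lc = "PERMNO"   # header row: label the new column (was the unset variable Date)
    if (a[1] ~ "0")    lc = p          # data row (its date contains a 0): append the IDs
    if (NF<4)          lc = ""         # short lines get no extra column
    print a[1],a[2],a[3],a[4], lc
}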
Input:
$ cat MarketData.txt
PERMNO = 1234
PERMNO = 2134
Market data:
Date | Price | Return | Volume
--------------------------------
2019-01-01| 120 | 100 | 100
PERMNO = 3456
Market data:
Date | Price | Return | Volume
--------------------------------
2019-01-01| 200 | 150 | 130
Output:
$ ./marketdata.sh MarketData.txt
PERMNO = 1234
PERMNO = 2134
Market data:
Date Price Return Volume PERMNO
--------------------------------
2019-01-01 120 100 100 1234,2134
PERMNO = 3456
Market data:
Date Price Return Volume PERMNO
--------------------------------
2019-01-01 200 150 130 3456
Answer 2 (score: 0)
You can do it like this:
BEGIN {FS=" = " ; H="Market data:\n\n Date | Price | Return | Volume | PERMNO" ; print H}
/PERMNO/ {PNO = PNO "," $2 "," }
/2[0-9]{3}-/ { gsub(",,+",",",PNO) ; gsub("^,|,$","",PNO) ; print $0 " | " PNO ; PNO = ""; next}
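The BEGIN line sets the field delimiter and prints the header. The /PERMNO/ line accumulates the PERMNO IDs in PNO (only on lines that match PERMNO). The last line cleans up the PNO variable (no leading, trailing, or repeated commas), prints the data line with the value of PNO appended, and then resets PNO.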
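The cleanup step can be checked in isolation. This is only a sketch: the input string is what PNO would hold after two PERMNO lines (1234 and 2134) under the accumulation rule above, and the variable name `cleaned` is introduced here for illustration.

```shell
# PNO after two PERMNO lines is ",1234,,2134,"; apply the same two gsub calls.
cleaned=$(printf '%s\n' ',1234,,2134,' |
    awk '{ PNO = $0
           gsub(",,+", ",", PNO)     # collapse runs of commas into one
           gsub("^,|,$", "", PNO)    # drop a leading or trailing comma
           print PNO }')
printf '%s\n' "$cleaned"
```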