I have a CSV file like this:
"Player","VPIP"
"$GIU$37","21.01"
"$VaSko$.017","14.11"
"*Lampiaao*","16.15"
"111bellbird","30.30"
"1221Vik1221","21.97"
"16266+626","20.83"
"17victor","16.09"
"1980locky","11.49"
"19dem","22.81"
"1lllllllllll","20.99"
......
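Using (g)AWK, I would like to print lines of the following form to an output file, filling in the information extracted from between the quotation marks:
<note player="player ID comes here" label="a number between 0-7 based on the player's VPIP value" update="this one is irrelevant, one number can be used for all the lines"></note>
So a printed line would look like this: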
<note player="17victor" label="2" update="1435260533"></note>
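Obviously, I want to skip the first line of the CSV file when reading, since it only contains the header. The criteria for the label number are:
0: VPIP > 37.5
1: VPIP < 10
2: VPIP between 10 and 16.5
7: everything else.
Any ideas on how to do this?
Answer 0 (score: 4)
Try this awk script: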
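BEGIN {
    FS = ","            # fields are comma-separated
    update = 34513135   # same update value for every line
}
NR != 1 {               # skip the header line
    vpip = $2
    gsub(/"/, "", vpip) # strip the surrounding quotes
    vpip += 0           # force a numeric (not string) comparison
    if (vpip > 37.5)
        label = 0
    else if (vpip < 10)
        label = 1
    else if (vpip < 16.5)
        label = 2
    else
        label = 7
    printf "<note player=%s label=%s update=%s></note>\n", $1, label, update
}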
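It's really simple:
In the BEGIN block the update variable is set (this runs before the file is parsed).
The next block runs for every line where NR != 1, that is, every line except the header.
It takes vpip and removes the quotes so that the value can be compared as a number.
It then picks the label and prints the line.
To use this code, run awk -f script.awk file, where script.awk is the name of the script and file is the path to the input file.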
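One detail is worth checking when stripping quotes this way: a variable that has been modified by gsub holds a string, and awk compares a string with a numeric constant as strings unless the value is first coerced to a number (for example by adding 0). The following is only a small demonstration of that behavior, not part of the original answer; the helper names cmp_as_string and cmp_as_number are hypothetical.

```shell
# Hypothetical helpers: both strip the quotes the same way,
# but only the second one coerces the value to a number.
cmp_as_string() {
    printf '"%s"\n' "$1" | awk '{
        v = $1
        gsub(/"/, "", v)          # v is now a string such as "9.5"
        print ((v < 10) ? "below 10" : "not below 10")
    }'
}

cmp_as_number() {
    printf '"%s"\n' "$1" | awk '{
        v = $1
        gsub(/"/, "", v)
        v += 0                    # coerce to a number before comparing
        print ((v < 10) ? "below 10" : "not below 10")
    }'
}

cmp_as_string 9.5   # string comparison: "9.5" sorts after "10"
cmp_as_number 9.5   # numeric comparison: 9.5 < 10
```

None of the sample values in the question happen to trigger the difference, but a single-digit VPIP such as 9.5 would.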
Usage example:
$ cat file
"Player","VPIP"
"$GIU$37","21.01"
"$VaSko$.017","14.11"
"*Lampiaao*","16.15"
"111bellbird","30.30"
"1221Vik1221","21.97"
"16266+626","20.83"
"17victor","16.09"
"1980locky","11.49"
"19dem","22.81"
"1lllllllllll","20.99"
$ awk -f script.awk file
<note player="$GIU$37" label=7 update=34513135></note>
<note player="$VaSko$.017" label=2 update=34513135></note>
<note player="*Lampiaao*" label=2 update=34513135></note>
<note player="111bellbird" label=7 update=34513135></note>
<note player="1221Vik1221" label=7 update=34513135></note>
<note player="16266+626" label=7 update=34513135></note>
<note player="17victor" label=2 update=34513135></note>
<note player="1980locky" label=2 update=34513135></note>
<note player="19dem" label=7 update=34513135></note>
<note player="1lllllllllll" label=7 update=34513135></note>
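The output above leaves label and update unquoted, whereas the target line in the question quotes every attribute value. A minimal sketch of a variant that does so, assuming it is acceptable to strip the CSV quotes from both fields and add them back in the format string, and assuming player names contain no commas or double quotes. The wrapper name notes_from_csv is hypothetical, and the update value is simply the one used above.

```shell
# Hypothetical wrapper: reads the CSV on stdin, prints one <note> line per player.
notes_from_csv() {
    awk -F',' -v update=34513135 '
    NR > 1 {                       # skip the header line
        player = $1
        vpip = $2
        gsub(/"/, "", player)      # strip CSV quotes from the player ID
        gsub(/"/, "", vpip)        # strip CSV quotes from the VPIP value
        vpip += 0                  # force a numeric comparison
        if (vpip > 37.5)      label = 0
        else if (vpip < 10)   label = 1
        else if (vpip < 16.5) label = 2
        else                  label = 7
        printf "<note player=\"%s\" label=\"%d\" update=\"%s\"></note>\n", player, label, update
    }'
}
```

It could be run as notes_from_csv < file > notes.xml to write the result to an output file, where notes.xml is a placeholder name.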
If you have any further questions, leave a comment and I will explain in more detail.