Histogram of occurrences across different data files

Time: 2018-12-05 19:37:36

Tags: shell awk gnuplot

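My simulation program produces several data files. The first column indicates success (=0) or error (=1), and the second column is the simulation time in seconds.

A sample of these two columns: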

1 185.48736852299064
1 199.44533672989186
1 207.35654106612733
1 213.5214031236177 
1 215.50576147950017
0 219.62444310777695
0 222.26750248416354
0 236.1402270910635 
1 238.5124609287994 
0 246.4538392581228 
.   .
.   .
.   .
1 307.482605596962
1 329.16494123373445
0 329.6454558227778 
1 330.52804695995303
0 332.0673690346546 
0 358.3001385706268 
0 359.82271742496414
1 400.8162129871805 
0 404.88783391725985
1 411.27012219170393

I can already make a frequency plot (histogram) of the binned errors (1's) for a single data file:

set encoding iso_8859_1
set key left top 
set ylabel "P_{error}" 
set xlabel "Time [s]" 
set size 1.4, 1.2
set terminal postscript eps enhanced color "Helvetica" 16 
set grid ytics
set key spacing 1.5
set style fill transparent solid 0.3

`grep '^ 1' lookup-ratio-50-0.0034-50-7-20-10-3-1.txt | awk '{print $2}' > t7.dat`

stats 't7.dat' u 1
set output "t7.eps"
binwidth=2000
bin(x,width)=width*floor(x/width)
plot 't7.dat' using (bin($1,binwidth)):(1.0/STATS_records) smooth freq with boxes lc rgb "midnight-blue" title "7x7_P_error"

Result:

[image: histogram of the error frequency]

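I would like to improve the gnuplot script above so that it includes the remaining data files lookup-.....-.txt and their error samples, and combines them all in the same frequency plot.

I would also like to avoid intermediate files such as t7.dat.

In addition, I would like to draw a horizontal line at the mean of the error probability.

How can I plot all the sample data in the same figure?

Thanks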

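2 answers:

Answer 0 (score: 2)

If I understand you correctly, you want to build one histogram over several files, so you basically have to concatenate several data files. Of course, you can do this with an external program (e.g. awk) or with shell commands. Below is a possible solution using gnuplot and a system command that needs no temporary file. The system command is for Windows, but you can easily translate it to Linux. You may need to check whether the "NaN" values confuse your binning and histogram results.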

### start code
reset session
# create some dummy data files
do for [i=1:5] {
    set table sprintf("lookup-blahblah_%d.txt", i)
    set samples 50
    plot '+' u (int(rand(0)+0.5)):(rand(0)*0.9+0.1) w table
    unset table
}
# end creating dummy data files

FILELIST = system("dir /B lookup*.txt")   # this is for Windows
print FILELIST

undefine $AllDataWithError
set table $AllDataWithError append
do for [i=1:words(FILELIST)] {
    plot word(FILELIST,i) u ($1==1? $1 : NaN):($1==1? $2 : NaN) w table
}
unset table

print $AllDataWithError

# ... do your binning and plotting
### end of code
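On Linux, the Windows listing `dir /B lookup*.txt` would presumably become something like `ls lookup*.txt`. Alternatively, the concatenation can be done entirely in the shell. The following is only a minimal sketch, assuming whitespace-separated columns and files matching `lookup-*.txt`; the function name `error_times` and the sample files are hypothetical and only serve as an illustration.

```shell
#!/bin/sh
# Hypothetical helper: print the time column of every error row
# (first column == 1) from all files given as arguments.
# awk ignores leading blanks, so rows starting with " 1" are matched as well.
error_times() {
    awk '$1 == 1 { print $2 }' "$@"
}

# Demo with two small sample files in a temporary directory
tmpdir=$(mktemp -d)
printf '1 185.4\n0 219.6\n1 238.5\n' > "$tmpdir/lookup-a.txt"
printf '0 246.4\n1 307.4\n' > "$tmpdir/lookup-b.txt"
error_times "$tmpdir"/lookup-*.txt
rm -rf "$tmpdir"
```

On systems where gnuplot supports pipes, the same awk command can probably be read directly as a data source, e.g. `plot "< awk '$1==1{print $2}' lookup-*.txt" ...`, which would also avoid the intermediate file.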

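Edit:

Apparently, NaN values and/or empty lines seem to mess up smooth freq and/or the binning?! So we need to extract only the error (=1) lines. With the code above you can merge several files into one datablock. The code below already starts from a datablock similar to your data.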

### start of code
reset session

# create some dummy datablock with some distribution (with no negative values)
Height =3000
Pos = 6000
set table $Data
    set samples 1000
    plot '+' u (int(rand(0)+0.3)):(abs(invnorm(rand(0))*Height+Pos)) w table
unset table
# end creating dummy data

stats $Data nooutput
Datapoints = STATS_records

# get only the error lines
# plot $Data into the table $Dummy.
# If $1==1 (=Error) write the line number $0 into column 1 and value into column 2
# else write NaN into column 1 and column 2.
# Since $0 is the line number which is unique 
# 'smooth frequency' will keep these lines "as is"
# but change the NaN lines to empty lines.
Error = 1
Success = 0
set table $Dummy
    plot $Data u ($1==Error ? $0 : NaN):($1==Error ? $2 : NaN) smooth freq
unset table
# get rid of empty lines in $Dummy
# Since empty lines seem to also mess up binning you need to remove them
# by writing $Dummy into the dataset $Error via "plot ... with table".
set table $Error
   plot $Dummy u 1:2 with table
unset table

bin(x) = binwidth*floor(x/binwidth)
stats $Error nooutput
ErrorCount = STATS_records

set multiplot layout 3,1
set key outside
set label 1 sprintf("Datapoints: %g\nSuccess: %g\nError: %g",\
    Datapoints, Datapoints-ErrorCount,ErrorCount) at graph 1.02, first 0
plot $Data u 0:($1 == Success ? $2 : NaN) w impulses lc rgb "web-green" t "Success",\
    $Data u 0:($1 == Error ? -$2 : NaN) w impulses lc rgb "red" t "Error"

unset label 1
set key inside
binwidth = 1000
plot $Error using (bin($2)):(1.0/STATS_records) smooth freq with boxes t sprintf("binwidth: %d",binwidth) lc rgb "blue"

binwidth=100
set xrange[GPVAL_X_MIN:GPVAL_X_MAX] # use same xrange as graph before
plot $Error using (bin($2)):(1.0/STATS_records) smooth freq with boxes t sprintf("binwidth: %d",binwidth) lc rgb "magenta"

unset multiplot
### end of code

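The result looks like this:

[image: three stacked panels showing the success/error impulses and the error histograms for binwidth 1000 and binwidth 100]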
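The binning used in the two histogram panels (`bin(x) = binwidth*floor(x/binwidth)` combined with `smooth freq` and a weight of `1.0/STATS_records`) can be cross-checked outside gnuplot. The following is a minimal awk sketch under the assumption that the times are non-negative and that the input has the two-column layout from the question; the function name `bin_errors` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: bin_errors BINWIDTH [FILE...]
# Prints "bin_start relative_frequency" for the error rows (first column == 1).
# With no FILE arguments the data is read from standard input.
bin_errors() {
    width=$1
    shift
    awk -v w="$width" '
        $1 == 1 {
            b = w * int($2 / w)   # same as w*floor(x/w) for non-negative times
            count[b]++
            n++                   # total number of error rows
        }
        END {
            if (n == 0) exit
            for (b in count)
                printf "%s %g\n", b, count[b] / n
        }
    ' "$@" | sort -n
}

# Demo: four error rows and one success row, bin width 100
printf '1 185.4\n1 199.4\n0 219.6\n1 238.5\n1 411.2\n' | bin_errors 100
# prints:
# 100 0.5
# 200 0.25
# 400 0.25
```

Each frequency is the fraction of all error rows that fall into the bin, which is what summing `1.0/STATS_records` per bin amounts to.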

Answer 1 (score: 0)

You can pipe both the data and the plot instructions to gnuplot without a temporary file,

for example

$ awk 'BEGIN{print "plot \"-\" using ($1):($2)"; 
             while(i++<20) print i,rand()*20; print "e"}' | gnuplot -p

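will create a random plot. You can print the instructions in the BEGIN block as I did, and the main awk statement can filter the data.

For your plot, something like this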

$ awk 'BEGIN{print "...." }
       $1==1{print $2}
       END  {print "e"}' lookup-*.txt | gnuplot -p
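One detail matters for the normalized histogram: the number of error rows is known only after all files have been read, so the plot command cannot be printed in BEGIN if it should contain that count. The following is a minimal sketch of one way around this, buffering the times in awk and printing everything in END. The function name `emit_histogram_input`, the title "errors" and the chosen bin width are assumptions for illustration, not part of the answer above.

```shell
#!/bin/sh
# Hypothetical helper: emit_histogram_input BINWIDTH [FILE...]
# Writes gnuplot commands plus inline data to stdout; nothing is written to disk.
emit_histogram_input() {
    width=$1
    shift
    awk -v w="$width" '
        $1 == 1 { t[++n] = $2 }          # keep the time of every error row
        END {
            if (n == 0) exit 1           # no error rows: emit nothing
            printf "binwidth = %s\n", w
            print  "bin(x) = binwidth*floor(x/binwidth)"
            # n is known only now, so the plot command is printed in END
            printf "plot \"-\" using (bin($1)):(1.0/%d) smooth freq with boxes title \"errors\"\n", n
            for (i = 1; i <= n; i++) print t[i]
            print "e"                    # terminates the inline data
        }
    ' "$@"
}

# Demo: show what would be sent to gnuplot for three sample rows
printf '1 185.4\n0 219.6\n1 238.5\n' | emit_histogram_input 100
```

Piping the output into gnuplot, for instance `emit_histogram_input 2000 lookup-*.txt | gnuplot -p`, should then display the combined histogram without a file such as t7.dat.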