Question

I have a CSV file and want to create a histogram from its 6th column. With Linux utilities this is simple:

$ cut -f6 -d, data.csv | sort | uniq -c | sort -k2,2n
    563 0.0
     72 0.025
     35 0.05
     22 0.075
     14 0.1
     21 0.125
     14 0.15
     10 0.175
      5 0.2
      3 0.225
      7 0.25
      3 0.275
      6 0.3
      5 0.325
      3 0.35
      1 0.375
      3 0.4
      1 0.425
      3 0.45
      3 0.475
      5 0.5
      7 0.525
     11 0.55
      3 0.575
      4 0.6
      3 0.625
     11 0.65
      5 0.675
      9 0.7
      5 0.725
      7 0.75
      8 0.775
      5 0.8
      3 0.825
      3 0.85
      4 0.875
      2 0.9
      1 0.925
      1 0.975
    109 1.0

But I want to plot it with gnuplot, so I tried to modify a script that I found (its URL is in the comment at the top). This is my modified version:

#!/usr/bin/gnuplot -p
# http://psy.swansea.ac.uk/staff/carter/gnuplot/gnuplot_frequency.htm

clear
reset

set datafile separator ",";
# set term dumb

set key off
set border 3

# Add a vertical dotted line at x=0 to show centre (mean) of distribution.
set yzeroaxis

# Each bar is half the (visual) width of its x-range.
set boxwidth 0.05 absolute
set style fill solid 1.0 noborder

bin_width = 0.1;
bin_number(x) = floor(x/bin_width)
rounded(x) = bin_width * ( bin_number(x) + 0.5 )

# MAKE BINS
# plot dataset_path using (rounded($6)):(6) smooth frequency with boxes

# DO NOT MAKE BINS
plot "data.csv" using 6:6 smooth frequency with boxes

The result is this: http://oi57.tinypic.com/x1acrm.jpg

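It shows something completely different from the Unix utilities. In gnuplot I have seen various kinds of histograms: some follow the pattern of a normal distribution, others are sorted by frequency (as if I had replaced the last sort -k2,2n with sort -n), and others are sorted by the values the histogram is built from (my case). It would be nice if I could choose.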

Answer 1

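smooth frequency makes the data monotonic in x (the value given by the first using column, in your case the numerical value in the sixth column) and then sums the y values (given by the second using column) of all points that share the same x.

Here you also give the sixth column as y, which is wrong if you want to count how often each distinct value occurs in the sixth column. Use using 6:(1), i.e. the constant 1 as the second column, to count the actual number of occurrences of each value: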

set style fill solid noborder
set boxwidth 0.8 relative
set datafile separator ','
plot 'nupic_out.csv' using 6:(1) smooth frequency with boxes notitle
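
To make the difference between using 6:6 and using 6:(1) concrete, here is a minimal Python sketch of what smooth frequency does as described above: group the points by x and sum their y values. The function name smooth_frequency and the sample values are made up for illustration; the sketch only mimics the behaviour and is not gnuplot's implementation.

```python
from collections import defaultdict


def smooth_frequency(points):
    """Sum the y values of all points sharing the same x; return pairs sorted by x."""
    totals = defaultdict(float)
    for x, y in points:
        totals[x] += y
    return sorted(totals.items())


# Hypothetical sample of sixth-column values (not taken from the real file).
values = [0.0, 0.0, 0.0, 0.025, 0.025, 1.0]

# using 6:6   -> y is the value itself, so each bar is value * count.
summed = smooth_frequency((v, v) for v in values)

# using 6:(1) -> y is the constant 1, so each bar is the number of occurrences.
counted = smooth_frequency((v, 1) for v in values)

print("using 6:6  :", summed)
print("using 6:(1):", counted)
```

With 6:6 the bar at 0.0 has height 0 no matter how many zeros the column contains, which would explain why the plot in the question looks nothing like the uniq -c output.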


To apply a logarithmic scale to the smoothed data, you must first save it to a temporary file with set table ...; plot, and then plot this temporary file.

set datafile separator ','
set table 'tmp.dat'
plot 'nupic_out.csv' using 6:(1) smooth frequency with lines
unset table

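Here you must be careful: because of a bug in gnuplot, a wrong last line is added to the output file, and you must skip it. You can do that with a filter in the using statement, such as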

plot 'tmp.dat' using (strcol(3) eq "i" ? $1 : 1/0):2 with boxes

which works fine here, or you can use head to cut off the last two lines (the negative line count requires GNU head), as in

plot '< head -n-2 tmp.dat' using 1:2 with boxes
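
The strcol(3) eq "i" filter above keeps only the rows whose third column is the flag "i". As a rough illustration, the same check can be sketched in Python. The layout of tmp.dat used here (comment lines starting with #, then x, y and a one-letter flag separated by whitespace) and the look of the spurious last row are assumptions made for this example, and read_table_rows is a hypothetical helper.

```python
def read_table_rows(lines):
    """Keep only the (x, y) pairs from rows whose third field is the flag "i"."""
    rows = []
    for line in lines:
        fields = line.split()
        # Skip blank lines, '#' comment lines, and rows not flagged "i".
        if len(fields) < 3 or fields[0].startswith("#") or fields[2] != "i":
            continue
        rows.append((float(fields[0]), float(fields[1])))
    return rows


# Assumed sample of a table file; the counts are taken from the uniq -c output.
sample = [
    "# Curve 0 of 1, 3 points",
    "# x y type",
    " 0  563  i",
    " 0.025  72  i",
    " 1  109  i",
    " 1  109  u",  # hypothetical stand-in for the spurious last line
    "",
]

rows = read_table_rows(sample)
print(rows)
```

Filtering on the flag does not depend on how many bad lines there are, whereas head -n-2 assumes that exactly the last two lines must go.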

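Another point to note is that gnuplot always writes its data files using whitespace as the separator, so you must change the datafile separator back to whitespace before plotting tmp.dat.

A complete working script could be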

set style fill solid noborder
set boxwidth 0.8 relative
set datafile separator ','

set table 'tmp.dat'
plot 'nupic_out.csv' using 6:(1) smooth frequency with lines notitle
unset table

set datafile separator whitespace
set logscale y
set yrange [0.8:*]
set autoscale xfix
plot '< head -n-2 tmp.dat' using 1:2 with boxes notitle


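Now, to use a binning function on the values in the sixth column, you must replace the 6 in using 6:(1) with a function that operates on the value given in the sixth column. This function must be enclosed in () and refers to the current value of the sixth column as $6, as in plot 'nupic_out.csv' using (bin($6)):(1) smooth frequency with lines. For example: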

set style fill solid noborder
set datafile separator ','

set boxwidth 0.09 absolute
Min = -0.05
Max = 1.05
n = 11.0
width = (Max-Min)/n
bin(x) = width*(floor((x-Min)/width)+0.5) + Min

set table 'tmp.dat'
plot 'nupic_out.csv' using (bin($6)):(1) smooth frequency with lines notitle
unset table

set datafile separator whitespace
set logscale y
set xrange [-0.05:1.05]
set tics nomirror out
plot '< head -n-2 tmp.dat' using 1:2 with boxes notitle

The gnuplot script above is again a complete working script; its bin() is ChrisW's binning function.
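
To see where bin() puts its bin centers, the formula from the gnuplot script can be evaluated by hand. The following Python sketch copies Min, Max, n and the formula from the script; the name bin_center and the test values are chosen for illustration only.

```python
import math

# Same parameters as in the gnuplot script.
MIN, MAX, N = -0.05, 1.05, 11.0
WIDTH = (MAX - MIN) / N  # roughly 0.1


def bin_center(x):
    """Port of bin(x) = width*(floor((x-Min)/width)+0.5) + Min."""
    return WIDTH * (math.floor((x - MIN) / WIDTH) + 0.5) + MIN


# Illustrative values, deliberately chosen away from the bin edges.
for x in (0.0, 0.025, 0.075, 0.5, 0.975, 1.0):
    print(f"{x:5.3f} -> bin centered at {bin_center(x):.2f}")
```

With these parameters the eleven bins are centered at 0.0, 0.1, ..., 1.0, so 0.0 and 0.025 land in the same bin, and so do 0.975 and 1.0. The boxwidth of 0.09 is slightly smaller than the bin width of about 0.1, which leaves a small gap between neighbouring boxes.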

