Question

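Histogram cluster / bar chart

I am trying to use gnuplot to generate the following histogram cluster from this data file, where each category is given on a separate line per year in the data file: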

# datafile
year   category        num_of_events
2011   "Category 1"    213
2011   "Category 2"    240
2011   "Category 3"    220
2012   "Category 1"    222
2012   "Category 2"    238
...

desired histogram cluster

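But I don't know how to do this with one line per category. I would be glad if anybody knows how to do this with gnuplot.

Stacked histogram cluster / stacked bar chart

Even better would be a stacked histogram cluster like the following, where the stacked subcategories are represented by separate columns in the data file: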

# datafile
year   category        num_of_events_for_A    num_of_events_for_B
2011   "Category 1"    213                    30
2011   "Category 2"    240                    28
2011   "Category 3"    220                    25
2012   "Category 1"    222                    13
2012   "Category 2"    238                    42
...

desired stacked histogram cluster

Thanks a lot in advance!

Answer 1

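After some research, I came up with two different solutions.

Required: Splitting the data file

Both solutions require splitting the data file into several files, one per distinct value of a given column. For this, I wrote a short Ruby script, which can be found in this gist:

https://gist.github.com/fiedl/6294424

The script is used as follows: to split the data file data.csv into data.Category1.csv and data.Category2.csv, call: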

# bash
ruby categorize_csv.rb --column 2 data.csv

# data.csv
# year   category   num_of_events_for_A   num_of_events_for_B
"2011";"Category1";"213";"30"
"2011";"Category2";"240";"28"
"2012";"Category1";"222";"13"
"2012";"Category2";"238";"42"
...

# data.Category1.csv
# year   category   num_of_events_for_A   num_of_events_for_B
"2011";"Category1";"213";"30"
"2012";"Category1";"222";"13"
...

# data.Category2.csv
# year   category   num_of_events_for_A   num_of_events_for_B
"2011";"Category2";"240";"28"
"2012";"Category2";"238";"42"
...
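
For readers who cannot open the gist, the splitting step can be sketched in a few lines of Ruby. This is not the gist's actual code: the helper name split_csv_by_column and its parameters are hypothetical, and the sketch assumes semicolon-separated fields that never contain the separator, with header lines starting with "#".

```ruby
# Hypothetical helper, not the code from the gist: write one output file per
# distinct value found in the given (1-based) column of a separated text file.
# Assumes no field contains the separator and that "#" lines are comments.
def split_csv_by_column(path, column, separator: ";")
  ext  = File.extname(path)      # e.g. ".csv"
  base = path.chomp(ext)         # e.g. "data"
  groups = Hash.new { |hash, key| hash[key] = [] }

  File.foreach(path) do |line|
    # Skip blank lines and comment/header lines.
    next if line.strip.empty? || line.lstrip.start_with?("#")
    fields = line.chomp.split(separator).map { |field| field.delete('"') }
    groups[fields[column - 1]] << line
  end

  # Write data.<value>.csv for every value and return the file names.
  groups.map do |value, lines|
    out = "#{base}.#{value}#{ext}"
    File.write(out, lines.join)
    out
  end
end

# Usage (assuming data.csv exists in the current directory):
#   split_csv_by_column("data.csv", 2)   # per-category files, as for Solution 1
#   split_csv_by_column("data.csv", 1)   # per-year files, as for Solution 2
```

With the sample data above, splitting on column 2 would produce data.Category1.csv and data.Category2.csv, and splitting on column 1 would produce data.2011.csv and data.2012.csv. The comment header is not copied into the output files, which gnuplot does not need.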

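Solution 1: Stacked box plot

Strategy: one data file per category, one column per stack. The bars of the histogram are drawn "manually" using gnuplot's "with boxes" style.

Upside: full flexibility regarding bar sizes, caps, colors, etc.

Downside: the bars have to be positioned manually.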

# solution1.gnuplot
reset
set terminal postscript eps enhanced 14

set datafile separator ";"

set output 'stacked_boxes.eps'

set auto x
set yrange [0:300]
set xtics 1

set style fill solid border -1

num_of_categories=2
set boxwidth 0.3/num_of_categories
dx=0.5/num_of_categories
offset=-0.1

# Each full-height box ($3+$4) is drawn first and then overdrawn by the box for
# column 3 (A), so the part of the first box that stays visible is column 4 (B).
plot 'data.Category1.csv' using ($1+offset):($3+$4) title "Category 1 B" linecolor rgb "#cc0000" with boxes, \
     ''                   using ($1+offset):3 title "Category 1 A" linecolor rgb "#ff0000" with boxes, \
     'data.Category2.csv' using ($1+offset+dx):($3+$4) title "Category 2 B" linecolor rgb "#00cc00" with boxes, \
     ''                   using ($1+offset+dx):3 title "Category 2 A" linecolor rgb "#00ff00" with boxes
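
The overlay trick in this script (draw the full-height box first, then draw shorter boxes over it) extends to any number of categories and stacked columns, so the plot clauses can also be generated rather than typed by hand. The following Ruby sketch shows one way to do that. The helper plot_clauses, its parameters, and the generated titles are hypothetical names for illustration. The default spacing values are taken from the script above, and no colors are set, so gnuplot's default line colors apply.

```ruby
# Hypothetical helper: build gnuplot "with boxes" clauses for clustered,
# stacked bars.
#   files:   one data file per category (bars placed side by side)
#   columns: 1-based data columns to stack, listed bottom to top
def plot_clauses(files, columns, offset: -0.1, cluster_width: 0.5)
  dx = cluster_width / files.size   # horizontal step between neighbouring bars
  files.each_with_index.flat_map do |file, i|
    x = "($1#{format('%+g', offset + i * dx)})"
    # Tallest box first (sum of all columns). Each following box drops the
    # topmost remaining column and is drawn over the previous one, so the
    # visible part of the n-th box corresponds to columns[n - 1].
    columns.size.downto(1).map do |n|
      height = "(" + columns.first(n).map { |c| "$#{c}" }.join("+") + ")"
      title  = "#{File.basename(file)} col #{columns[n - 1]}"
      "'#{file}' using #{x}:#{height} title \"#{title}\" with boxes"
    end
  end
end

# Print a plot command for the two category files from the example above.
clauses = plot_clauses(["data.Category1.csv", "data.Category2.csv"], [3, 4])
puts "plot " + clauses.join(", \\\n     ")
```

The printed command can be pasted into the script in place of the hand-written plot command. Titles and colors would still need to be adjusted by hand.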

This is the result:

stacked_boxes.eps

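Solution 2: Native gnuplot histogram

Strategy: one data file per year, one column per stack. The histogram is produced using gnuplot's regular histogram mechanism.

Upside: easier to use, since the positioning does not have to be done manually.

Downside: since all categories are in one file, every category has the same color.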

# solution2.gnuplot
reset
set terminal postscript eps enhanced 14

set datafile separator ";"

set output 'histo.eps'
set yrange [0:300]

set style data histogram
set style histogram rowstack gap 1
set style fill solid border -1
set boxwidth 0.5 relative

plot newhistogram "2011", \
       'data.2011.csv' using 3:xticlabels(2) title "A" linecolor rgb "red", \
       ''              using 4:xticlabels(2) title "B" linecolor rgb "green", \
     newhistogram "2012", \
       'data.2012.csv' using 3:xticlabels(2) title "" linecolor rgb "red", \
       ''              using 4:xticlabels(2) title "" linecolor rgb "green", \
     newhistogram "2013", \
       'data.2013.csv' using 3:xticlabels(2) title "" linecolor rgb "red", \
       ''              using 4:xticlabels(2) title "" linecolor rgb "green"

This is the result:

histo.eps

Answer 2

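Thanks a lot, @fiedl! Based on your solution #1, I was able to come up with my own stacked/clustered histogram using more than two stacked subcategories.

Here is my code: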

set terminal pngcairo  transparent enhanced font "arial,10" fontscale 1.0 size 600, 400 
set output 'runtimes.png'

set xtics("1" 1, "2" 2, "4" 3, "8" 4)
set yrange [0:100]

set style fill solid border -1
set key invert
set grid

num_of_ksptypes=2
set boxwidth 0.5/num_of_ksptypes
dx=0.5/num_of_ksptypes
offset=-0.12

set xlabel "threads"
set ylabel "seconds"

plot 'data1.dat' using ($1+offset):($2+$3+$4+$5) title "SDO" linecolor rgb "#006400" with boxes, \
     ''          using ($1+offset):($3+$4+$5) title "BGM" linecolor rgb "#FFFF00" with boxes, \
     ''          using ($1+offset):($4+$5) title "TSQR" linecolor rgb "#FFA500" with boxes, \
     ''          using ($1+offset):5 title "SpMV" linecolor rgb "#FF0000" with boxes, \
     'data2.dat' using ($1+offset+dx):($2+$3) title "MGS" linecolor rgb "#8B008B" with boxes, \
     ''          using ($1+offset+dx):3 title "SpMV" linecolor rgb "#0000FF" with boxes

data1.dat:

nr  SDO  BGM  TSQR  SpMV
1   10   15   20    25
2   10   10   10    10
3   10   10   10    10
4   10   10   10    10

data2.dat:

nr  MGS  SpMV
1   23   13
2   23   13
3   23   13
4   23   13

Resulting plot:

Gnuplot histogram cluster (bar chart) with one line per category
