How to make a beanplot for only a subset of the data

Time: 2015-08-06 22:01:05

Tags: r plot subset data-visualization

I have a data frame that looks like this:

    stream n rates     means column      value truevalue
1   Brooks 3   3.0 0.9629152      1 0.42707006 0.9440620
2  Siouxon 3   3.0 0.5831929      1 0.90503736 0.5858527
3 Speelyai 3   3.0 0.6199235      1 0.08554021 0.5839844
4   Brooks 4   7.5 0.9722707      1 1.43338843 0.9440620
5  Siouxon 4   7.5 0.5865031      1 0.50574543 0.5858527
6 Speelyai 4   7.5 0.6118634      1 0.32252396 0.5839844
7   Brooks 5  10.0 0.9637475      1 0.88984211 0.9440620
8  Siouxon 5  10.0 0.5804420      1 0.47501800 0.5858527
9 Speelyai 5  10.0 0.5959238      1 0.15079491 0.5839844

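It goes on for 56,000 rows. I want to make 3 different beanplots, one for each stream. I would rather not subset this data frame into 3 new, separate data frames. Is there a way to specify that you want a beanplot for stream=="Brooks"?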

Here is what I have:

beanplot(error~rates, data= result, col=c("orange", "black", "white", "red"), border ="pink", what=c(0,1,1,1), maxstripline=.05)

This works, but it creates a single beanplot for all of the data. I tried this, which did not work:

beanplot(error~rates, data= result[stream=="Speelyai"], col=c("orange", "black", "white", "red"), border ="pink", what=c(0,1,1,1), maxstripline=.05)

2 Answers:

Answer 0: (score: 3)

Try this:

beanplot(error~rates, data= result[result$stream=="Speelyai", ], col=c("orange", "black", "white", "red"), border ="pink", what=c(0,1,1,1), maxstripline=.05)
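The attempt in the question presumably failed for two reasons: in a plain data frame, a bare `stream` inside `[ ]` is looked up in the calling environment rather than among the columns, and a single index without a comma selects columns, not rows. A minimal sketch of the row-indexing idiom, using a small made-up data frame (the values and the `error` column are assumptions for illustration, not the asker's data):

```r
# Toy stand-in for `result` (hypothetical values, not the asker's data)
result <- data.frame(
  stream = c("Brooks", "Siouxon", "Speelyai", "Brooks", "Siouxon", "Speelyai"),
  rates  = c(3, 3, 3, 7.5, 7.5, 7.5),
  error  = c(0.10, -0.20, 0.05, 0.30, -0.10, 0.02)
)

# Logical row index: TRUE marks rows to keep; the trailing comma means "all columns"
keep <- result$stream == "Speelyai"
speelyai <- result[keep, ]

# subset() evaluates the condition inside the data frame, so `stream` needs no prefix
speelyai2 <- subset(result, stream == "Speelyai")

print(speelyai)
```

Either form can be passed directly as the `data` argument, so no intermediate data frame has to be stored.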

Answer 1: (score: 1)

I think something like this:

beanplot(error~rates, data= result[result[,"stream"]=="Speelyai",], col=c("orange", "black", "white", "red"), border ="pink", what=c(0,1,1,1), maxstripline=.05)

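Or, if you want something more compact, try data.table. Subsetting is much more compact once it is set up (you can also skip setting the key first; it is still more compact, just slightly slower):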

# load package
library(data.table)

# convert to data.table, and set key for subsetting
result <- as.data.table(result)
setkey(result, stream)

# save your original plotting code (minus the data part) as an expression
original.plot <- expression(beanplot(error~rates, col=c("orange", "black", "white", "red"), border ="pink", what=c(0,1,1,1), maxstripline=.05))

# make the plot for this stream only
result["Speelyai", eval(original.plot)]
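To make the keyed versus unkeyed difference concrete, here is a small sketch on a made-up table (the values and the `error` column are assumptions for illustration). It only shows the two subsetting forms and leaves the plotting out:

```r
library(data.table)

# Toy stand-in for `result` (hypothetical values, not the asker's data)
result <- data.table(
  stream = c("Brooks", "Siouxon", "Speelyai", "Brooks", "Siouxon", "Speelyai"),
  rates  = c(3, 3, 3, 7.5, 7.5, 7.5),
  error  = c(0.10, -0.20, 0.05, 0.30, -0.10, 0.02)
)

# Without a key: columns can be referred to by name directly inside [ ]
unkeyed <- result[stream == "Speelyai"]

# With a key: a bare character value in i is matched against the key column
setkey(result, stream)
keyed <- result["Speelyai"]

print(keyed)
```

Both forms should return the same rows; the keyed lookup is the one the plotting call above relies on.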

Then, if you want to make the plot for all 3 streams, you can do something like:

par(mfrow=c(2,2)) # I'm doing 4 panels just so it's a square; 1 will be empty
result[c("Brooks","Siouxon","Speelyai"), eval(original.plot), by=c("stream")]

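Getting used to data.table can take a while, but its notation tends to be very convenient and it is very fast. It is especially handy for subsetting, or for performing the same task on multiple subsets.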
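For comparison, the same one-panel-per-stream layout can be sketched in base R with a plain loop, subsetting inline so that no per-stream data frame is kept. The data below are simulated stand-ins (the `error` values are random and purely hypothetical), and adding the panel label with `title()` is an illustrative choice, not part of the original code:

```r
library(beanplot)  # provides beanplot()

# Simulated stand-in for `result` (hypothetical values, not the asker's data)
set.seed(1)
streams <- c("Brooks", "Siouxon", "Speelyai")
result <- data.frame(
  stream = rep(streams, each = 150),
  rates  = rep(rep(c(3, 7.5, 10), each = 50), times = 3),
  error  = rnorm(450)
)

par(mfrow = c(2, 2))  # 2 x 2 grid; the fourth panel stays empty
for (s in streams) {
  # Subset inline, so no per-stream data frame is stored
  beanplot(error ~ rates, data = result[result$stream == s, ],
           col = c("orange", "black", "white", "red"), border = "pink",
           what = c(0, 1, 1, 1), maxstripline = 0.05)
  title(main = s)  # label each panel with its stream
}
```

This trades the compactness of the data.table version for having no extra dependency beyond beanplot itself.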