I have a wide table that looks like this:
ID Test_11 LVL11 Score_X_11 Score_Y_11 Test_12 LV12 Score_X_12 Score_Y_12
1 A I 100 NA NA NA 100 100
2 A II 90 100 B II 90 85
3 NA NA NA NA B II 90 NA
4 A III 100 80 A III 75 75
5 B I NA 90 NA NA 60 50
6 B I 70 100 NA NA NA NA
7 B II 85 NA A I 60 60
The table used for ordering looks like this:
Test_11 A
Test_11 B
Test_12 A
Test_12 B
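What the second table tells us is that Test_11 has two versions, A and B (and the same is true for Test_12).
I am trying to create a series of boxplots showing the score distribution for each combination of test (Test_11, Test_12) and version (A, B). So for Test_11 == A, the boxplot would have three groups (I, II, III) and be drawn from the subset where Test_11 == A; then the same for Test_11 == B, Test_12 == A, and Test_12 == B. In this example, 4 plots should be created in total.
What I have in R is: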
z <- subset(df, df$Test_11=="A")
plot(z$LVL11, z$Score_X_11, varwidth = TRUE, notch = TRUE, xlab = 'LVL',
ylab = 'score')
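What I want, and cannot figure out how to do, is to write a for loop that does the subsetting for me, so that I can automate this for my actual dataset, which has dozens of these combinations.
Thanks for any help and guidance.
Answer 0 (score: 1)
Perhaps you should store all the logical vectors in a data.frame or matrix before the loop: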
selections <- matrix(nrow = nrow(df), ncol = 4)
selections[,1] <- df$Test_11 == "A"
selections[,2] <- df$Test_11 == "B"
selections[,3] <- df$Test_12 == "A"
selections[,4] <- df$Test_12 == "B"
# etc...
par(mfcol = c(2, 2)) # here you should customize at will...
for (i in 1:4) {
  z <- subset(df, selections[,i])
  plot(z$LVL11, z$Score_X_11, varwidth = TRUE,
       notch = TRUE, xlab = 'LVL',
       ylab = 'score')
}
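The selection matrix does not have to be typed out column by column; it can be derived from the ordering table. A minimal sketch, assuming the ordering table is stored in a data frame called `ord` with columns `test` and `version` (both names are hypothetical, as is the cut-down copy of `df` used here):

```r
# Cut-down copy of the example data (only the columns needed here).
df <- data.frame(
  ID      = 1:7,
  Test_11 = c("A", "A", NA, "A", "B", "B", "B"),
  Test_12 = c(NA, "B", "B", "A", NA, NA, "A"),
  stringsAsFactors = FALSE
)

# Hypothetical ordering table: one row per test/version combination.
ord <- data.frame(
  test    = c("Test_11", "Test_11", "Test_12", "Test_12"),
  version = c("A", "B", "A", "B"),
  stringsAsFactors = FALSE
)

# One logical column per row of `ord`.
# %in% returns FALSE (not NA) when the test value is missing.
selections <- sapply(seq_len(nrow(ord)), function(k) {
  df[[ord$test[k]]] %in% ord$version[k]
})
colnames(selections) <- paste(ord$test, ord$version, sep = "_")

colSums(selections)  # number of rows selected for each combination
```

With this in place, the loop can run over `seq_len(ncol(selections))` instead of a hard-coded `1:4`, which scales to dozens of combinations.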
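You can change the code so that, instead of z$Score_X_11, it uses z[,string]. The value of string should be built with paste (or another string manipulation function). For example: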
v <- c("X", "Y")
n <- c("11", "12")
for (i in 1:2) {
  for (j in 1:2) {
    # v is indexed by i and n by j, so all four column names are produced
    string <- paste("Score", v[i], n[j], sep = "_")
    print(string)
  }
}
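The same reasoning applies to the z$LVLXX columns, so you should be able to adapt it.
I am not very experienced with lattice graphics (as used in the other answer), but I know a bit of ggplot2, so I decided to take up the challenge and give it a try. It is not pretty, but at least it works: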
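Putting the two ideas together, the column names for the test, the score and the level can all be looked up inside the loop. A rough sketch under a few assumptions: only the X scores are plotted, `lvl_col` is a hypothetical lookup vector (needed because the level columns in the example are named LVL11 and LV12, so paste alone cannot build both), and `notch = TRUE` is left out because the example groups are too small for meaningful notches:

```r
# Example data from the question.
df <- read.table(header = TRUE, na.strings = "NA", stringsAsFactors = FALSE, text = "
ID Test_11 LVL11 Score_X_11 Score_Y_11 Test_12 LV12 Score_X_12 Score_Y_12
1 A I 100 NA NA NA 100 100
2 A II 90 100 B II 90 85
3 NA NA NA NA B II 90 NA
4 A III 100 80 A III 75 75
5 B I NA 90 NA NA 60 50
6 B I 70 100 NA NA NA NA
7 B II 85 NA A I 60 60
")

# Hypothetical lookup: which level column belongs to which test number.
lvl_col <- c("11" = "LVL11", "12" = "LV12")

plotted <- character(0)  # keeps track of which plots were drawn
par(mfcol = c(2, 2))
for (n in c("11", "12")) {
  test_col  <- paste("Test", n, sep = "_")
  score_col <- paste("Score", "X", n, sep = "_")
  for (version in c("A", "B")) {
    # Rows where the test column equals this version (NA counts as no match).
    z <- df[df[[test_col]] %in% version, ]
    boxplot(z[[score_col]] ~ factor(z[[lvl_col[[n]]]]),
            varwidth = TRUE, xlab = "LVL", ylab = score_col,
            main = paste(test_col, "==", version))
    plotted <- c(plotted, paste(test_col, version, sep = "_"))
  }
}
```

For the real dataset, the two vectors driving the loops could be taken from the ordering table instead of being written out by hand.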
# df <- read.table("data.txt", header = TRUE, na.strings = "NA")
require(reshape2)
require(ggplot2)
# Melt your data.frame, using the scores as the "values":
mdf <- melt(df[,-1], id = c("LVL11", "LV12", "Test_11", "Test_12"))
# loop through level types:
for (lvl in c("LVL11", "LV12")) {
  # looping through values of test11
  for (test11 in c("A", "B")) {
    # Note the use of subset before ggplot
    # (aes_string() is deprecated in recent ggplot2 versions;
    #  aes(x = .data[[lvl]], y = value) is the current equivalent)
    p <- ggplot(subset(mdf, Test_11 == test11), aes_string(x=lvl, y="value"))
    # I added the geom_jitter as in the example given there were only a few points
    g <- p + geom_boxplot(aes(fill = variable)) + geom_jitter(aes(shape = variable))
    print(g) # it is necessary to print g explicitly like this in order to use ggplot in a loop
    # Finally, save each plot with a relevant name:
    savePlot(paste0(lvl, "-t11", test11, ".png"))
    # (note that savePlot has some problems with RStudio iirc)
  }
  # Same as before, but with test_12
  for (test12 in c("A", "B")) {
    p <- ggplot(subset(mdf, Test_12 == test12), aes_string(x=lvl, y="value"))
    g <- p + geom_boxplot(aes(fill = variable)) + geom_jitter(aes(shape = variable))
    print(g)
    savePlot(paste0(lvl, "-t12", test12, ".png"))
  }
}
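If anyone knows how to use lattice graphics or facet_grid in this case, so that I could put all the graphics in a single picture, I would love to hear it.
Cheers.
Answer 1 (score: 1)
The classic plyr solution (HT to @hadleywickham)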
require(plyr); require(lattice); require(gridExtra)
bplots <- dlply(dat, .(Test_11, Test_12), function(df){
  bwplot(Score_X_11 ~ LVL11, data = df)
})
do.call('grid.arrange', bplots)
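Note that this splits the data by each observed pair of (Test_11, Test_12) values, not by each test separately. To see which groups that produces, the split step can be sketched in base R. This is only an approximation of the plyr call, assuming a cut-down `dat` with the question's columns; `split()` drops rows where either test version is missing:

```r
# Cut-down copy of the example data (columns used by the plyr call).
dat <- data.frame(
  Test_11    = c("A", "A", NA, "A", "B", "B", "B"),
  LVL11      = c("I", "II", NA, "III", "I", "I", "II"),
  Score_X_11 = c(100, 90, NA, 100, NA, 70, 85),
  Test_12    = c(NA, "B", "B", "A", NA, NA, "A"),
  stringsAsFactors = FALSE
)

# One data frame per observed (Test_11, Test_12) pair; names look like "A.B".
groups <- split(dat, list(dat$Test_11, dat$Test_12), drop = TRUE)

# One base-graphics boxplot per group, side by side.
par(mfrow = c(1, length(groups)))
for (nm in names(groups)) {
  boxplot(Score_X_11 ~ LVL11, data = groups[[nm]], main = nm,
          xlab = "LVL", ylab = "score")
}
```

If the goal is the four per-test plots described in the question, splitting on one test column at a time would be closer to what was asked.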