I'm trying to draw a grouped boxplot in R. The data looks like this:
mydf <- structure(list(Category = c("RPP", "RR"), P10 = c(3.352174769,
3.539193849), P2 = c(0, 3.090577955), P10 = c(3.273878984, 3.160004973
), P2 = c(0, 3.159418605), P10 = c(3.182712494, 3.316038858),
P2 = c(0L, 0L), P10 = c(2.770653831, 3.293476876), P2 = c(2.635533787,
3.245297416), P10 = c(0, 3.924497418), P2 = c(0L, 0L)), .Names = c("Category",
"P10", "P2", "P10", "P2", "P10", "P2", "P10", "P2", "P10", "P2"
), row.names = 1:2, class = "data.frame")
mydf
## Category P10 P2 P10 P2 P10 P2 P10 P2 P10 P2
## 1 RPP 3.352175 0.000000 3.273879 0.000000 3.182712 0 2.770654 2.635534 0.000000 0
## 2 RR 3.539194 3.090578 3.160005 3.159419 3.316039 0 3.293477 3.245297 3.924497 0
I want to reshape it into a long data frame (Category, variable, value) that ggplot2 can use.
However, after calling melt():
melt(mydf, "Category")
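I found that only the first P10 and P2 columns are kept; the later repeated columns are dropped because they share the same column names. Is there another way to do this?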
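To see why the repeated columns vanish, it helps to look at how R resolves duplicated names. The sketch below uses a trimmed copy of the sample data (only the first two P10/P2 pairs) and shows that setdiff() keeps just one copy of each name, while match() and [[ always land on the first column with a given name. That reshape2's melt() chooses its measure columns this way (every name except the id, then looked up by name) is an assumption about its internals, but it is consistent with only the first P10 and P2 surviving.

```r
# Trimmed copy of the question's data: only the first two P10/P2 pairs.
# check.names = FALSE keeps the duplicated column names as they are.
mydf <- data.frame(
  Category = c("RPP", "RR"),
  P10 = c(3.352174769, 3.539193849), P2 = c(0, 3.090577955),
  P10 = c(3.273878984, 3.160004973), P2 = c(0, 3.159418605),
  check.names = FALSE
)

names(mydf)                  # "Category" "P10" "P2" "P10" "P2"
anyDuplicated(names(mydf))   # 4: position of the first repeated name

# "Every column except Category", computed by name, keeps one copy of each name...
measure <- setdiff(names(mydf), "Category")
measure                      # "P10" "P2"

# ...and looking those names up always finds the first matching column.
match(measure, names(mydf))  # 2 3
mydf[["P10"]]                # values of the first P10 column only
```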
Answer 0 (score: 2)
If you use melt from "data.table", you won't run into this problem:
library(data.table)
melt(as.data.table(mydf), "Category")
# Category variable value
# 1: RPP P10 3.352175
# 2: RR P10 3.539194
# 3: RPP P2 0.000000
# 4: RR P2 3.090578
# 5: RPP P10 3.273879
# 6: RR P10 3.160005
# 7: RPP P2 0.000000
# 8: RR P2 3.159419
# 9: RPP P10 3.182712
# 10: RR P10 3.316039
# 11: RPP P2 0.000000
# 12: RR P2 0.000000
# 13: RPP P10 2.770654
# 14: RR P10 3.293477
# 15: RPP P2 2.635534
# 16: RR P2 3.245297
# 17: RPP P10 0.000000
# 18: RR P10 3.924497
# 19: RPP P2 0.000000
# 20: RR P2 0.000000
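Since the end goal is a grouped boxplot, the long table from data.table's melt() can go straight into ggplot2. A minimal sketch, assuming ggplot2 is installed and again using a trimmed copy of the data; mapping variable to the x axis and Category to fill is one reasonable reading of "grouped", not something the question specifies. The factor rebuild is a defensive step in case the variable column carries repeated level names inherited from the duplicated column names.

```r
library(data.table)
library(ggplot2)

# Trimmed copy of the question's data (first two P10/P2 pairs), duplicated names kept.
mydf <- data.frame(
  Category = c("RPP", "RR"),
  P10 = c(3.352174769, 3.539193849), P2 = c(0, 3.090577955),
  P10 = c(3.273878984, 3.160004973), P2 = c(0, 3.159418605),
  check.names = FALSE
)

# data.table's melt() keeps every measure column, even with repeated names.
long <- melt(as.data.table(mydf), id.vars = "Category")

# Rebuild the factor from its labels so its levels are guaranteed unique.
long$variable <- factor(as.character(long$variable), levels = c("P10", "P2"))

# One box per P10/P2 group, split by Category within each group.
p <- ggplot(long, aes(x = variable, y = value, fill = Category)) +
  geom_boxplot()
print(p)
```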
A base R alternative is to use stack, like this:
cbind(Category = mydf[[1]], stack(mydf[-1]))
## Category values ind
## 1 RPP 3.352175 P10
## 2 RR 3.539194 P10
## 3 RPP 0.000000 P2
## 4 RR 3.090578 P2
## 5 RPP 3.273879 P10.1
## 6 RR 3.160005 P10.1
## 7 RPP 0.000000 P2.1
## 8 RR 3.159419 P2.1
## 9 RPP 3.182712 P10.2
## 10 RR 3.316039 P10.2
## 11 RPP 0.000000 P2.2
## 12 RR 0.000000 P2.2
## 13 RPP 2.770654 P10.3
## 14 RR 3.293477 P10.3
## 15 RPP 2.635534 P2.3
## 16 RR 3.245297 P2.3
## 17 RPP 0.000000 P10.4
## 18 RR 3.924497 P10.4
## 19 RPP 0.000000 P2.4
## 20 RR 0.000000 P2.4
Depending on how you plan to use the data, you may also want to clean up the "ind" column.
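One way to do that clean-up is to strip the ".1", ".2", ... suffixes that appear in ind once the repeated names have been de-duplicated during subsetting. A minimal sketch on a trimmed copy of the data; the column name group is just an illustrative choice, and the regular expression assumes none of the original column names legitimately ends in a dot followed by digits.

```r
# Trimmed copy of the question's data (first two P10/P2 pairs), duplicated names kept.
mydf <- data.frame(
  Category = c("RPP", "RR"),
  P10 = c(3.352174769, 3.539193849), P2 = c(0, 3.090577955),
  P10 = c(3.273878984, 3.160004973), P2 = c(0, 3.159418605),
  check.names = FALSE
)

long <- cbind(Category = mydf[[1]], stack(mydf[-1]))
unique(as.character(long$ind))  # "P10" "P2" "P10.1" "P2.1"

# Drop a trailing ".<digits>" so P10.1 -> P10, P2.1 -> P2, and so on.
long$group <- sub("\\.[0-9]+$", "", as.character(long$ind))
long$group <- factor(long$group, levels = c("P10", "P2"))
long
```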
Sample data:
mydf <- structure(list(Category = c("RPP", "RR"), P10 = c(3.352174769,
3.539193849), P2 = c(0, 3.090577955), P10 = c(3.273878984, 3.160004973
), P2 = c(0, 3.159418605), P10 = c(3.182712494, 3.316038858),
P2 = c(0L, 0L), P10 = c(2.770653831, 3.293476876), P2 = c(2.635533787,
3.245297416), P10 = c(0, 3.924497418), P2 = c(0L, 0L)), .Names = c("Category",
"P10", "P2", "P10", "P2", "P10", "P2", "P10", "P2", "P10", "P2"
), row.names = 1:2, class = "data.frame")
Answer 1 (score: 1)
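This option is useful if you also do some transformations and need to get back to the initial representation at some point, because it still keeps the groups you need: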
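library(magrittr)  # provides the %>% pipe
mydf %>%
  setNames(nm = make.unique(names(.))) %>%  # P10, P2, P10.1, P2.1, ...
  reshape2::melt("Category") %>%
  # group = variable without its ".1", ".2", ... suffix
  transform(group = sub(x = variable, pattern = "\\.\\d+$", replacement = ""))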
But @A5C1D2H2I1M1N2O1R2T1's suggestion is of course shorter, and I'll have to remember it; I didn't know data.table could handle this.
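Because the de-duplicated names survive in variable, the long table can also be cast back to the original wide layout, which is the round trip this answer has in mind. A minimal sketch, assuming reshape2 and magrittr are installed; using reshape2::dcast() for the reverse step and stripping the suffixes from the column names afterwards is an illustrative addition, not part of the original answer.

```r
library(magrittr)

# Trimmed copy of the question's data (first two P10/P2 pairs), duplicated names kept.
mydf <- data.frame(
  Category = c("RPP", "RR"),
  P10 = c(3.352174769, 3.539193849), P2 = c(0, 3.090577955),
  P10 = c(3.273878984, 3.160004973), P2 = c(0, 3.159418605),
  check.names = FALSE
)

long <- mydf %>%
  setNames(nm = make.unique(names(.))) %>%
  reshape2::melt("Category") %>%
  transform(group = sub(x = variable, pattern = "\\.\\d+$", replacement = ""))

# Back to wide: one column per level of `variable` (P10, P2, P10.1, P2.1).
wide <- reshape2::dcast(long, Category ~ variable, value.var = "value")

# Drop the suffixes again to recover the original (duplicated) names.
names(wide) <- sub("\\.\\d+$", "", names(wide))
wide
```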