How can I keep duplicated column names when melting?

Asked: 2017-12-20 06:26:59

Tags: r reshape reshape2

I am trying to draw a grouped boxplot in R. The data looks like this:

mydf <- structure(list(Category = c("RPP", "RR"), P10 = c(3.352174769, 
    3.539193849), P2 = c(0, 3.090577955), P10 = c(3.273878984, 3.160004973
    ), P2 = c(0, 3.159418605), P10 = c(3.182712494, 3.316038858), 
        P2 = c(0L, 0L), P10 = c(2.770653831, 3.293476876), P2 = c(2.635533787, 
        3.245297416), P10 = c(0, 3.924497418), P2 = c(0L, 0L)), .Names = c("Category", 
    "P10", "P2", "P10", "P2", "P10", "P2", "P10", "P2", "P10", "P2"
    ), row.names = 1:2, class = "data.frame")


mydf
##   Category      P10       P2      P10       P2      P10 P2      P10       P2      P10 P2
## 1      RPP 3.352175 0.000000 3.273879 0.000000 3.182712  0 2.770654 2.635534 0.000000  0
## 2       RR 3.539194 3.090578 3.160005 3.159419 3.316039  0 3.293477 3.245297 3.924497  0

To plot it with ggplot2, I want to reshape it into a data frame like this:

Category  variable  value
RPP       P10       3.35
RPP       P2        0
RR        P10       3.54
...

However, after using the melt() function:

melt(mydf, "Category")

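I found that only the first two columns are kept in the result, which means the later duplicated columns are dropped because they share the same column names. Is there another way to do this?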

2 Answers:

Answer 0: (score: 2)

If you use melt from "data.table", you won't run into this problem:

library(data.table)
melt(as.data.table(mydf), "Category")
#     Category variable    value
#  1:      RPP      P10 3.352175
#  2:       RR      P10 3.539194
#  3:      RPP       P2 0.000000
#  4:       RR       P2 3.090578
#  5:      RPP      P10 3.273879
#  6:       RR      P10 3.160005
#  7:      RPP       P2 0.000000
#  8:       RR       P2 3.159419
#  9:      RPP      P10 3.182712
# 10:       RR      P10 3.316039
# 11:      RPP       P2 0.000000
# 12:       RR       P2 0.000000
# 13:      RPP      P10 2.770654
# 14:       RR      P10 3.293477
# 15:      RPP       P2 2.635534
# 16:       RR       P2 3.245297
# 17:      RPP      P10 0.000000
# 18:       RR      P10 3.924497
# 19:      RPP       P2 0.000000
# 20:       RR       P2 0.000000

A base R alternative is to use stack, as follows:

cbind(Category = mydf[[1]], stack(mydf[-1]))
##    Category   values   ind
## 1       RPP 3.352175   P10
## 2        RR 3.539194   P10
## 3       RPP 0.000000    P2
## 4        RR 3.090578    P2
## 5       RPP 3.273879 P10.1
## 6        RR 3.160005 P10.1
## 7       RPP 0.000000  P2.1
## 8        RR 3.159419  P2.1
## 9       RPP 3.182712 P10.2
## 10       RR 3.316039 P10.2
## 11      RPP 0.000000  P2.2
## 12       RR 0.000000  P2.2
## 13      RPP 2.770654 P10.3
## 14       RR 3.293477 P10.3
## 15      RPP 2.635534  P2.3
## 16       RR 3.245297  P2.3
## 17      RPP 0.000000 P10.4
## 18       RR 3.924497 P10.4
## 19      RPP 0.000000  P2.4
## 20       RR 0.000000  P2.4

Depending on how you plan to use the data, you may also need to clean up the "ind" column.
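One possible way to do that cleanup is sketched below. It is only an illustration: `toy` and `long` are hypothetical names, and `toy` is a small stand-in for mydf with rounded values. The sketch assumes the only unwanted part of "ind" is the ".1", ".2", ... suffix that gets appended when the duplicated names are made unique.

```r
# Hypothetical stand-in for mydf: two rows, duplicated column names
toy <- data.frame(Category = c("RPP", "RR"),
                  c(3.35, 3.54), c(0, 3.09), c(3.27, 3.16), c(0, 3.16),
                  stringsAsFactors = FALSE)
names(toy) <- c("Category", "P10", "P2", "P10", "P2")

# Same stack() approach as above
long <- cbind(Category = toy[[1]], stack(toy[-1]))

# Drop a trailing ".<digits>" suffix so P10.1 becomes P10, P2.1 becomes P2
long$ind <- factor(sub("\\.\\d+$", "", as.character(long$ind)))
```

After this step "ind" has just the two levels P10 and P2, which is usually what a grouped boxplot needs.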

Sample data:

mydf <- structure(list(Category = c("RPP", "RR"), P10 = c(3.352174769, 
    3.539193849), P2 = c(0, 3.090577955), P10 = c(3.273878984, 3.160004973
    ), P2 = c(0, 3.159418605), P10 = c(3.182712494, 3.316038858), 
        P2 = c(0L, 0L), P10 = c(2.770653831, 3.293476876), P2 = c(2.635533787, 
        3.245297416), P10 = c(0, 3.924497418), P2 = c(0L, 0L)), .Names = c("Category", 
    "P10", "P2", "P10", "P2", "P10", "P2", "P10", "P2", "P10", "P2"
    ), row.names = 1:2, class = "data.frame")

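Answer 1: (score: 1)

If you also apply some transformations and need to get back to the initial representation at some point, this option is handy, because it keeps the unique column names while still giving you the groups you need: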

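library(magrittr)  # provides %>%

mydf %>% 
    setNames(nm = make.unique(names(.))) %>%   # P10, P2, P10.1, P2.1, ...
    reshape2::melt("Category") %>% 
    # strip the ".1", ".2", ... suffix to recover the original group name
    transform(group = sub(x = variable, pattern = "\\.\\d+$", replacement = ""))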
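
To make the renaming step in that pipeline easier to follow, here is a small base R sketch of what make.unique() and the sub() pattern do on their own. The vector names `nms`, `uniq`, and `grp` are made up for illustration and are not part of the answer above.

```r
# Column names with duplicates, as in the first five columns of mydf
nms <- c("Category", "P10", "P2", "P10", "P2")

# make.unique() appends .1, .2, ... to repeated names
uniq <- make.unique(nms)
# uniq is: "Category" "P10" "P2" "P10.1" "P2.1"

# Removing a trailing ".<digits>" suffix recovers the original names
grp <- sub("\\.\\d+$", "", uniq)
```

Because the unique names are kept in "variable" and the original names in "group", the long data can be plotted by group and still be traced back to the individual source columns.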

But @A5C1D2H2I1M1N2O1R2T1's suggestion is certainly shorter, and I'll have to keep it in mind... I didn't know data.table could handle this.