reshape2: dcast when a cell has multiple values, while keeping every value

Posted: 2019-05-10 09:57:06

Tags: r dataframe reshape2

I have a data frame like this:

    HLA_Status    variable      value
1     PP            CCL24       9.645
2     PP            CCL24       56.32
3     PP            CCL24       7.268
4     PC            CCL24       5.698
5     PC            CCL24       89.457
6     PC            CCL24       78.23
7     PP            SPP1        23.12
8     PP            SPP1        36.32
9     PP            SPP1        17.268
10    PC            SPP1        2.698
11    PC            SPP1        9.457
12    PC            SPP1        8.23

I want to reshape my data frame with reshape2::dcast() and get:

   HLA_Status        CCL24        SPP1
1      PP            9.645       23.12
2      PP            56.32       36.32
3      PP            7.268       17.268
13     PC            5.698       2.698
14     PC            89.457      9.457
15     PC            78.230      8.23

But I haven't managed to do this.

I tried:

dcast(mydt, HLA_Status ~ variable, value.var = "value")

But it didn't work.

I saw in the reshape2 documentation that if there is more than one value per cell, you need to tell dcast how to aggregate the data.

I think my problem is that I don't know what to pass to fun.aggregate.
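The reason dcast asks for an aggregation function is that every HLA_Status/variable pair occurs three times, so a plain HLA_Status ~ variable cast has three candidate values for each cell; reshape2 then reports "Aggregation function missing: defaulting to length" and fills the cells with counts. A minimal base-R sketch that re-creates the question's data (the name mydt follows the question) and counts the rows per cell:

```r
# Re-create the question's data frame (character columns, not factors)
mydt <- data.frame(
  HLA_Status = rep(rep(c("PP", "PC"), each = 3), times = 2),
  variable   = rep(c("CCL24", "SPP1"), each = 6),
  value      = c(9.645, 56.32, 7.268, 5.698, 89.457, 78.23,
                 23.12, 36.32, 17.268, 2.698, 9.457, 8.23),
  stringsAsFactors = FALSE
)

# Number of rows falling into each HLA_Status x variable cell
cell_counts <- table(mydt$HLA_Status, mydt$variable)
print(cell_counts)
```

Since no aggregation function can keep three values in one cell, the answers below all add a within-cell row index (1, 2, 3) to the left-hand side of the formula, so each cell holds exactly one value and no aggregation is needed.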

How can I get the desired data frame using reshape2 or another package?
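For comparison, base R can do the same without extra packages. A minimal sketch, assuming the mydt data frame from the question: ave() numbers the rows within each HLA_Status/variable group (the helper column name row is an arbitrary choice), and stats::reshape() then spreads value across the variable levels. reshape() names the new columns value.CCL24 and value.SPP1, so the prefix is stripped at the end.

```r
mydt <- data.frame(
  HLA_Status = rep(rep(c("PP", "PC"), each = 3), times = 2),
  variable   = rep(c("CCL24", "SPP1"), each = 6),
  value      = c(9.645, 56.32, 7.268, 5.698, 89.457, 78.23,
                 23.12, 36.32, 17.268, 2.698, 9.457, 8.23),
  stringsAsFactors = FALSE
)

# Within-cell row index: 1, 2, 3 for each HLA_Status/variable combination
mydt$row <- ave(seq_len(nrow(mydt)), mydt$HLA_Status, mydt$variable,
                FUN = seq_along)

# Long -> wide: one row per (HLA_Status, row), one column per variable
wide <- reshape(mydt, idvar = c("HLA_Status", "row"),
                timevar = "variable", direction = "wide")

# Drop the helper index and turn "value.CCL24" into "CCL24", etc.
wide$row <- NULL
names(wide) <- sub("^value\\.", "", names(wide))
print(wide)
```

The rows come out in order of first appearance (PP before PC), which matches the layout requested above.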

4 answers:

Answer 0 (score: 2)

We can use spread from tidyr, after adding a row number within each group:

library(dplyr)
library(tidyr)

df %>%
  group_by(HLA_Status, variable) %>%
  mutate(row = row_number()) %>%
  spread(variable, value) %>%
  ungroup() %>%
  select(-row)
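In current tidyr (1.0 and later), spread() is superseded by pivot_wider(). A sketch of the same row-number approach with pivot_wider(), assuming dplyr and tidyr >= 1.0 are installed; the data frame is rebuilt with character columns rather than the factors used in this answer's data.

```r
library(dplyr)
library(tidyr)

df <- data.frame(
  HLA_Status = rep(rep(c("PP", "PC"), each = 3), times = 2),
  variable   = rep(c("CCL24", "SPP1"), each = 6),
  value      = c(9.645, 56.32, 7.268, 5.698, 89.457, 78.23,
                 23.12, 36.32, 17.268, 2.698, 9.457, 8.23),
  stringsAsFactors = FALSE
)

wide <- df %>%
  group_by(HLA_Status, variable) %>%
  mutate(row = row_number()) %>%   # 1, 2, 3 within each cell
  ungroup() %>%
  pivot_wider(names_from = variable, values_from = value) %>%
  select(-row)                     # drop the helper index
print(wide)
```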

# A tibble: 6 x 3
#  HLA_Status CCL24  SPP1
#  <fct>     <dbl> <dbl>
#1   PC       5.70  2.70
#2   PC       89.5  9.46
#3   PC       78.2  8.23
#4   PP       9.64  23.1 
#5   PP       56.3  36.3 
#6   PP       7.27  17.3 

Data

df <- structure(list(HLA_Status = structure(c(2L, 2L, 2L, 1L, 1L, 1L, 
2L, 2L, 2L, 1L, 1L, 1L), .Label = c("PC", "PP"), class = "factor"), 
variable = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 
2L, 2L, 2L), .Label = c("CCL24", "SPP1"), class = "factor"), 
value = c(9.645, 56.32, 7.268, 5.698, 89.457, 78.23, 23.12, 
36.32, 17.268, 2.698, 9.457, 8.23)), class = "data.frame", row.names = 
c(NA, -12L))

Answer 1 (score: 2)

This can be done with dcast from data.table, although you need a row identifier; rowid() supplies it.

library(data.table)
dcast(dt, HLA_Status + rowid(HLA_Status, variable) ~ variable)
#   HLA_Status HLA_Status_1  CCL24   SPP1
#1:         PC            1  5.698  2.698
#2:         PC            2 89.457  9.457
#3:         PC            3 78.230  8.230
#4:         PP            1  9.645 23.120
#5:         PP            2 56.320 36.320
#6:         PP            3  7.268 17.268

Data

dt <- fread("    HLA_Status    variable      value
     PP            CCL24       9.645
     PP            CCL24       56.32
     PP            CCL24       7.268
     PC            CCL24       5.698
     PC            CCL24       89.457
     PC            CCL24       78.23
     PP            SPP1        23.12
     PP            SPP1        36.32
     PP            SPP1        17.268
     PC            SPP1        2.698
     PC            SPP1        9.457
     PC            SPP1        8.23")

Answer 2 (score: 1)

If you really need reshape2::dcast, you can build the identifier with ave (the row-identifier idea comes from @markus' answer):

reshape2::dcast(d, HLA_Status + ave(rep(1, nrow(d)), d[1:2], FUN=seq) ~ variable)
#   HLA_Status ave(rep(1, nrow(d)), d[1:2], FUN = seq)  CCL24   SPP1
# 1         PC                                       1  5.698  2.698
# 2         PC                                       2 89.457  9.457
# 3         PC                                       3 78.230  8.230
# 4         PP                                       1  9.645 23.120
# 5         PP                                       2 56.320 36.320
# 6         PP                                       3  7.268 17.268
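To see what the ave() term contributes, it can be evaluated on its own. A short sketch, assuming the d data frame from this answer: it returns 1, 2, 3 within each HLA_Status/variable group, which is the row identifier that makes every cell unique. Applied to a vector of ones, seq behaves like seq_along (even a group of size one gives seq(1) = 1), but seq_along states the intent more directly. Storing the identifier in a named column would also give a cleaner header than the deparsed ave(...) expression shown above.

```r
d <- data.frame(
  HLA_Status = rep(rep(c("PP", "PC"), each = 3), times = 2),
  variable   = rep(c("CCL24", "SPP1"), each = 6),
  value      = c(9.645, 56.32, 7.268, 5.698, 89.457, 78.23,
                 23.12, 36.32, 17.268, 2.698, 9.457, 8.23),
  stringsAsFactors = FALSE
)

# The identifier term from the dcast formula, evaluated on its own
id_seq <- ave(rep(1, nrow(d)), d[1:2], FUN = seq)

# Equivalent, with seq_along making "position within group" explicit
id_along <- ave(seq_len(nrow(d)), d$HLA_Status, d$variable, FUN = seq_along)

print(id_seq)
```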

Data

d <- structure(list(HLA_Status = c("PP", "PP", "PP", "PC", "PC", "PC", 
"PP", "PP", "PP", "PC", "PC", "PC"), variable = c("CCL24", "CCL24", 
"CCL24", "CCL24", "CCL24", "CCL24", "SPP1", "SPP1", "SPP1", "SPP1", 
"SPP1", "SPP1"), value = c(9.645, 56.32, 7.268, 5.698, 89.457, 
78.23, 23.12, 36.32, 17.268, 2.698, 9.457, 8.23)), row.names = c(NA, 
-12L), class = "data.frame")

Answer 3 (score: 0)

I strongly recommend switching to tidyr instead of using reshape2. But if you really want to use dcast, this is how:

library(dplyr)
library(reshape2)
df <- structure(list(HLA_Status = c("PP", "PP", "PP", "PC", "PC", "PC", 
"PP", "PP", "PP", "PC", "PC", "PC"), variable = c("CCL24", "CCL24", 
"CCL24", "CCL24", "CCL24", "CCL24", "SPP1", "SPP1", "SPP1", "SPP1", 
"SPP1", "SPP1"), value = c(9.645, 56.32, 7.268, 5.698, 89.457, 
78.23, 23.12, 36.32, 17.268, 2.698, 9.457, 8.23)), row.names = c(NA, 
-12L), class = "data.frame")


df %>% 
  group_by(variable, HLA_Status) %>%
  mutate(id = row_number()) %>% 
  dcast(HLA_Status+id ~ variable, value.var = "value") %>%
  select(-id)

  HLA_Status  CCL24   SPP1
1         PC  5.698  2.698
2         PC 89.457  9.457
3         PC 78.230  8.230
4         PP  9.645 23.120
5         PP 56.320 36.320
6         PP  7.268 17.268