I have a data frame like this:
HLA_Status variable value
1 PP CCL24 9.645
2 PP CCL24 56.32
3 PP CCL24 7.268
4 PC CCL24 5.698
5 PC CCL24 89.457
6 PC CCL24 78.23
7 PP SPP1 23.12
8 PP SPP1 36.32
9 PP SPP1 17.268
10 PC SPP1 2.698
11 PC SPP1 9.457
12 PC SPP1 8.23
I want to reshape my data frame with reshape2::dcast() and get:
HLA_Status CCL24 SPP1
1 PP 9.645 23.12
2 PP 56.32 36.32
3 PP 7.268 17.268
13 PC 5.698 2.698
14 PC 89.457 9.457
15 PC 78.230 8.23
But I could not get there.
I tried:
dcast(mydt, HLA_Status ~ variable, value.var = "value")
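But that did not work.
The reshape2 documentation says that if there is more than one value per cell, you need to tell dcast how to aggregate the data.
I think my problem is that I do not know what to pass to fun.aggregate.
How can I get the desired data frame using reshape2 or another package?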
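To make the behaviour described in the documentation concrete, here is a minimal sketch that rebuilds the example data. The name `mydt` comes from the call above; how it is constructed here is an assumption. Because every HLA_Status/variable cell holds three values and no fun.aggregate is given, reshape2 falls back to length() and returns counts instead of the values.

```r
library(reshape2)

# Rebuild the example data; the exact construction of `mydt` is an assumption.
mydt <- data.frame(
  HLA_Status = rep(rep(c("PP", "PC"), each = 3), times = 2),
  variable   = rep(c("CCL24", "SPP1"), each = 6),
  value      = c(9.645, 56.32, 7.268, 5.698, 89.457, 78.23,
                 23.12, 36.32, 17.268, 2.698, 9.457, 8.23)
)

# Each HLA_Status/variable cell holds three values, so dcast() must aggregate.
# Without fun.aggregate it emits a message and defaults to length().
counts <- dcast(mydt, HLA_Status ~ variable, value.var = "value")
print(counts)  # every cell is 3 (a count), not one of the original values
```

This suggests that no aggregation function is needed at all: what is missing is an extra identifier on the left-hand side of the formula so that each cell ends up with exactly one value.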
Answer 0 (score: 2)
We can use spread from tidyr. A per-group row number keeps the three values in each group on separate rows:
library(dplyr)
library(tidyr)
df %>%
group_by(HLA_Status, variable) %>%
mutate(row = row_number()) %>%
spread(variable, value) %>%
ungroup() %>%
select(-row)
# A tibble: 6 x 3
# HLA_Status CCL24 SPP1
# <fct> <dbl> <dbl>
#1 PC 5.70 2.70
#2 PC 89.5 9.46
#3 PC 78.2 8.23
#4 PP 9.64 23.1
#5 PP 56.3 36.3
#6 PP 7.27 17.3
Data
df <- structure(list(HLA_Status = structure(c(2L, 2L, 2L, 1L, 1L, 1L,
2L, 2L, 2L, 1L, 1L, 1L), .Label = c("PC", "PP"), class = "factor"),
variable = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L,
2L, 2L, 2L), .Label = c("CCL24", "SPP1"), class = "factor"),
value = c(9.645, 56.32, 7.268, 5.698, 89.457, 78.23, 23.12,
36.32, 17.268, 2.698, 9.457, 8.23)), class = "data.frame", row.names =
c(NA, -12L))
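In tidyr 1.0.0 and later, spread() is marked as superseded by pivot_wider(). The following is a sketch of the same approach with pivot_wider(), assuming a recent tidyr; the data frame is rebuilt in compact form so the block runs on its own, and the helper column name `row` is the one used above.

```r
library(dplyr)
library(tidyr)

# Compact rebuild of the example data (same values as the structure() call above).
df <- data.frame(
  HLA_Status = rep(rep(c("PP", "PC"), each = 3), times = 2),
  variable   = rep(c("CCL24", "SPP1"), each = 6),
  value      = c(9.645, 56.32, 7.268, 5.698, 89.457, 78.23,
                 23.12, 36.32, 17.268, 2.698, 9.457, 8.23)
)

wide <- df %>%
  group_by(HLA_Status, variable) %>%
  mutate(row = row_number()) %>%   # position within each HLA_Status/variable group
  ungroup() %>%
  pivot_wider(names_from = variable, values_from = value) %>%
  select(-row)                     # drop the helper identifier

print(wide)
```

The row order may differ from the spread() output above, since the two functions do not order the identifier columns in the same way.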
Answer 1 (score: 2)
This can be done with dcast (from data.table), although you need a row identifier.
library(data.table)
dcast(dt, HLA_Status + rowid(HLA_Status, variable) ~ variable)
# HLA_Status HLA_Status_1 CCL24 SPP1
#1: PC 1 5.698 2.698
#2: PC 2 89.457 9.457
#3: PC 3 78.230 8.230
#4: PP 1 9.645 23.120
#5: PP 2 56.320 36.320
#6: PP 3 7.268 17.268
Data
dt <- fread(" HLA_Status variable value
PP CCL24 9.645
PP CCL24 56.32
PP CCL24 7.268
PC CCL24 5.698
PC CCL24 89.457
PC CCL24 78.23
PP SPP1 23.12
PP SPP1 36.32
PP SPP1 17.268
PC SPP1 2.698
PC SPP1 9.457
PC SPP1 8.23")
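To see what the row identifier looks like, here is a small sketch that calls data.table's rowid() on plain vectors. The standalone vectors are an assumption made for illustration; they hold the same values as the columns of `dt`.

```r
library(data.table)

# Same group labels as the columns of `dt`, as plain vectors.
HLA_Status <- rep(rep(c("PP", "PC"), each = 3), times = 2)
variable   <- rep(c("CCL24", "SPP1"), each = 6)

# rowid() numbers the rows within each HLA_Status/variable combination.
id <- rowid(HLA_Status, variable)
print(id)
# 1 2 3 1 2 3 1 2 3 1 2 3
```

Putting this counter on the left-hand side of the formula gives every HLA_Status/id/variable cell exactly one value, so dcast has nothing to aggregate.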
Answer 2 (score: 1)
If you really need reshape2::dcast, you can use ave to build the identifier (the idea is the same as in @markus's answer):
reshape2::dcast(d, HLA_Status + ave(rep(1, nrow(d)), d[1:2], FUN=seq) ~ variable)
# HLA_Status ave(rep(1, nrow(d)), d[1:2], FUN = seq) CCL24 SPP1
# 1 PC 1 5.698 2.698
# 2 PC 2 89.457 9.457
# 3 PC 3 78.230 8.230
# 4 PP 1 9.645 23.120
# 5 PP 2 56.320 36.320
# 6 PP 3 7.268 17.268
Data
d <- structure(list(HLA_Status = c("PP", "PP", "PP", "PC", "PC", "PC",
"PP", "PP", "PP", "PC", "PC", "PC"), variable = c("CCL24", "CCL24",
"CCL24", "CCL24", "CCL24", "CCL24", "SPP1", "SPP1", "SPP1", "SPP1",
"SPP1", "SPP1"), value = c(9.645, 56.32, 7.268, 5.698, 89.457,
78.23, 23.12, 36.32, 17.268, 2.698, 9.457, 8.23)), row.names = c(NA,
-12L), class = "data.frame")
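A possible variant of the same idea, sketched here with seq_along instead of seq, stores the identifier in a named column so the result has a cleaner header. The column name `id` is a hypothetical choice, and the data frame is rebuilt in compact form so the block runs on its own. seq_along is used because seq() applied to a length-one vector counts up to its value rather than along it; with rep(1, ...) as above this makes no difference.

```r
library(reshape2)

# Compact rebuild of `d` (same values as the structure() call above).
d <- data.frame(
  HLA_Status = rep(rep(c("PP", "PC"), each = 3), times = 2),
  variable   = rep(c("CCL24", "SPP1"), each = 6),
  value      = c(9.645, 56.32, 7.268, 5.698, 89.457, 78.23,
                 23.12, 36.32, 17.268, 2.698, 9.457, 8.23)
)

# Hypothetical helper column: position within each HLA_Status/variable group.
d$id <- ave(seq_along(d$value), d$HLA_Status, d$variable, FUN = seq_along)

# One value per HLA_Status/id/variable cell, so no aggregation is needed.
wide <- dcast(d, HLA_Status + id ~ variable, value.var = "value")
wide$id <- NULL  # drop the helper column once the reshape is done
print(wide)
```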
Answer 3 (score: 0)
I strongly recommend switching to tidyr instead of reshape2. But if you really want to use dcast, this is how to do it:
library(dplyr)
library(reshape2)
df <- structure(list(HLA_Status = c("PP", "PP", "PP", "PC", "PC", "PC",
"PP", "PP", "PP", "PC", "PC", "PC"), variable = c("CCL24", "CCL24",
"CCL24", "CCL24", "CCL24", "CCL24", "SPP1", "SPP1", "SPP1", "SPP1",
"SPP1", "SPP1"), value = c(9.645, 56.32, 7.268, 5.698, 89.457,
78.23, 23.12, 36.32, 17.268, 2.698, 9.457, 8.23)), row.names = c(NA,
-12L), class = "data.frame")
df %>%
group_by(variable, HLA_Status) %>%
mutate(id = row_number()) %>%
dcast(HLA_Status+id ~ variable, value.var = "value") %>%
select(-id)
HLA_Status CCL24 SPP1
1 PC 5.698 2.698
2 PC 89.457 9.457
3 PC 78.230 8.230
4 PP 9.645 23.120
5 PP 56.320 36.320
6 PP 7.268 17.268