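I implemented two functions, reshape_long and reshape_wide (see the complete working example below), to reshape data frames.
I created several small examples, and both functions seem to work correctly on them.
However, reshape_wide fails on my real data set (roughly 200,000 to 300,000 rows): all values of X, Y and Z are set to 1.
My real data has exactly the same structure as the small example below. After working on this for two days, I suspect
the problem is that the "primary key" (test_name, group_name and id) is unique only in wide form. After applying
reshape_long, the primary key is no longer unique. Can anyone tell me whether there is a step in
d1 -> reshape_wide -> d2 that could fail because of the non-uniqueness in d1?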
library(reshape2)
library(taRifx)  # provides sort() for data frames with a formula, used below
reshape_long <- function(data, ids) {
  # Bring data into long form
  data_long <- melt(data, id.vars = ids,
                    variable.name = "Data_Points", value.name = "value")
  data_long$value <- as.numeric(data_long$value)
  # Remove rows where the analyte value is NA
  data_long <- data_long[!is.na(data_long$value), ]
  # Re-sort data by the id columns
  formula_sort <- as.formula(paste("~", paste(ids, collapse = "+")))
  data_long <- sort(data_long, f = formula_sort)
  return(data_long)
}
reshape_wide <- function(data, ids) {
  # Bring data into wide form: one row per ids combination,
  # one column per level of Data_Points
  formula_wide <- as.formula(paste(paste(ids, collapse = "+"),
                                   "~ Data_Points"))
  # Name the value column explicitly instead of letting dcast guess it
  data_wide <- dcast(data, formula_wide, value.var = "value")
  # Re-sort data by the id columns
  formula_sort <- as.formula(paste("~", paste(ids, collapse = "+")))
  data_wide <- sort(data_wide, f = formula_sort)
  return(data_wide)
}
d <- data.frame(
  test_name = c(rep("Test_A", 6), rep("Test_B", 6)),
  group_name = c(rep("Group_C", 3), rep("Group_D", 3),
                 rep("Group_C", 3), rep("Group_D", 3)),
  id = c("I1", "I2", "I3", "I4", "I5", "I6",
         "I1", "I2", "I3", "I7", "I8", "I9"),
  X = c(NA, NA, 1, 2, 3, 4, 5, 6, NA, 7, 8, 9),
  Y = as.numeric(10:21),
  Z = c(NA, 22, 23, NA, 24, NA, 25, 26, NA, 27, 28, 29)
)
d
d1 <- reshape_long(d, ids = c("test_name", "group_name", "id"))
d1
d2 <- reshape_wide(d1, ids = c("test_name", "group_name", "id"))
d2
identical(d, d2)
Answer 0 (score: 1)
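The way you wrote the functions assumes that the combination of ids (test_name, group_name and id in your example) is unique in the original data. The easiest way to see this is to take d and duplicate its rows.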
> ddup <- rbind(d,d)
> ddup1 <- reshape_long(ddup, ids=c("test_name", "group_name", "id"))
> ddup2 <- reshape_wide(ddup1, ids=c("test_name", "group_name", "id"))
Aggregation function missing: defaulting to length
>
> identical(ddup,ddup2)
[1] FALSE
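What "defaulting to length" means can be sketched in base R, without reshape2. The toy data frame below is made up for illustration (its names and values are not from the question), and tapply is only an analogy for the counting that dcast falls back to, not a claim about its internals. One key combination, (I1, X), occurs twice; every other combination occurs once.

```r
# Illustrative long-form data: (I1, X) appears twice, everything else once
long <- data.frame(
  id          = c("I1", "I1", "I2", "I2", "I1"),
  Data_Points = c("X",  "Y",  "X",  "Y",  "X"),
  value       = c(5, 10, 6, 11, 5)
)

# Count the rows behind each (id, Data_Points) cell instead of keeping the value
counts <- tapply(long$value, list(long$id, long$Data_Points), length)
print(counts)
```

The resulting table holds row counts: 2 for the duplicated cell and 1 everywhere else, and none of the measured values survive.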
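Note that your reshape_wide assumes that the combination of ids and Data_Points is unique. In this example it is not. The warning message indicates that dcast used length to aggregate the multiple values for each combination into a single value. This also matches the symptom in the question: with length as the aggregation function, each cell holds the number of rows for that combination rather than the measured value, so a combination that occurs once shows up as 1.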
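One way to catch this before casting is to check the key for duplicates first. The following is a minimal sketch, not the asker's code: reshape_wide_checked is a hypothetical helper name, the toy data is invented, and it assumes the long data has the Data_Points and value columns produced by reshape_long.

```r
library(reshape2)

# Hypothetical helper: like reshape_wide, but it refuses to cast when
# ids + Data_Points do not identify exactly one row.
reshape_wide_checked <- function(data, ids) {
  key <- data[, c(ids, "Data_Points"), drop = FALSE]
  n_dup <- sum(duplicated(key))
  if (n_dup > 0) {
    stop(n_dup, " duplicated key combination(s); dcast would aggregate them")
  }
  formula_wide <- as.formula(paste(paste(ids, collapse = "+"), "~ Data_Points"))
  dcast(data, formula_wide, value.var = "value")
}

# Illustrative long-form data with a unique key
long <- data.frame(
  id          = c("I1", "I1", "I2", "I2"),
  Data_Points = c("X",  "Y",  "X",  "Y"),
  value       = c(5, 10, 6, 11)
)

wide <- reshape_wide_checked(long, ids = "id")

# Duplicating the rows makes the key non-unique, so the helper stops
dup_result <- tryCatch(
  reshape_wide_checked(rbind(long, long), ids = "id"),
  error = function(e) conditionMessage(e)
)
```

If the duplicates in the real data are legitimate repeated measurements rather than errors, an alternative is to pass an explicit fun.aggregate (for example mean) to dcast, or to add a replicate column to ids so the key becomes unique again. Which of these is appropriate depends on what the duplicates represent.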