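I'm reading a set of subject response times from a CSV file into a data frame, and I need to denormalize it into one row per subject and response, replacing NA and zero responses with the median response, as follows:
Actual input: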
Subject,1,2,3,4,5
Alpha,97,98,99,100,101
Beta,102,103,NA,104,0.00
Gamma,105,NA,NA,NA,NA
Expected output:
subject  response
Alpha    97
Alpha    98
Alpha    99
Alpha    100
Alpha    101
Beta     102
Beta     103
Beta     101   # MEDIAN
Beta     104
Beta     101   # MEDIAN
Gamma    105
Gamma    101   # MEDIAN
Gamma    101   # MEDIAN
Gamma    101   # MEDIAN
Gamma    101   # MEDIAN
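The 101 in the expected output is the median of the responses after both NA and 0 are discarded (the nine values 97 through 105). If the 0 were kept, the median would be (100 + 101) / 2 = 100.5. A minimal base-R sketch of the whole transformation, assuming that NA and 0 both mean "missing" and that the median is taken over the remaining values only. The inline csv_text string stands in for rt.csv, and the transpose trick is a swapped-in technique, not the asker's loop-and-order approach:

```r
# Hypothetical inline copy of rt.csv, so the sketch runs on its own
csv_text <- "Subject,1,2,3,4,5
Alpha,97,98,99,100,101
Beta,102,103,NA,104,0.00
Gamma,105,NA,NA,NA,NA"

# check.names = FALSE keeps the trial headers as "1".."5" instead of "X1".."X5"
input <- read.csv(text = csv_text, check.names = FALSE)
names(input)[1] <- "subject"

# Responses as a numeric matrix: one row per subject, one column per trial
resp <- as.matrix(input[-1])

# Assumption: NA and 0 both mean "missing"; take the median of everything else
valid <- resp[!is.na(resp) & resp != 0]
fill <- median(valid)

# Transposing before flattening (column-major) keeps each subject's
# trials together and in trial order
long <- data.frame(
  subject = rep(input$subject, each = ncol(resp)),
  response = as.vector(t(resp))
)

# Replace missing and zero responses with the median
long$response[is.na(long$response) | long$response == 0] <- fill
long
```

Because the responses are handled as a numeric matrix rather than through apply() on a data frame, the response column stays numeric instead of being converted to character.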
So far I have this, which partly works:
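input <- read.csv("rt.csv", header = TRUE, sep = ",", stringsAsFactors = TRUE)
names(input) <- tolower(names(input))
# every column except the first (subject) holds responses
response <- input[setdiff(names(input), names(input[1]))]
cntCols <- ncol(response)
# stack the response columns into one long vector, column by column
y <- response[[1]]
for (i in 2:cntCols) {
  y = c(y, response[[i]])
}
extract <- as.data.frame(y)
# levels() needs the subject column to be a factor, hence stringsAsFactors = TRUE
# (the read.csv default has been FALSE since R 4.0.0); data.frame() recycles
# the levels to the length of y
wip <- data.frame(
  x = rep(c(levels(input[[1]]))),
  y = extract
)
wip <- wip[order(wip[,1]),]
mdnInputY <- median(wip$y, na.rm = TRUE)
# replace NA and zero entries with the median
MedianReplace <- function(dfInput) {
  dfInput[is.na(dfInput)] <- mdnInputY
  dfInput[trimws(dfInput) == 0] <- mdnInputY
  return(dfInput)
}
output <- data.frame(apply(wip, 2, MedianReplace))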
However, it doesn't quite work:
Any advice?
Answer 0 (score: 0)
Use aggregate from {stats}:
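# data-frame interface: one row per V1, with its V2 values pasted together
aggregate(x = input['V2'], by = input['V1'], FUN = paste, collapse = ', ')
# formula interface; note its default na.action = na.omit drops rows with NA
aggregate(formula = V2 ~ V1, data = input, FUN = paste, collapse = ', ')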
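A small self-contained run of the first aggregate call, to show what it returns. The two-column data frame with V1/V2 names (the names read.csv assigns with header = FALSE) and its values are hypothetical, chosen only for illustration. On this data, aggregate collapses each subject's rows into a single comma-separated string, one output row per subject:

```r
# Hypothetical long-format data: one row per (subject, response) pair
input <- data.frame(
  V1 = c("Alpha", "Alpha", "Beta", "Beta"),
  V2 = c(97, 98, 102, 103)
)

# One output row per distinct V1 (groups come out sorted),
# with that subject's V2 values pasted into a single string
res <- aggregate(x = input["V2"], by = input["V1"], FUN = paste, collapse = ", ")
res
```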