R script to denormalize a data frame and replace NA with the median

Time: 2016-03-17 22:31:01

Tags: r denormalized

I am reading a set of subject response times from a csv file into a data frame and need to denormalize it as follows:

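  • collapse all the response columns into two columns (subject and response), then
  • replace NA and zero values with the median of the original response times.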

Actual input:

Subject,1,2,3,4,5
Alpha,97,98,99,100,101
Beta,102,103,NA,104,0.00
Gamma,105,NA,NA,NA,NA

Expected output:

subject response
Alpha   97 
Alpha   98 
Alpha   99 
Alpha   100 
Alpha   101 
Beta    102
Beta    103
Beta    101 # MEDIAN
Beta    104
Beta    101 # MEDIAN
Gamma   105
Gamma   101 # MEDIAN
Gamma   101 # MEDIAN
Gamma   101 # MEDIAN
Gamma   101 # MEDIAN
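
The 101 marked `# MEDIAN` above appears to be the median of the responses that are neither NA nor zero. A small check, with the fifteen input values typed in by hand (the vector name `rt` is only illustrative), shows that leaving the zero in would give a different value:

```r
# All fifteen responses from the input, row by row (Alpha, Beta, Gamma).
rt <- c(97, 98, 99, 100, 101,
        102, 103, NA, 104, 0,
        105, NA, NA, NA, NA)

# Dropping only the NAs keeps the 0, so the median is pulled down.
with_zero <- median(rt, na.rm = TRUE)              # 100.5

# Dropping NAs and zeros gives the value shown in the expected output.
without_zero <- median(rt[!is.na(rt) & rt != 0])   # 101
```

So the zero has to be excluded before the median is taken, not only replaced afterwards.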

I have partly got there using:

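# stringsAsFactors = TRUE keeps the pre-R-4.0 default that levels() below relies on
input <- read.csv("rt.csv", header = TRUE, sep = ",", stringsAsFactors = TRUE)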
names(input) <- tolower(names(input))

response <- input[setdiff(names(input), names(input[1]))]
cntCols  <- ncol(response)
y <- response[[1]]
for (i in 2:cntCols) {
    y = c(y, response[[i]])
}
extract <- as.data.frame(y)

wip <-
  data.frame(
    x = rep(c(levels(input[[1]]))),
    y = extract
  )

wip <- wip[order(wip[,1]),]

mdnInputY <- median(wip$y, na.rm = TRUE)
MedianReplace <- function(dfInput) {
  dfInput[is.na(dfInput)] <- mdnInputY
  dfInput[trimws(dfInput) == 0] <- mdnInputY
  return(dfInput)
}

output <- data.frame(apply(wip, 2, MedianReplace))

However, it falls short in one respect:

  • it is not idiomatic (vectorized).

Any advice?

1 answer:

Answer 0: (score: 0)

Use aggregate from {stats}:

aggregate(x = input['V2'], by = input['V1'], FUN =  paste, collapse =', ')
aggregate(formula = V2 ~ V1, data = input, FUN =  paste, collapse =', ')
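
For the reshaping described in the question, one possible vectorized approach in base R is sketched below. It is not taken from the answer above. It assumes the sample data from the question (read inline through `read.csv(text = ...)` rather than from rt.csv) and assumes that zeros should be excluded before the median is computed. The names `csv_text`, `responses` and `med` are illustrative.

```r
# Sample data from the question, read inline instead of from rt.csv.
csv_text <- "Subject,1,2,3,4,5
Alpha,97,98,99,100,101
Beta,102,103,NA,104,0.00
Gamma,105,NA,NA,NA,NA"
input <- read.csv(text = csv_text, header = TRUE, stringsAsFactors = FALSE)
names(input) <- tolower(names(input))

# Numeric matrix of responses, one row per subject.
responses <- as.matrix(input[-1])

# Treat zeros like missing values, then take the median of what remains.
responses[!is.na(responses) & responses == 0] <- NA
med <- median(responses, na.rm = TRUE)
responses[is.na(responses)] <- med

# t() makes the column-major flattening run subject by subject.
output <- data.frame(
  subject  = rep(input$subject, each = ncol(responses)),
  response = as.vector(t(responses))
)
print(output)
```

Working on a matrix avoids both the column loop and the apply() call, and the response column stays numeric instead of being converted to character.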