Equivalent of INDEX and MATCH in R

Asked: 2017-07-03 17:29:06

Tags: r

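I have looked through earlier solutions for an equivalent of Excel's INDEX and MATCH, but I can't get them to work in R with my data. I have provided an example data set below. There are 4 samples, and the measurements are frequency counts per 'CAG'; for example, sample A01 has a count of 6485 for a CAG of 13. I have managed to calculate the modal CAG (the CAG with the highest measurement in each column). I have also managed to calculate summary statistics of the data using describe() from the psych package. I then use these results to calculate skewness as (mean - mode) / sd. Each sample has a control sample, e.g. the control sample for A02 is A01. I would also like to calculate skewness using the control's mode, i.e. (mean - ctrlmode) / sd. To do this, I need to look up the control sample's mode and return it to the results table. I have indicated below where I am stuck. Many thanks for your help!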

#Data set
data <- data.frame(CAG = c(13, 14, 15, 17),
                   A01 = c(6485, 35, 132, 12),
                   A02 = c(0, 42, 56, 4),
                   A03 = c(33, 5014, 2221, 18),
                   A04 = c(106, 89, 436, 11))

settings <- data.frame(samples = c('A01', 'A02', 'A03', 'A04'),
                       control = c('A01', 'A01', 'A03', 'A03'))

#Mode (CAG value with the highest count in each sample column)
samplemode <- data.frame(samples = c('A01', 'A02', 'A03', 'A04'),
                         samplemode = data[sapply(data[2:ncol(data)], which.max), ]$CAG)

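#Summary statistics
library(psych)  # provides describe()

sumstats <- sapply(data[, 2:ncol(data)], function(x) {
  # expand the counts into one CAG value per observation
  data_e <- rep(data$CAG, x)
  data.frame(describe(data_e))
})

# one row per sample, one column per statistic
sumstats <- as.data.frame(t(sumstats))

# the transposed columns are lists; convert them to numeric
sumstats[] <- lapply(sumstats, as.numeric)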

# Results table
results <- data.frame(samples = settings$samples,
                      samplemode = samplemode$samplemode,
                      control = settings$control,
                      ctrlmode = samplemode$samplemode[results$controls = samplemode$samples], #THIS IS WHERE I'M HAVING TROUBLE
                      sumstats)


# Skewness
results$skewmode <- (results$mean - results$samplemode) / results$sd
results$skewctrlmode <- (results$mean - results$ctrlmode) / results$sd

#Expected results
expected <- data.frame(samples = settings$samples,
                       skewmode = c(0.1565726, -0.4903837, 0.6321606, -0.5270822),
                       skewctrlmode = c(0.1565726, 2.4519186, 0.6321606, 0.6857736))
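
As a sanity check on the formula, the mode-based skew for a single sample can be reproduced in base R without psych. This is only a sketch: the variable names (cag, counts, obs, mode_a01, skew_a01) are made up for illustration, and it assumes describe() reports the ordinary sample standard deviation, which is what sd() computes.

```r
# Counts for sample A01, copied from the data set above
cag    <- c(13, 14, 15, 17)
counts <- c(6485, 35, 132, 12)

# Expand the frequency table into one value per observation
obs <- rep(cag, counts)

# Mode = CAG value with the highest count
mode_a01 <- cag[which.max(counts)]

# (mean - mode) / sd, the mode-based skew used in the question
# (assumption: sd() matches the sd column that describe() would report)
skew_a01 <- (mean(obs) - mode_a01) / sd(obs)
print(skew_a01)
```

Under that assumption the result should be close to the first skewmode value in the expected table (about 0.157).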

1 Answer:

Answer 0 (score: 2)

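This should work. match() returns the position of each control in samplemode$samples, and those positions are then used to index samplemode$samplemode: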

results <- data.frame(samples = settings$samples, 
                      samplemode = samplemode$samplemode, 
                      control = settings$control, 
                      ctrlmode = samplemode$samplemode[match(settings$control,
                                                             samplemode$samples)],
                      sumstats)

results$skewmode <- (results$mean - results$samplemode) / results$sd
results$skewctrlmode <- (results$mean - results$ctrlmode) / results$sd

    samples samplemode control ctrlmode vars    n     mean        sd median  trimmed mad min max range       skew   kurtosis
A01     A01         13     A01       13    1 6664 13.05207 0.3325666     13 13.00000   0  13  17     4  7.1106921 56.4222321
A02     A02         15     A01       13    1  102 14.66667 0.6797398     15 14.60976   0  14  17     3  1.2624977  2.8577171
A03     A03         14     A03       14    1 7286 14.30771 0.4867646     14 14.25918   0  13  17     4  1.0332600  0.9050386
A04     A04         15     A03       14    1  642 14.56542 0.8245004     15 14.66342   0  13  17     4 -0.6341769  0.5311286
             se   skewmode skewctrlmode
A01 0.004073907  0.1565726    0.1565726
A02 0.067304270 -0.4903837    2.4519186
A03 0.005702620  0.6321606    0.6321606
A04 0.032540433 -0.5270822    0.6857736
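
For readers coming from Excel, it may help to see the pattern in isolation. match(x, table) returns the position of each element of x in table, much like MATCH, and indexing a vector with those positions plays the role of INDEX. The sketch below uses small hand-typed vectors; the names lookup_key, lookup_value and wanted are illustrative and do not come from the post.

```r
# Lookup table: sample name -> modal CAG (values taken from the output above)
lookup_key   <- c("A01", "A02", "A03", "A04")
lookup_value <- c(13, 15, 14, 15)

# Keys to look up: the control sample for each row
wanted <- c("A01", "A01", "A03", "A03")

# MATCH step: position of each wanted key within the lookup keys
pos <- match(wanted, lookup_key)

# INDEX step: pull the values stored at those positions
ctrlmode <- lookup_value[pos]
print(ctrlmode)

# A key that is absent from the table gives NA rather than an error
missing_pos <- match("A99", lookup_key)
print(missing_pos)
```

Because unmatched keys come back as NA, it is probably worth checking anyNA(pos) before using the result if some controls might be missing from the lookup table.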