Question

由于我是一个新的R用户，我正在努力编写一个循环或类似的东西来解决以下问题。

所以问题：我列出了200名患者和30种症状，信息症状呈现是（1）或否（0）。我想创建一个类似于相关矩阵的矩阵，对于每对症状都告诉我两个症状在同一个体中的百分比。

所以：

ID /sym 1 / sym 2 / sym 3 /....
Pat 1 / 1 / 1 / 0 / ...    
Pat 2 / 1 / 1 / 1 /
Pat 3 / 1 / 1 / 0 / 
...

然后我想使用看起来像标准相关矩阵的矩阵，但是对于每个症状对都显示两者的百分比，因为我认为关联这些分类值没有任何意义。

到目前为止，我已经开始为每个组合编写一个新变量，告诉我两者是否存在，然后使用它来计算百分比并填充我的矩阵。

不幸的是，考虑到可能的组合数量，需要FOREVER写入，但我无法弄清楚如何迭代它。也许你可以帮忙吗？它可能非常容易，我只是不足以让程序员想到它。

Answer 1

设x为200乘30的数据数组。根据矩阵乘法的规则，x＆＃39; * x是一个30乘30的数组，其（i，j）项计算列i和j等于1的位数。将这些计数除以200给出比例，并乘以100将其转换为所需的百分比。

以下是单行实现以及用于测试和说明它的代码。如果问题比您的问题大10,000倍（20,000名患者和3000名症状），则需要大约10秒钟才能在此机器上执行。由于它与大小几乎呈线性关系，因此对问题的计算需要大约一毫秒。

#
# For columns i and j, f(x)[i,j] is the percentage of rows in which
# both columns of `x` are TRUE (optionally: nonzero).
#
f <- function(x) (t(x) %*% x) * (100 / dim(x)[1])
#
# Slow version to demonstrate `f` is correct.
#
f.direct <- function(x) {
  m <- dim(x)[1]
  n <- dim(x)[2]
  #
  # Test all elements of `x` to create a logical array.
  #
  x.indicator <- x != 0
  #
  # Initialize the result.
  #
  y <- matrix(NA_real_, n, n)
  #
  # Loop over pairs of columns.
  #
  for (i in 1:n) {
    for (j in 1:n) {
      # Compare column `i` to column `j` by averaging the times their
      # indicators are equal.  Multiply by 100 to give a percentage.
      y[i,j] <- 100 * mean(x.indicator[,i] & x.indicator[,j])
    }
  }
  return(y)
}
#------------------------------------------------------------------------------#
#
# Create some data and test `f` on them.
#
m <- 200  # Number of rows
n <- 30   # Number of columns
p <- 0.1  # Expected proportion of 1's
x <- matrix(runif(m*n) < p, m, n)

system.time(y <- f(x))                # Almost instantaneous
system.time(y.direct <- f.direct(x))  # A thousand times slower (but not bad)
#
# Display the results.
#
par(mfrow=c(1,2))
image(y, main="Matrix Result")
image(y.direct, main="Direct Result")
par(mfrow=c(1,1))
#
# Compare them and report the outcome.
#
if(all.equal(y, y.direct)) cat("Results are equal.") else cat("There's a difference!")

编写算法以识别症状对

1 个答案: