Ridit transformation of ordinal variables in R

Time: 2016-09-12 08:36:44

Tags: r numeric feature-extraction

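Ridit scoring (https://en.wikipedia.org/wiki/Ridit_scoring) is commonly used to transform an ordinal variable into relative frequencies (the proportion of cases below a given value, plus half the proportion of cases at that value).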
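Written out for ordered categories 1, ..., k, the definition in parentheses corresponds to the formula below. The symbols n_j (count in category j), N (total count) and r_j (ridit of category j) are notation introduced here for illustration.

```latex
r_j = \frac{\sum_{i<j} n_i + \tfrac{1}{2}\, n_j}{N},
\qquad N = \sum_{i=1}^{k} n_i
```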

How would you do this in R?

2 answers:

Answer 0: (score: 1)

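The following package may solve your problem. In particular, the function Ridit::ridit looks useful; its documentation describes it as follows.

An extension of the Kruskal-Wallis test that allows specifying an arbitrary reference group. Also provides the mean ridit for each group. The mean ridit of a group is an estimate of the probability that a random observation from that group will be greater than or equal to a random observation from the reference group.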

https://cran.r-project.org/web/packages/Ridit/Ridit.pdf
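The probability interpretation of the mean ridit can be checked by hand in base R, without the package. The sketch below uses made-up counts (`group` and `ref` are hypothetical) and counts ties as one half, which is the usual ridit convention, so it illustrates the idea rather than reproducing the package's own code.

```r
# Hypothetical counts over four ordered categories (lowest to highest)
group <- c(10, 20, 30, 40)
ref   <- c(40, 30, 20, 10)

# Ridits of the reference group: proportion below plus half the proportion at
ref.ridit <- (cumsum(ref) - 0.5 * ref) / sum(ref)

# Mean ridit of `group` relative to `ref`
mean.rid <- sum(ref.ridit * group) / sum(group)

# Direct computation of P(X > Y) + 0.5 * P(X == Y),
# with X drawn from `group` and Y drawn from `ref`
joint  <- outer(group / sum(group), ref / sum(ref))
direct <- sum(joint[lower.tri(joint)]) + 0.5 * sum(diag(joint))

print(c(mean.ridit = mean.rid, direct = direct))  # both are 0.75
```

A value above 0.5 indicates that observations from `group` tend to fall in higher categories than observations from `ref`.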

Another approach is to use a binary choice model such as probit, logit, or exact logit, and extract the predicted outcome, i.e. 0 or 1.
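A minimal sketch of what fitting such a model and extracting 0/1 predictions could look like with base R's glm(). The built-in mtcars data is only a stand-in, and the choice of `am` and `wt` as variables and the 0.5 cutoff are assumptions for illustration, not part of this answer.

```r
# Stand-in data: built-in mtcars; `am` (0/1) is the binary outcome,
# `wt` is a single predictor chosen only for illustration
fit <- glm(am ~ wt, data = mtcars, family = binomial(link = "logit"))
# For a probit model, use family = binomial(link = "probit") instead

# Fitted probabilities, then a 0/1 prediction using an assumed 0.5 cutoff
p.hat <- predict(fit, type = "response")
pred  <- as.integer(p.hat > 0.5)

# Cross-tabulate observed against predicted outcomes
print(table(observed = mtcars$am, predicted = pred))
```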

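Answer 1: (score: 1)

Further update: These and a few other functions are now available in the CRAN package ridittools, which I maintain.

Update: Removed the rather silly code that built a transformation matrix; I had forgotten about cumsum().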

# Convert vector of counts to ridits

to.ridit <- function(v) {
  (cumsum(v) - .5 * v) / sum(v)
}

# Calculate mean ridit for vector of counts relative to reference group

mean.ridit <- function(v, ref) {
  sum(to.ridit(ref) * v ) / sum(v)
}

# Calculate mean ridits for several groups
# x is matrix of counts
# margin is 1 for groups in rows, 2 for groups in columns
# If ref is omitted, totals across groups are used as reference group
# If ref is a vector of counts, it's used as reference group
# Otherwise, ref is the number (or name if it exists) of the group to use as reference

ridits <- function(x, margin, ref=NULL) {
  if (length(ref) > 1) {
    refgroup <- ref
  } else if (length(ref) == 1) {
    if (margin==1) {
      refgroup <- x[ref,]
    } else {
      refgroup <- x[, ref]
    }
  } else {
    refgroup <- apply(x, 3-margin, sum)
  }
  apply(x, margin, mean.ridit, refgroup)
}

Example (Fleiss, 1981: severity of car accidents):

to.ridit(c(17, 54, 60, 19, 9, 6, 14))

[1] 0.04748603 0.24581006 0.56424581 0.78491620 0.86312849 0.90502793 0.96089385
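To score individual observations rather than a vector of counts, one option is to tabulate an ordered factor and look up each observation's category. The sketch below assumes a made-up `severity` variable and repeats to.ridit() so that it runs on its own; `category.ridit` and `obs.ridit` are illustrative names.

```r
# Same definition as above, repeated so this block is self-contained
to.ridit <- function(v) (cumsum(v) - 0.5 * v) / sum(v)

# Made-up ordinal variable for illustration
severity <- factor(c("low", "high", "medium", "low", "medium", "medium"),
                   levels = c("low", "medium", "high"), ordered = TRUE)

# Counts per category, in level order
counts <- tabulate(severity, nbins = nlevels(severity))

# One ridit per category, then one per observation via the integer level codes
category.ridit <- to.ridit(counts)
obs.ridit <- category.ridit[as.integer(severity)]

print(data.frame(severity, ridit = obs.ridit))
```

When a variable is scored against its own distribution like this, the mean of the per-observation ridits is 0.5 by construction.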

Note: Although my code is slightly less flexible than the Ridit::ridit function mentioned in the other answer, it appears to be much faster:

# Influenza subtypes by age as of week ending 2/24/18 (US CDC)

> flu.age
        BY  BV  BU   H3   H1
0-4    274  91  92 1808  500
5-24  1504 274 698 5090  951
25-64 1665 101 567 7538 1493
65+   1476  35 330 9541  515

# Using CRAN package

> system.time(ridit(flu.age,2))
   user  system elapsed 
  3.746   0.007   3.756 

# Using my code

> system.time(ridits(flu.age,2))
   user  system elapsed 
  0.001   0.000   0.000
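To try this yourself, the table can be entered as a matrix. The sketch below repeats the functions from this answer so that it runs on its own, and shows the three ways of specifying `ref`; the result names `by.total`, `by.h3` and `by.vec` are illustrative only.

```r
# Functions repeated from this answer so the block is self-contained
to.ridit   <- function(v) (cumsum(v) - 0.5 * v) / sum(v)
mean.ridit <- function(v, ref) sum(to.ridit(ref) * v) / sum(v)
ridits <- function(x, margin, ref = NULL) {
  if (length(ref) > 1) {
    refgroup <- ref                              # ref given as counts
  } else if (length(ref) == 1) {
    refgroup <- if (margin == 1) x[ref, ] else x[, ref]  # ref is a group
  } else {
    refgroup <- apply(x, 3 - margin, sum)        # totals across groups
  }
  apply(x, margin, mean.ridit, refgroup)
}

# Counts from the table above: age groups in rows, subtypes in columns
flu.age <- matrix(
  c( 274,  91,  92, 1808,  500,
    1504, 274, 698, 5090,  951,
    1665, 101, 567, 7538, 1493,
    1476,  35, 330, 9541,  515),
  nrow = 4, byrow = TRUE,
  dimnames = list(c("0-4", "5-24", "25-64", "65+"),
                  c("BY", "BV", "BU", "H3", "H1"))
)

by.total <- ridits(flu.age, 2)                         # reference: all subtypes pooled
by.h3    <- ridits(flu.age, 2, ref = "H3")             # reference: the H3 column, by name
by.vec   <- ridits(flu.age, 2, ref = flu.age[, "H3"])  # reference: explicit counts

print(round(by.h3, 3))
```

A group compared with itself always has a mean ridit of exactly 0.5, so `by.h3["H3"]` serves as a quick sanity check.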