R: How to summarize data from multiple ordered factors into one variable

Asked: 2014-08-18 18:21:40

Tags: r data-manipulation categorical-data ordinal

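I have data representing the severity of patients' asthma symptoms under different conditions. The severity variables are ordered factors that all share the same levels (Mild < Moderate < Severe). Here is a simplified example: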

# Create example data frame
df <- data.frame(
  ID = c(1:5),
  Daytime = c("Mild", "Severe", "Mild", "Moderate", "Moderate"), # severity of daytime symptoms
  Sleep = c("Moderate", NA, "Mild", "Mild", "Moderate"), # severity of nighttime symptoms
  Activity = c("Mild", "Moderate", "Mild", "Moderate", "Severe") # severity of symptoms during activity
  )

# Specify order of factor levels
df$Daytime <- ordered(
  df$Daytime,
  levels = c("Mild",
             "Moderate",
             "Severe")
  )
df$Sleep <- ordered(
  df$Sleep,
  levels = c("Mild",
             "Moderate",
             "Severe")
  )
df$Activity <- ordered(
  df$Activity,
  levels = c("Mild",
             "Moderate",
             "Severe")
)

df

The resulting data frame looks like this:

  ID  Daytime    Sleep Activity
1  1     Mild Moderate     Mild
2  2   Severe     <NA> Moderate
3  3     Mild     Mild     Mild
4  4 Moderate     Mild Moderate
5  5 Moderate Moderate   Severe

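I am trying to create an "overall severity" variable, where a patient's overall severity is the most severe symptom reported across the three categories (daytime, sleep, and activity). In other words, Overall should equal the highest level among Daytime, Sleep, and Activity. The desired result is: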

  ID  Daytime    Sleep Activity  Overall
1  1     Mild Moderate     Mild Moderate
2  2   Severe     <NA> Moderate   Severe
3  3     Mild     Mild     Mild     Mild
4  4 Moderate     Mild Moderate Moderate
5  5 Moderate Moderate   Severe   Severe

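I would like to do this without writing a big, clunky for loop, but I can't figure out how. I thought I might be able to use ave(), but it doesn't seem to handle multiple variables at once: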

> df$Overall <- ave(c(df$Daytime, df$Sleep, df$Activity),
+                 df$ID,
+                 FUN = function(i) max (i, na.rm=T)
+                 )
Error in `$<-.data.frame`(`*tmp*`, "Overall", value = c(2L, 3L, 1L, 2L,  : 
  replacement has 15 rows, data has 5

Is there an apply function that can do this?

1 Answer:

Answer 0 (score: 4)

A simple way to do this is:

df$Overall <- apply(df[, 2:4], 1, max, na.rm = TRUE)
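
One caveat about the apply() call above: apply() first converts the data frame to a character matrix, so max compares the labels as strings, and the result is a plain character vector rather than an ordered factor. It gives the expected answer here only because "Mild", "Moderate", and "Severe" happen to sort alphabetically in severity order. Below is a minimal sketch of an alternative that compares the underlying level codes instead, assuming all three columns share the same levels. The names sev_levels, sev_cols, codes, and worst are illustrative and not from the original post.

```r
# Rebuild the example data so the sketch is self-contained
sev_levels <- c("Mild", "Moderate", "Severe")  # assumed shared level order
df <- data.frame(
  ID       = 1:5,
  Daytime  = c("Mild", "Severe", "Mild", "Moderate", "Moderate"),
  Sleep    = c("Moderate", NA, "Mild", "Mild", "Moderate"),
  Activity = c("Mild", "Moderate", "Mild", "Moderate", "Severe")
)
sev_cols <- c("Daytime", "Sleep", "Activity")  # illustrative helper name
df[sev_cols] <- lapply(df[sev_cols], ordered, levels = sev_levels)

# Integer level codes: 1 = Mild, 2 = Moderate, 3 = Severe (NA stays NA)
codes <- sapply(df[sev_cols], as.integer)

# Row-wise maximum code, ignoring missing values
worst <- apply(codes, 1, max, na.rm = TRUE)

# Map the codes back to labels and restore the ordering
df$Overall <- ordered(sev_levels[worst], levels = sev_levels)
df
```

Because Overall is rebuilt as an ordered factor, later comparisons such as df$Overall >= "Moderate" follow severity order rather than alphabetical order. A row in which all three values are missing would need separate handling, since max() with na.rm = TRUE returns -Inf (with a warning) for an empty set. A shorter form, do.call(pmax, c(df[sev_cols], na.rm = TRUE)), should also work for ordered factors with identical levels, though it is worth verifying on your R version.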