 Id   Gender   Col_Cold_1  Col_Cold_2  Col_Cold_3  Col_Hot_1  Col_Hot_2   Col_Hot_3
 10   F        pain        sleep       NA          infection  medication  walking
 14   F        Bump        NA          muscle      NA         twitching   flutter
 17   M                    pain        hemaloma    Callus     infection
 18   F        muscle                  pain                   twitching   medication


1) All values in columns with keyword Cold will contribute to the rows  
2) All values in columns with keyword Hot will contribute to the columns

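For example, pain, sleep, Bump, muscle, and hemaloma are cell values under columns with the keyword Cold, so they form the rows, while cell values such as infection, medication, Callus, walking, twitching, and flutter sit under columns with the keyword Hot and form the columns of the incidence matrix.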


           infection  medication  walking  twitching  flutter  Callus
     pain  2          2           1        1                   1
    sleep  1          1           1
     Bump                                  1          1
   muscle             1                    2          1
 hemaloma  1                                                   1
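  • [pain, infection] = 2 because the association between pain and infection occurs twice in the original data frame: once in row 1 and again in row 3.

  • [pain, medication] = 2 because the association between pain and medication also occurs twice, in row 1 and in row 4.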
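
The counting rule in the two bullets can be checked by hand for a single cell. The following is only a minimal base R sketch: the matrices cold and hot are the Cold and Hot columns of the table above typed in directly (blanks entered as NA), and count_pair is a hypothetical helper name, not part of any package.

```r
# Cold and Hot values per row of the example table (blanks and "NA" as NA).
cold <- rbind(c("pain",   "sleep", NA),
              c("Bump",   NA,      "muscle"),
              c(NA,       "pain",  "hemaloma"),
              c("muscle", NA,      "pain"))
hot  <- rbind(c("infection", "medication", "walking"),
              c(NA,          "twitching",  "flutter"),
              c("Callus",    "infection",  NA),
              c(NA,          "twitching",  "medication"))

# Hypothetical helper: for each row, multiply the number of times `r` appears
# among the Cold values by the number of times `h` appears among the Hot
# values, then sum over rows.
count_pair <- function(r, h) {
  sum(rowSums(cold == r, na.rm = TRUE) * rowSums(hot == h, na.rm = TRUE))
}

count_pair("pain", "infection")   # rows 1 and 3
count_pair("pain", "medication")  # rows 1 and 4
```

Both calls should give 2, matching the bullets above.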



df = structure(list(id = c(10, 14, 17, 18), Gender = structure(c(1L, 1L, 2L, 1L), .Label = c("F", "M"), class = "factor"), Col_Cold_1 = structure(c(4L, 2L, 1L, 3L), .Label = c("", "Bump", "muscle", "pain"), class = "factor"), Col_Cold_2 = structure(c(4L, 2L, 3L, 1L), .Label = c("", "NA", "pain", "sleep"), class = "factor"), Col_Cold_3 = structure(c(1L, 3L, 2L, 4L), .Label = c("NA", "hemaloma", "muscle", "pain" ), class = "factor"), Col_Hot_1 = structure(c(4L, 3L, 2L, 1L), .Label = c("", "Callus", "NA", "infection"), class = "factor"), Col_Hot_2 = structure(c(2L, 3L, 1L, 3L), .Label = c("infection", "medication", "twitching"), class = "factor"), Col_Hot_3 = structure(c(4L, 2L, 1L, 3L), .Label = c("", "flutter", "medication", "walking" ), class = "factor")), .Names = c("id", "Gender", "Col_Cold_1", "Col_Cold_2", "Col_Cold_3", "Col_Hot_1", "Col_Hot_2", "Col_Hot_3" ), row.names = c(NA, -4L), class = "data.frame")

df[] <- lapply(df, as.character)  # Convert factors to characters
df[df == "NA" | df == "" | is.na(df)] <- NA  # Make all blanks NAs


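library(dplyr)  # select(), one_of()
library(tidyr)  # gather_(); deprecated in recent tidyr releases

# Pair each "Cold" column with all "Hot" columns and stack the results
out <- do.call(rbind, sapply(grep("^Col_Cold", names(df), value = TRUE), function(x){
  vars <- c(x, grep("^Col_Hot", names(df), value = TRUE))
  setNames(gather_(select(df, one_of(vars)), 
    key_col = x,
    value_col = "value",
    gather_cols = vars[-1])[, c(1, 3)], c("cold", "hot"))
}, simplify = FALSE))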

The idea is to "pair" each "Cold" column with each "Hot" column to form one long dataset. out looks like this:

#        cold        hot
# 1      pain  infection
# 2      Bump       <NA>
# 3      <NA>     Callus
# 4    muscle       <NA>
# 5      pain medication
# ...


xtabs(~ cold + hot, na.omit(out))
#           hot
# cold       Callus flutter infection medication twitching walking
#   Bump          0       1         0          0         1       0
#   hemaloma      1       0         1          0         0       0
#   muscle        0       1         0          1         2       0
#   pain          1       0         2          2         1       1
#   sleep         0       0         1          1         0       1
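
If dplyr and tidyr are not available, the same pairing can be done in base R alone. This is only a sketch under stated assumptions: the data frame is retyped by hand from the dput() output with blanks already set to NA, and the names pairs, pieces, out_base, and tab are illustrative. The last step reorders the table to the row and column order used in the question.

```r
# Same data as the dput() output, typed in directly with NA for blanks.
df <- data.frame(
  id         = c(10, 14, 17, 18),
  Gender     = c("F", "F", "M", "F"),
  Col_Cold_1 = c("pain", "Bump", NA, "muscle"),
  Col_Cold_2 = c("sleep", NA, "pain", NA),
  Col_Cold_3 = c(NA, "muscle", "hemaloma", "pain"),
  Col_Hot_1  = c("infection", NA, "Callus", NA),
  Col_Hot_2  = c("medication", "twitching", "infection", "twitching"),
  Col_Hot_3  = c("walking", "flutter", NA, "medication"),
  stringsAsFactors = FALSE
)

cold_cols <- grep("^Col_Cold", names(df), value = TRUE)
hot_cols  <- grep("^Col_Hot",  names(df), value = TRUE)

# All 3 x 3 combinations of one Cold column with one Hot column.
pairs <- expand.grid(cold = cold_cols, hot = hot_cols, stringsAsFactors = FALSE)

# For each combination, put the two columns side by side, then stack the pieces.
pieces <- lapply(seq_len(nrow(pairs)), function(i) {
  data.frame(cold = df[[pairs$cold[i]]], hot = df[[pairs$hot[i]]],
             stringsAsFactors = FALSE)
})
out_base <- do.call(rbind, pieces)

# Drop incomplete pairs and cross-tabulate, as in the answer above.
tab <- xtabs(~ cold + hot, na.omit(out_base))

# Reorder to the row/column order used in the question.
tab[c("pain", "sleep", "Bump", "muscle", "hemaloma"),
    c("infection", "medication", "walking", "twitching", "flutter", "Callus")]
```

Building all column pairs with expand.grid keeps the row-wise alignment that the counting relies on: each stacked piece holds the Cold and Hot values that came from the same original row.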