How to get a correlation matrix from two paired columns in R and SAS? Diagonal is zero

Time: 2016-05-12 20:14:38

Tags: r sas

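My data frame looks like the one below. I used R to turn the two ID columns into a matrix, but R could not produce it. (My expected matrix is about 700 * 700.) R stopped and displayed: Reached total allocation of 12213Mb: see help(memory.size)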

I want to do the same thing in SAS. How can we do that? Or do I need different code to accomplish this in R?

ID_r ID_c SCORE
A1   A2   0.2
A1   A3   0.2
A1   A4   0.3
A1   A5   0.2
A1   A6   0.2
A2   A3   0.6
A2   A4   0.2
A2   A5   0.2
A2   A6   0.2
A3   A4   0.2
A3   A5   0.2
A3   A6   0.2
A4   A5   0.2
A4   A6   0.9
A5   A6   0.2

    ID_r<-c('A1','A1','A1','A1','A1','A2','A2','A2','A2','A3','A3','A3','A4','A4','A5')
    ID_c<-c('A2','A3','A4','A5','A6','A3','A4','A5','A6','A4','A5','A6','A5','A6','A6')
    SCORE<-c(0.2,0.2,0.3,0.2,0.2,0.6,0.2,0.2,0.2,0.2,0.2,0.2,0.2,0.9,0.2)
    # The code below expects these vectors combined into a data frame named df
    df <- data.frame(ID_r, ID_c, SCORE)

library(dplyr); library(tidyr)
df$ID_r <- as.character(df$ID_r)
df$ID_c <- as.character(df$ID_c)
ID <- unique(c(df$ID_r, df$ID_c))
diagDf <- data.frame(ID_r = ID, ID_c = ID, SCORE = "0.0")
newDf <- rbind(df, diagDf) %>% arrange(ID_r, ID_c)

resultDf <- spread(newDf, ID_r, SCORE, fill = ".")
names(resultDf)[1] <- ""
resultDf

Sample SAS data is as follows.

data score_data;
infile datalines;
input ID_r $ ID_c $ SCORE;
return;
datalines;

    A1   A2   0.2
    A1   A3   0.2
    A1   A4   0.3
    A1   A5   0.2
    A1   A6   0.2
    A2   A3   0.6
    A2   A4   0.2
    A2   A5   0.2
    A2   A6   0.2
    A3   A4   0.2
    A3   A5   0.2
    A3   A6   0.2
    A4   A5   0.2
    A4   A6   0.9
    A5   A6   0.2
;
run;

proc print data=score_data ;
run;

I want to use the two ID columns to generate a matrix like the one below (the diagonal is zero).

    A1  A2  A3  A4  A5  A6
A1 0.0 0.2 0.2 0.3 0.2 0.2
A2 0.2 0.0 0.6 0.2 0.2 0.2 
A3 0.2 0.6 0.0 0.2 0.2 0.2
A4 0.3 0.2 0.2 0.0 0.2 0.9
A5 0.2 0.2 0.2 0.2 0.0 0.2
A6 0.2 0.2 0.2 0.9 0.2 0.0

Thanks in advance!!

2 answers:

Answer 0: (score: 2)

R solution:

library(plyr)
ID_r = c('A1','A1','A1','A1','A1','A2','A2','A2','A2','A3','A3','A3','A4','A4','A5')
ID_c = c('A2','A3','A4','A5','A6','A3','A4','A5','A6','A4','A5','A6','A5','A6','A6')
SCORE = c(0.2,0.2,0.3,0.2,0.2,0.6,0.2,0.2,0.2,0.2,0.2,0.2,0.2,0.9,0.2)
df1 = data.frame(ID_r, ID_c, SCORE)
df2 = data.frame(ID_c, ID_r, SCORE)
names(df2) = c("ID_r","ID_c","SCORE")
df = rbind(df1,df2)
ID <- unique(c(ID_r, ID_c))

df1 = expand.grid(ID,ID)
names(df1) = c("ID_r","ID_c")
d = join(df1, df, by = c("ID_r","ID_c"))
d$SCORE[is.na(d$SCORE)] <- 0

a = matrix(0, nrow = length(ID), ncol = length(ID))
rownames(a) <- ID
colnames(a) <- ID
a

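# Index with the two ID columns only: as.matrix(d) would turn SCORE into
# character strings and coerce the whole result matrix to character
b = as.matrix(d[, c("ID_r", "ID_c")])
a[b] <- d$SCORE
a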
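
The join step is not strictly needed. As a minimal sketch, assuming each ID pair appears at most once, the matrix can be filled directly by indexing with a two-column character matrix of row and column names. The name `m` is only illustrative. For about 700 IDs this allocates a single 700 x 700 numeric matrix (roughly 4 MB), so the size of the result itself should not be what exhausts memory.

```r
ID_r <- c('A1','A1','A1','A1','A1','A2','A2','A2','A2','A3','A3','A3','A4','A4','A5')
ID_c <- c('A2','A3','A4','A5','A6','A3','A4','A5','A6','A4','A5','A6','A5','A6','A6')
SCORE <- c(0.2,0.2,0.3,0.2,0.2,0.6,0.2,0.2,0.2,0.2,0.2,0.2,0.2,0.9,0.2)

# All IDs, in order of first appearance
ID <- unique(c(ID_r, ID_c))

# Start from an all-zero matrix, so the diagonal is already 0
m <- matrix(0, nrow = length(ID), ncol = length(ID), dimnames = list(ID, ID))

# A two-column character matrix selects cells by (row name, column name)
m[cbind(ID_r, ID_c)] <- SCORE   # upper triangle
m[cbind(ID_c, ID_r)] <- SCORE   # mirrored lower triangle
m
```

Because the two assignments write the same score to both (row, column) orders, the result is symmetric by construction, and no intermediate long table is built.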

Answer 1: (score: 1)

PROC TRANSPOSE is your friend.

proc transpose data=score_data out=score_matrix;
  by id_r; 
  id id_c; *this makes variable names;
  var score;
run;

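That gets you the upper triangle. A second proc transpose could get you the lower triangle (swapping id_r and id_c, I imagine), or you could do that in a data step. You still need to create the six 0.0 rows in a data step, but that should not be particularly hard.

An example of doing so: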

data pre_transpose;
  set score_data end=eof;
  by id_r id_c;
  output;

  *Swap R and C;
  _idtemp = id_r;
  id_r=id_c;
  id_c=_idtemp;
  output;

  *If EOF, then need that last 0,0 combo which never gets an R;
   if eof then do;
    id_c = id_r;
    score=0;
    output;
    id_c = _idtemp;
  end;

  *If first line of a new ID, then need the R=C row;
  if first.id_r then do;
    id_r=id_c;
    score=0;
    output;
  end;

run;

proc sort data=pre_transpose;
  by id_r id_c;
run;
proc transpose data=pre_transpose out=score_matrix;
  by id_r; 
  id id_c; *this makes variable names;
  var score;
run;
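
For readers who want to cross-check the data step in R, here is a rough sketch of the same idea: stack the original pairs, the swapped pairs, and one zero-score row per ID, then pivot the long table to wide. It uses base R's `xtabs` as a stand-in for PROC TRANSPOSE, which is a substitution for illustration and not an exact equivalent. The names `long` and `wide` are hypothetical.

```r
ID_r <- c('A1','A1','A1','A1','A1','A2','A2','A2','A2','A3','A3','A3','A4','A4','A5')
ID_c <- c('A2','A3','A4','A5','A6','A3','A4','A5','A6','A4','A5','A6','A5','A6','A6')
SCORE <- c(0.2,0.2,0.3,0.2,0.2,0.6,0.2,0.2,0.2,0.2,0.2,0.2,0.2,0.9,0.2)
ID <- unique(c(ID_r, ID_c))

# Long table: the same three groups of rows the data step writes out
long <- rbind(
  data.frame(ID_r = ID_r, ID_c = ID_c, SCORE = SCORE),  # original pairs
  data.frame(ID_r = ID_c, ID_c = ID_r, SCORE = SCORE),  # swapped pairs
  data.frame(ID_r = ID,   ID_c = ID,   SCORE = 0)       # zero diagonal
)

# Sum SCORE within each (ID_r, ID_c) cell; every cell has exactly one row here
wide <- xtabs(SCORE ~ ID_r + ID_c, data = long)
wide
```

With 15 input pairs and 6 IDs, `long` has 15 + 15 + 6 = 36 rows, one for every cell of the 6 x 6 matrix.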