Creating a connection matrix from a data frame in R

Time: 2015-02-13 14:07:24

Tags: r

I have some data in this form:

> agreers <- read.csv('agreers.csv')
> attach(agreers)
> head(agreers)
        wain1       wain2 count
1   Founder36 Mnist10_269   673
2    Founder3  Mnist10_19   665
3 Mnist10_140 Mnist10_257   663
4    Founder1   Founder15   659
5   Founder21   Founder25   654
6   Founder15   Founder32   654

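I created the data so that wain1 <= wain2, so each pair appears only once. This will therefore be an undirected graph.

I want to create a connection matrix like this: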

          Mnist10_269 Mnist10_19 Mnist10_257 . . .
Founder36    673           ?          ?
Founder3       ?         665          ?
Mnist10_140    ?           ?        663
  . . .

The ? entries would be zero wherever agreers has no data for that pair. Here is what I tried:

> mat = matrix(0, nrow = length(unique(wain1)), ncol = length(unique(wain2)))
> rownames(mat) = unique(wain1)
> colnames(mat) = unique(wain2)
> for(i in as.integer(rownames(agreers))) mat[wain1[i], wain2[i]] = count[i]

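Something does happen: mat is updated with the numbers, but they are not in the right places! For example, I would expect this to return 673: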

> mat["Founder36","Mnist10_269"]
[1] 0

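Edit: Here is a small data file that demonstrates the "duplicated levels in factors" problem. Note that Mnist10_140 appears twice in the first column, each time with a different value in the second column.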

wain1,wain2,count
Founder36,Mnist10_269,673
Founder3,Mnist10_19,665
Mnist10_140,Mnist10_257,663
Founder1,Founder15,659
Founder21,Founder25,654
Founder15,Founder32,654
Mnist10_140,Mnist10_84,643

When I process this subset of the data, I get a warning:

> agreers <- read.csv('temp.csv')
> connections <- xtabs(count ~ factor(wain1, levels = wain1) + factor(wain2, levels = wain2), agreers)
Warning message:
In `levels<-`(`*tmp*`, value = if (nl == nL) as.character(labels) else paste0(labels,  :
  duplicated levels in factors are deprecated

3 answers:

Answer 0 (score: 4)

If you prefer base R, you can use table:

df <- read.table(header=TRUE, text='   wain1       wain2 count
   Founder36 Mnist10_269   673
    Founder3  Mnist10_19   665
 Mnist10_140 Mnist10_257   663
    Founder1   Founder15   659
   Founder21   Founder25   654
   Founder15   Founder32   654')

tab <- with(df,table(factor(wain1, levels=unique(wain1)),
                   factor(wain2, levels=unique(wain2))))
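# Caution: which() returns cells in column-major order, so this assignment
# is only correct when the rows of df happen to follow that same order
tab[which(tab == 1)] = df$count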
tab

              Mnist10_269 Mnist10_19 Mnist10_257 Founder15 Founder25 Founder32
  Founder36           673          0           0         0         0         0
  Founder3              0        665           0         0         0         0
  Mnist10_140           0          0         663         0         0         0
  Founder1              0          0           0       659         0         0
  Founder21             0          0           0         0       654         0
  Founder15             0          0           0         0         0       654
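As for why the loop in the question puts values in the wrong cells: a likely cause, assuming wain1 and wain2 were read as factors (the default for read.csv before R 4.0), is that indexing a matrix with a factor uses the factor's integer codes rather than its labels. Those codes follow the sorted level order, while the dimnames were set in order of appearance. A minimal sketch of one possible fix, indexing by name with a two-column character matrix (df2, rows, cols and mat are illustrative names, not from the original post):

```r
# Same six rows as above; stringsAsFactors = TRUE mimics the pre-4.0 default
df2 <- read.table(header = TRUE, stringsAsFactors = TRUE, text = '   wain1       wain2 count
   Founder36 Mnist10_269   673
    Founder3  Mnist10_19   665
 Mnist10_140 Mnist10_257   663
    Founder1   Founder15   659
   Founder21   Founder25   654
   Founder15   Founder32   654')

# Row and column names in order of first appearance
rows <- unique(as.character(df2$wain1))
cols <- unique(as.character(df2$wain2))
mat <- matrix(0, nrow = length(rows), ncol = length(cols),
              dimnames = list(rows, cols))

# A two-column character matrix is matched against the dimnames,
# so each count lands in the cell named by its own (wain1, wain2) pair
mat[cbind(as.character(df2$wain1), as.character(df2$wain2))] <- df2$count

mat["Founder36", "Mnist10_269"]
# [1] 673
```

If the same pair occurred more than once, this assignment would keep the last count rather than summing them.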

Edit

As @DavidArenburg suggested, you can also use xtabs:

xtabs(count ~ factor(wain1, levels = unique(wain1)) + factor(wain2, levels = unique(wain2)), df)
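The warning in the question comes from levels = wain1, which passes every value as a level, repeats included; unique() removes the repeats. A short check against the seven-row file from the question's edit, read inline here instead of from temp.csv (agreers7 and connections are illustrative names):

```r
# The seven-row sample from the question, where Mnist10_140 occurs twice in wain1
agreers7 <- read.csv(text = "wain1,wain2,count
Founder36,Mnist10_269,673
Founder3,Mnist10_19,665
Mnist10_140,Mnist10_257,663
Founder1,Founder15,659
Founder21,Founder25,654
Founder15,Founder32,654
Mnist10_140,Mnist10_84,643")

# unique() gives each name exactly one level, in order of first appearance
connections <- xtabs(
  count ~ factor(wain1, levels = unique(wain1)) +
          factor(wain2, levels = unique(wain2)),
  data = agreers7)

# Mnist10_140 now has a single row holding both of its counts
connections["Mnist10_140", c("Mnist10_257", "Mnist10_84")]
```

Because xtabs sums the left-hand side within each cell, a pair that occurred more than once would have its counts added together.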

Answer 1 (score: 1)

Take a look at the reshape2 package:

library(reshape2)
agreers <- read.table(header = TRUE, stringsAsFactors = FALSE, sep = ',', text = "wain1,wain2,count\nFounder36,Mnist10_269,673\nFounder3,Mnist10_19,665\nMnist10_140,Mnist10_257,663\nFounder1,Founder15,659\nFounder21,Founder25,654\nFounder15,Founder32,654\n")
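# value.var names the column that fills the cells (otherwise dcast guesses it)
conMat <- dcast(agreers, wain1 ~ wain2, value.var = "count", fill = 0)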
rownames(conMat) <- conMat$wain1
conMat$wain1 <- NULL

conMat["Founder36","Mnist10_269"]

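That should do the trick.

Edit: This does not give the data in the original order. See @cdeterman's solution instead.

Answer 2 (score: 1)

Here is a variation on @cdeterman's approach (using df from that same post):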

 do.call(table, lapply(df[1:2], function(x) 
            factor(x, levels=unique(x))))*df[,3]
 #              wain2
 # wain1         Mnist10_269 Mnist10_19 Mnist10_257 Founder15 Founder25 Founder32
 # Founder36           673          0           0         0         0         0
 # Founder3              0        665           0         0         0         0
 # Mnist10_140           0          0         663         0         0         0
 # Founder1              0          0           0       659         0         0
 # Founder21             0          0           0         0       654         0
 # Founder15             0          0           0         0         0       654
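The question notes that the graph is undirected. If a square, symmetric matrix is wanted rather than the rectangular one above, one option is to use a single shared set of node names for both dimensions and fill each pair in both directions. This is only a sketch under that assumption; df3, nodes, w1, w2 and adj are illustrative names:

```r
df3 <- read.table(header = TRUE, text = '   wain1       wain2 count
   Founder36 Mnist10_269   673
    Founder3  Mnist10_19   665
 Mnist10_140 Mnist10_257   663
    Founder1   Founder15   659
   Founder21   Founder25   654
   Founder15   Founder32   654')

w1 <- as.character(df3$wain1)
w2 <- as.character(df3$wain2)

# One shared list of node names, in order of first appearance
nodes <- unique(c(w1, w2))
adj <- matrix(0, nrow = length(nodes), ncol = length(nodes),
              dimnames = list(nodes, nodes))

# Fill both (a, b) and (b, a) so the matrix is symmetric
adj[cbind(w1, w2)] <- df3$count
adj[cbind(w2, w1)] <- df3$count

isSymmetric(adj)
adj["Mnist10_269", "Founder36"]
```

Assigning both directions, rather than adding the matrix to its transpose, avoids doubling the diagonal if a node is ever paired with itself.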