Question

I would like to convert the data shown below into an adjacency matrix so that I can use it for network analysis.

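The variables are as follows:

ccode1: COW state number of country 1
ccode2: COW state number of country 2
year: year of observation
DR_at_1: level of diplomatic representation of side 2 at side 1 (see below)
DR_at_2: level of diplomatic representation of side 1 at side 2 (see below)
DE: any diplomatic exchange between side 1 and side 2 (see below)
version: current version of the dataset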

# A tibble: 6 x 7
  ccode1 ccode2  year DR_at_1 DR_at_2    DE version
   <dbl>  <dbl> <dbl>   <dbl>   <dbl> <dbl>   <dbl>
1      2     20  1920       0       0     0   2006.
2      2     20  1925       0       0     0   2006.
3      2     20  1930       0       2     1   2006.
4      2     20  1935       2       2     1   2006.
5      2     20  1940       2       2     1   2006.
6      2     20  1950       9       9     1   2006.

What I want is something like this:

                         country1
country2   1'    2'      3'       4'      5'    6'    
1          0     1       0        0       0     0   
2          1     0       1        0       0     0   
3          0     0       0        0       1     1   
4          1     0       1        0       1     1   
5          0     1       0        0       0     1   
6          0     0       1        0       1     0

Note that the values in this table are hypothetical.

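I'm new to R, which is why I'm confused about how to treat the 'year' variable here. My intuition is that the adjacency matrix should be built separately for each year, but I'm open to other suggestions.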

The values in the adjacency matrix should be based on whether country 2 (or 1) has diplomatic representation in country 1 (or 2), i.e. DR_at_1 > 0 or DR_at_2 > 0.
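As a minimal sketch of that rule, assuming the data sit in a data frame called `dip` (a hypothetical name) with the columns listed above, the 0/1 edge value for each row can be computed directly. The six rows below are copied from the tibble printed earlier:

```r
# The six rows printed above; `dip` is a hypothetical name for the data
dip <- data.frame(
  ccode1  = c(2, 2, 2, 2, 2, 2),
  ccode2  = c(20, 20, 20, 20, 20, 20),
  year    = c(1920, 1925, 1930, 1935, 1940, 1950),
  DR_at_1 = c(0, 0, 0, 2, 2, 9),
  DR_at_2 = c(0, 0, 2, 2, 2, 9)
)

# 1 if either side has any diplomatic representation at the other, else 0
dip$edge <- as.integer(dip$DR_at_1 > 0 | dip$DR_at_2 > 0)
dip$edge
```

For these six rows the result is 0 0 1 1 1 1, which agrees with the DE column shown in the tibble.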

The data I'm using are available, in reproducible form, at the following URL: http://www.correlatesofwar.org/data-sets/diplomatic-exchange

Thanks in advance!

Answer 1

Approach

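Given the data format you already have, this is not a difficult task. Below is an example that uses a special matrix-indexing format: when you select elements of a matrix by supplying another matrix (the index matrix), each row of that index matrix specifies a (row, column) pair.

To understand this indexing format better, read help("["):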

...
A third form of indexing is via a numeric matrix with the one
column for each dimension: each row of the index matrix then
selects a single element of the array, and the result is a vector.
...
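A small self-contained illustration of this indexing, on a made-up 3 x 3 matrix. The same help page also describes a character-matrix variant (pairs are matched against the dimnames), which is what the answer relies on when it indexes by country code:

```r
# A 3 x 3 matrix filled column-wise with 1..9, with row/column names
m <- matrix(1:9, nrow = 3,
            dimnames = list(c("a", "b", "c"), c("a", "b", "c")))

# Numeric index matrix: each row is one (row, column) pair
idx_num <- cbind(c(1, 3), c(2, 1))
m[idx_num]        # m[1, 2] and m[3, 1], i.e. 4 and 3

# Character index matrix: the pairs are looked up in the dimnames
idx_chr <- cbind(c("a", "c"), c("b", "a"))
m[idx_chr]        # the same two elements

# The same index matrix also works on the left-hand side of an assignment
m[idx_num] <- 0L
```

The last line is the pattern used below: one assignment sets every listed (row, column) cell at once.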

Example

Given the toy dataset:

df <- data.frame(code1=1:6, code2=c(2,3,2,2,6,1), year=1990+1:6,
                 DR_at_1=c(0,0,0,2,2,9), DR_at_2=c(0,0,2,2,2,9))

df
  code1 code2 year DR_at_1 DR_at_2
1     1     2 1991       0       0
2     2     3 1992       0       0
3     3     2 1993       0       2
4     4     2 1994       2       2
5     5     6 1995       2       2
6     6     1 1996       9       9

we can get the list of relevant edges:

edges <- df[df$DR_at_1 > 0 | df$DR_at_2 > 0,]
edges <- cbind(as.character(edges$code1), as.character(edges$code2))
edges <- rbind(edges, edges[,2:1])  # for each edge (u,v) add the symmetric edge (v,u)

edges
     [,1] [,2]
[1,] "3"  "2"
[2,] "4"  "2"
[3,] "5"  "6"
[4,] "6"  "1"
[5,] "2"  "3"
[6,] "2"  "4"
[7,] "6"  "5"
[8,] "1"  "6"

Next, construct an empty adjacency matrix with the country codes as its row and column names:

codes <- unique(c(df$code1, df$code2))  # All available country codes
A <- matrix(0, nrow=length(codes), ncol=length(codes), dimnames=list(codes, codes))

A
  1 2 3 4 5 6
1 0 0 0 0 0 0
2 0 0 0 0 0 0
3 0 0 0 0 0 0
4 0 0 0 0 0 0
5 0 0 0 0 0 0
6 0 0 0 0 0 0

Finally, add the required edges to the matrix:

A[edges] <- 1

A
  1 2 3 4 5 6
1 0 0 0 0 0 1
2 0 0 1 1 0 0
3 0 1 0 0 0 0
4 0 1 0 0 0 0
5 0 0 0 0 0 1
6 1 0 0 0 1 0
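The three steps above can be bundled into one function, sketched below. `edges_to_adjacency` is a hypothetical name, and the `from`/`to` arguments are an assumption added so the same code works on the toy columns (`code1`, `code2`) and on the real ones (`ccode1`, `ccode2`). Two small deviations from the steps above: the country codes are sorted so the row/column order is predictable, and `which()` is used so rows with missing DR values are dropped instead of producing NA edges.

```r
# edges_to_adjacency() is a hypothetical helper bundling the steps above.
# from/to name the two country-code columns (an assumption, so the same
# function works for code1/code2 here and ccode1/ccode2 in the real data).
edges_to_adjacency <- function(d, from = "code1", to = "code2") {
  # All country codes, sorted so the row/column order is predictable
  codes <- as.character(sort(unique(c(d[[from]], d[[to]]))))
  A <- matrix(0, nrow = length(codes), ncol = length(codes),
              dimnames = list(codes, codes))
  # which() keeps rows with representation on either side and drops NAs
  keep <- which(d$DR_at_1 > 0 | d$DR_at_2 > 0)
  if (length(keep) > 0) {
    e <- cbind(as.character(d[[from]][keep]), as.character(d[[to]][keep]))
    # Character index matrix: add each edge (u, v) and its mirror (v, u)
    A[rbind(e, e[, 2:1, drop = FALSE])] <- 1
  }
  A
}

df <- data.frame(code1=1:6, code2=c(2,3,2,2,6,1), year=1990+1:6,
                 DR_at_1=c(0,0,0,2,2,9), DR_at_2=c(0,0,2,2,2,9))
A <- edges_to_adjacency(df)
isSymmetric(A)   # TRUE, since every edge was added in both directions
```

On the real data the call would be `edges_to_adjacency(dip, from = "ccode1", to = "ccode2")`; without a year filter it marks a pair as connected if it had representation in any year.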

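The year variable

In general, how to handle the year variable is a question about the context of your problem rather than about programming. You should decide based on what you already know about the problem.

That said, if you want a separate adjacency matrix for each year, add another filtering step when selecting the edges: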

# Get the list of edges for a single year (the toy data covers 1991-1996)
edges <- df[(df$DR_at_1 > 0 | df$DR_at_2 > 0) & df$year == 1994,]
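Rather than repeating that filter by hand for every year, a minimal sketch using base R's `split()` and `lapply()` builds all yearly matrices at once. `year_matrix`, `mats` and `counts` are hypothetical names, and fixing one country set for all years is an assumption: it keeps every matrix the same size and therefore comparable across years.

```r
df <- data.frame(code1=1:6, code2=c(2,3,2,2,6,1), year=1990+1:6,
                 DR_at_1=c(0,0,0,2,2,9), DR_at_2=c(0,0,2,2,2,9))

# Fix the country set once, so every yearly matrix has the same rows/columns
codes <- as.character(sort(unique(c(df$code1, df$code2))))

# Hypothetical helper: one symmetric 0/1 matrix from the rows of one year
year_matrix <- function(d) {
  A <- matrix(0, nrow = length(codes), ncol = length(codes),
              dimnames = list(codes, codes))
  keep <- which(d$DR_at_1 > 0 | d$DR_at_2 > 0)
  if (length(keep) > 0) {
    e <- cbind(as.character(d$code1[keep]), as.character(d$code2[keep]))
    A[rbind(e, e[, 2:1, drop = FALSE])] <- 1
  }
  A
}

# split() gives one data frame per year; lapply() builds a matrix for each
mats <- lapply(split(df, df$year), year_matrix)
names(mats)        # one entry per year, named by the year
mats[["1994"]]

# Summing the yearly matrices counts, per pair, the years with an edge
counts <- Reduce(`+`, mats)
```

Years with no edges (1991 and 1992 in the toy data) still get a full all-zero matrix. Whether countries that did not exist in a given year should be dropped from that year's matrix is the kind of context-dependent choice mentioned above.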
