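Finding common links in a matrix and classifying by common intersection

Time: 2017-05-13 18:23:39

Tags: r matrix stata graph-theory set-intersection

Suppose I have a distance-cost matrix in which the destination cost and the origin cost both need to be below a certain threshold (say, US$100) for two localities to share a link. My difficulty lies in arriving at a common set after classifying these localities: A1 links (destination and origin cost both below the threshold) with A2 and, likewise, with A3 and A4; A2 links with A1 and A4; A4 links with A1 and A2. So A1, A2 and A4 would be classified into the same group, as they are the group with the highest frequency of links among themselves. Below I set up a matrix as an example: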

         A1   A2   A3   A4   A5   A6   A7
    A1    0   90   90   90  100  100  100
    A2   80    0   90   90   90  110  100
    A3   80  110    0   90  120  110   90
    A4   90   90  110    0   90  100   90
    A5  110  110  110  110    0   90   80
    A6  120  130  135  100   90    0   90
    A7  105  110  120   90   90   90    0

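I am programming this in Stata, and I have not stored the matrix above in matrix form, as one would in Mata. The column listing the letter A plus a number is a variable holding the row names of the matrix, and the remaining columns are named after each locality (e.g. A1 and so on).

I have produced the list of links between the localities with the following code, which I probably wrote in a rather "brute force" way because I was in a hurry: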

    clear all
    set more off

    // input the matrix (same values as the table above)
    input A1 A2 A3 A4 A5 A6 A7
    0 90 90 90 100 100 100
    80 0 90 90 90 110 100
    80 110 0 90 120 110 90
    90 90 110 0 90 100 90
    110 110 110 110 0 90 80
    120 130 135 100 90 0 90
    105 110 120 90 90 90 0
    end

    // generate the row-name variable
    gen locality = ""
    forv i = 1/7 {
        replace locality = "A`i'" in `i'
    }
    order locality, first

    // flag entries below the threshold of 100 (excluding the zero diagonal)
    forv i = 1/7 {
        gen r_`i' = 0
        replace r_`i' = 1 if A`i' < 100 & A`i' != 0
    }

    // check both directions (origin and destination below the threshold)
    forv i = 1/7 {
        gen check_`i' = .
        forv j = 1/7 {
            local v = r_`i'[`j']
            local vv = r_`j'[`i']
            replace check_`i' = `v' + `vv' in `j'
        }
    }

    // build the list of links, one temporary file per locality
    gen locality_x = ""
    forv i = 1/7 {
        preserve
        local name = locality[`i']
        keep if check_`i' == 2
        replace locality_x = "`name'"
        keep locality locality_x
        save "C:\Users\user\Desktop\temp_`i'", replace
        restore
    }

    // stack the temporary files into a single list
    use "C:\Users\user\Desktop\temp_1", clear
    forv i = 2/7 {
        append using "C:\Users\user\Desktop\temp_`i'"
    }

    // locality_x now shows whether A.1 has links with A.2, A.3, etc., and so on
    // the difficulty lies in finding a common intersection between the groups

This returns the following list:

locality_x  locality
A1  A2
A1  A3
A1  A4
A2  A1
A2  A4
A3  A1
A4  A1
A4  A2
A4  A7
A5  A6
A5  A7
A6  A5
A6  A7
A7  A4
A7  A5
A7  A6

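I am trying to get familiar with set intersection, but I have no idea how to do this in Stata. I would like something where I can change the threshold and find the common set again. I would be grateful for a solution in R, since I can program a bit in it.

A similar way of obtaining the list in R (as given by @user2957945 in the answer below):

m <- structure(c(0L, 80L, 80L, 90L, 110L, 120L, 105L, 90L, 0L, 110L,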
90L, 110L, 130L, 110L, 90L, 90L, 0L, 110L, 110L, 135L, 120L, 
90L, 90L, 90L, 0L, 110L, 100L, 90L, 100L, 90L, 120L, 90L, 0L, 
90L, 90L, 100L, 110L, 110L, 100L, 90L, 0L, 90L, 100L, 100L, 90L, 
90L, 80L, 90L, 0L), .Dim = c(7L, 7L), .Dimnames = list(c("A1", 
"A2", "A3", "A4", "A5", "A6", "A7"), c("A1", "A2", "A3", "A4", 
"A5", "A6", "A7")))

# get values less than threshold
id = m < 100 
# make sure both values are less than threshold, and don't include diagonal
m_new = (id + t(id) == 2) & m !=0 
# melt data and subset to keep TRUE values (TRUE if both less than threshold and not on diagonal)
result  = subset(reshape2::melt(m_new), value)
# reorder to match question results , if needed 
result[order(result[[1]], result[[2]]), 1:2] 

   Var1 Var2
8    A1   A2
15   A1   A3
22   A1   A4
2    A2   A1
23   A2   A4
3    A3   A1
4    A4   A1
11   A4   A2
46   A4   A7
40   A5   A6
47   A5   A7
34   A6   A5
48   A6   A7
28   A7   A4
35   A7   A5
42   A7   A6     

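I have also added the "graph-theory" tag, since I believe this is not exactly an intersection problem, where I could turn the list into vectors and use the intersect function in R. The code needs to produce a new ID such that certain localities end up in the same new ID (group). As in the example above, if the set of A.1 has A.2 and A.4, A.2 has A.1 and A.4, and A.4 has A.1 and A.2, these three localities must be in the same ID (group). In other words, I need the largest intersection grouping for each locality.

I understand that a different matrix may cause problems, such as A.1 having A.2 and A.6, A.2 having A.1 and A.6, and A.6 having A.1 and A.2 (but, still considering the first example above, A.6 not having A.4). In that situation I welcome a solution that adds A.6 to the grouping, or any other arbitrary one in which the code just groups the first set together, removes A.1, A.2 and A.4 from the list, and leaves A.6 without a new grouping.

2 answers:

Answer 0: (score: 2)

In R you could do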

# get values less than threshold
id = m < 100
# make sure both values are less than threshold, and don't include diagonal
m_new = (id + t(id) == 2) & m !=0 
# melt data and subset to keep TRUE values (TRUE if both less than threshold and not on diagonal)
result  = subset(reshape2::melt(m_new), value)
# reorder to match question results , if needed 
result[order(result[[1]], result[[2]]), 1:2] 

   Var1 Var2
8    A1   A2
15   A1   A3
22   A1   A4
2    A2   A1
23   A2   A4
3    A3   A1
4    A4   A1
11   A4   A2
46   A4   A7
40   A5   A6
47   A5   A7
34   A6   A5
48   A6   A7
28   A7   A4
35   A7   A5
42   A7   A6

# data: the matrix m used above
m <- structure(c(0L, 80L, 80L, 90L, 110L, 120L, 105L, 90L, 0L, 110L,
90L, 110L, 130L, 110L, 90L, 90L, 0L, 110L, 110L, 135L, 120L, 
90L, 90L, 90L, 0L, 110L, 100L, 90L, 100L, 90L, 120L, 90L, 0L, 
90L, 90L, 100L, 110L, 110L, 100L, 90L, 0L, 90L, 100L, 100L, 90L, 
90L, 80L, 90L, 0L), .Dim = c(7L, 7L), .Dimnames = list(c("A1", 
"A2", "A3", "A4", "A5", "A6", "A7"), c("A1", "A2", "A3", "A4", 
"A5", "A6", "A7")))
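
Since the question asks for something where the threshold can be changed, the steps above can be wrapped in a function. This is only a minimal base-R sketch, not part of the original answer: the function name `mutual_links` and the column names `from` and `to` are made up for illustration, and it clears the diagonal explicitly instead of testing `m != 0`.

```r
# Hypothetical helper: list all pairs whose cost is below `threshold`
# in both directions.
mutual_links <- function(m, threshold) {
  below <- m < threshold        # TRUE where one direction is below the threshold
  both  <- below & t(below)     # TRUE only where both directions are below it
  diag(both) <- FALSE           # a locality does not link to itself
  idx <- which(both, arr.ind = TRUE)
  out <- data.frame(from = rownames(m)[idx[, "row"]],
                    to   = colnames(m)[idx[, "col"]],
                    stringsAsFactors = FALSE)
  out <- out[order(out$from, out$to), ]
  rownames(out) <- NULL
  out
}

# Example matrix from the question, entered row by row
loc <- paste0("A", 1:7)
m <- matrix(c(  0,  90,  90,  90, 100, 100, 100,
               80,   0,  90,  90,  90, 110, 100,
               80, 110,   0,  90, 120, 110,  90,
               90,  90, 110,   0,  90, 100,  90,
              110, 110, 110, 110,   0,  90,  80,
              120, 130, 135, 100,  90,   0,  90,
              105, 110, 120,  90,  90,  90,   0),
            nrow = 7, byrow = TRUE, dimnames = list(loc, loc))

links_100 <- mutual_links(m, 100)
links_90  <- mutual_links(m, 90)
```

With the example matrix, a threshold of 100 gives the same 16 pairs as the list above, while a threshold of 90 leaves no mutual links at all.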

Answer 1: (score: 1)

Assuming that what you want is the largest complete subgraph, you can use the igraph package:

# Load necessary libraries
library(igraph)

# Define global parameters
threshold <- 100

# Compute the adjacency matrix
# (distances in both directions need to be smaller than the threshold)
am <- m < threshold & t(m) < threshold

# Make an undirected graph given the adjacency matrix
# (we set diag to FALSE so as not to draw links from a vertex to itself)
gr <- graph_from_adjacency_matrix(am, mode = "undirected", diag = FALSE)

# Find all the largest complete subgraphs
lc <- largest_cliques(gr)

# Output the list of complete subgraphs as a list of vertex names
lapply(lc, (function (e) e$name))
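
The question also describes a fallback: group the first largest set, remove its members from the list, and continue with what is left. The following is a rough base-R sketch of that greedy idea, under several assumptions: the function names `largest_clique` and `group_by_cliques` are hypothetical, ties between equally large cliques are broken by row order, and localities left without partners get a group of their own. It searches by brute force over all subsets, so it is only reasonable for small matrices like this one; for anything larger, `largest_cliques()` from igraph as used above is the better tool.

```r
# Hypothetical helper: return the first largest set of `nodes` in which
# every pair is linked (a clique), searching from large to small sets.
largest_clique <- function(adj, nodes) {
  for (k in seq(length(nodes), 1)) {
    for (v in combn(nodes, k, simplify = FALSE)) {
      sub <- adj[v, v, drop = FALSE]
      if (all(sub[upper.tri(sub)])) return(v)  # every pair in v is linked
    }
  }
}

# Hypothetical helper: repeatedly peel off a largest clique and give it an ID.
group_by_cliques <- function(m, threshold) {
  adj <- (m < threshold) & (t(m) < threshold)  # links need both directions
  diag(adj) <- FALSE
  remaining <- rownames(m)
  groups <- setNames(rep(NA_integer_, nrow(m)), rownames(m))
  id <- 0L
  while (length(remaining) > 0) {
    clique <- largest_clique(adj, remaining)
    id <- id + 1L
    groups[clique] <- id
    remaining <- setdiff(remaining, clique)
  }
  groups
}

# Example matrix from the question, entered row by row
loc <- paste0("A", 1:7)
m <- matrix(c(  0,  90,  90,  90, 100, 100, 100,
               80,   0,  90,  90,  90, 110, 100,
               80, 110,   0,  90, 120, 110,  90,
               90,  90, 110,   0,  90, 100,  90,
              110, 110, 110, 110,   0,  90,  80,
              120, 130, 135, 100,  90,   0,  90,
              105, 110, 120,  90,  90,  90,   0),
            nrow = 7, byrow = TRUE, dimnames = list(loc, loc))

groups <- group_by_cliques(m, threshold = 100)
```

On the example matrix this puts A1, A2 and A4 in group 1, A5, A6 and A7 in group 2, and A3 alone in group 3, because A3 only links with A1, which has already been assigned.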

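As far as I know, Stata has no function similar to largest_cliques. However, if you are looking for the largest connected subgraph (which in your case is the whole graph), you can use the clustering commands in Stata (i.e. clustermat).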
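
To make the difference between a complete subgraph and a connected subgraph concrete, here is a small base-R sketch that labels connected components with a breadth-first search. The function names `link_matrix` and `connected_components` are made up for this illustration, and it is not meant as a replacement for igraph or for Stata's clustering commands.

```r
# Hypothetical helper: TRUE where two localities are below the threshold
# in both directions.
link_matrix <- function(m, threshold) {
  adj <- (m < threshold) & (t(m) < threshold)
  diag(adj) <- FALSE
  adj
}

# Hypothetical helper: label each vertex with the ID of its connected
# component, using a breadth-first search from every unvisited vertex.
connected_components <- function(adj) {
  n <- nrow(adj)
  comp <- setNames(rep(0L, n), rownames(adj))  # 0 means "not visited yet"
  current <- 0L
  for (start in seq_len(n)) {
    if (comp[start] != 0L) next
    current <- current + 1L
    comp[start] <- current
    queue <- start
    while (length(queue) > 0) {
      v <- queue[1]
      queue <- queue[-1]
      nb <- which(adj[v, ] & comp == 0L)       # unvisited neighbours of v
      comp[nb] <- current
      queue <- c(queue, nb)
    }
  }
  comp
}

# Example matrix from the question, entered row by row
loc <- paste0("A", 1:7)
m <- matrix(c(  0,  90,  90,  90, 100, 100, 100,
               80,   0,  90,  90,  90, 110, 100,
               80, 110,   0,  90, 120, 110,  90,
               90,  90, 110,   0,  90, 100,  90,
              110, 110, 110, 110,   0,  90,  80,
              120, 130, 135, 100,  90,   0,  90,
              105, 110, 120,  90,  90,  90,   0),
            nrow = 7, byrow = TRUE, dimnames = list(loc, loc))

comp_100 <- connected_components(link_matrix(m, 100))
comp_90  <- connected_components(link_matrix(m, 90))
```

At a threshold of 100 every locality lands in component 1, because the A4 to A7 link joins the two triangles and A3 hangs off A1. At a threshold of 90 there are no links, so each locality is its own component.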