Question

With reference to this question: Finding groups of values from two colums which have entries in common using SQLite

I tried this in Tcl, but I got lost somewhere in the loops:

set MyList [ list 50 { 23 25 } 34 { 6 11 } 78 { 25 9 } 45 { 2 45 } 39 { 12 9 } 40 { 6 2 }]

set AllGroups [list]

dict for {k v} $MyList {
    set Group $k
    foreach {N1 N2} $v {break}

    dict for {k2 v2} $MyList {
        foreach {N1_2 N2_2} $v2 {break}
        if { $N1 == $N1_2 } {
            append Group $k2
        }
        if { $N1 == $N2_2 } {
            append Group $k2
        }
    }
    lappend AllGroups $Group
}

The output is:

50 3440 78 4540 39 4040

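This looks like the start of a promising solution. I think the loops look right, so where is my mistake? Any help is appreciated. Maybe I should be using struct?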

Answer 1

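This problem looks simple but is actually quite hard to get right, hence the rather long solution. This particular problem has been studied a lot and algorithms can be found online, but of course I had to do it the hard way and come up with my own implementation. This means that although it works correctly on the data I have tried it on, it may be inefficient and may still contain bugs. In the CS sense, I think it is fair to call this a rather "naive" solution.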

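(While working on this solution I found that I am out of touch with current computer science terminology (I haven't really been into CS for over twenty years), which didn't help. I picked up the term "maximum common subgraph" to describe what I was looking for, but now it seems that it is actually subtly different. Well, as I said, I gave up trying to use established algorithms and rolled my own anyway.)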

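The problem has a set of EIDs (CS-speak: vertices), each with two nodes; a node shared between EIDs forms a direct connection (edge) between them, and the object is to find bunches (CS-speak: not cliques, probably not maximum common subgraphs, possibly transitive closure) of EIDs that have direct and indirect connections.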
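For comparison: in standard graph terminology, the bunches described above are the connected components of a graph whose vertices are EIDs and whose edges join EIDs that share a node. The following is a minimal sketch of a different technique from the one used in this answer, a union-find (disjoint-set) structure. The procedure names `findRoot` and `bunches` are hypothetical, and the sketch assumes the input is a dictionary that maps each EID to its node list, like the OP's `MyList`.

```tcl
# Sketch (assumption: data maps each EID to a list of node numbers).
# Follow parent links up to the representative (root) of x's set.
proc findRoot {parentVar x} {
    upvar 1 $parentVar parent
    while {$parent($x) ne $x} {
        # Path halving: point x at its grandparent while walking up.
        set p $parent($x)
        set parent($x) $parent($p)
        set x $parent($x)
    }
    return $x
}

# Group EIDs into bunches (connected components).
proc bunches data {
    array set parent {}   ;# EID -> parent EID in the union-find forest
    array set owner {}    ;# node -> first EID seen with that node
    dict for {eid nodes} $data {
        if {![info exists parent($eid)]} {set parent($eid) $eid}
        foreach node $nodes {
            if {[info exists owner($node)]} {
                # Shared node: merge this EID's set with the owner's set.
                set a [findRoot parent $eid]
                set b [findRoot parent $owner($node)]
                if {$a ne $b} {set parent($a) $b}
            } else {
                set owner($node) $eid
            }
        }
    }
    # Collect EIDs by their root, then sort each bunch and the bunch list.
    set groups [dict create]
    foreach eid [array names parent] {
        dict lappend groups [findRoot parent $eid] $eid
    }
    set result [list]
    foreach members [dict values $groups] {
        lappend result [lsort $members]
    }
    return [lsort $result]
}

set MyList {50 {23 25} 34 {6 11} 78 {25 9} 45 {2 45} 39 {12 9} 40 {6 2}}
puts [bunches $MyList]
```

With the OP's list this prints `{34 40 45} {39 50 78}`. Each EID is only merged with the first EID seen on each of its nodes, so no list has to be rescanned repeatedly.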

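To make the solution tractable, I split the process into several steps:

1. Find the list of connections (each connection is a list of two or more EIDs connected through a shared node, or a single EID whose node is not shared with any other EID).
2. Build a connections dictionary whose keys are EIDs and whose values are lists of the EIDs each one appears together with (directly connected through one or more nodes). Note that at this point some of the value lists may already be the complete bunch for that EID, while most are just subsets of those bunches.
3. Finally, build a dictionary whose keys are monotonically increasing integers (i.e. I number the items) and whose values are the lists of EIDs that form "bunches".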

I describe each step alongside the command that performs it.

proc main table {
    # This command puts all the processing steps together. The table 
    # is set up at the bottom of the page.

    puts [set data [makedatadictionary $table]]

    puts [set connections [findconnections $data]]

    puts [set connectionsdict \
        [makeconnectionsdict [dict keys $data] $connections]]

    set bunchdict [makebunchdict $connectionsdict]

    puts "\nCF EIDs\n-----------"
    dict for {cf EIDs} $bunchdict {
        puts "$cf  $EIDs"
    }
}

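This is the command that builds the bunch dictionary. It processes each key in the input dictionary and collects the EIDs directly or indirectly connected to it by recursively looking at every EID in its value list. The (very, very obvious) pitfall here is that every EID in a subgraph yields the same list of collected EIDs (though possibly in a different order), so the collected list is sorted and we have to check whether the subgraph is already in the dictionary before adding it.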

proc makebunchdict connectionsdict {
    # Given a connections dictionary containing EID keys and EID 
    # tokens representing directly connected EIDs, this command 
    # picks out bunches of EIDs, directly or indirectly connected.
    set result [dict create]
    set n 0
    dict for {key -} $connectionsdict {
        set collected [list]

        recursivelycollect $key $connectionsdict collected

        set collected [lsort $collected]
        if {$collected ni [dict values $result]} {
            dict set result [incr n] $collected
        }
    }
    set result
}
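The pitfall can be seen in isolation. In this small illustration (values taken from the OP's data, where EIDs 39, 50 and 78 form one bunch), starting the collection at 78 visits the EIDs in a different order than starting at 50, so the raw list is not found among the existing dictionary values until it is sorted. The variables simply mirror those in makebunchdict:

```tcl
# Bunch already recorded when the collection started at EID 50.
set result [dict create 1 {39 50 78}]
# Order in which recursivelycollect visits the same bunch from EID 78.
set collected {78 39 50}

# Raw list: not found, so it would be added a second time.
puts [expr {$collected ni [dict values $result]}]
# Sorted list: found, so the duplicate is skipped.
puts [expr {[lsort $collected] ni [dict values $result]}]
```

The first `puts` prints 1 and the second prints 0.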

This is the command that recursively visits each EID key. The recursion stops once every EID it finds is already in the list of collected EIDs.

proc recursivelycollect {key connectionsdict varName} {
    # Recursively visits every EID in a directly connected 
    # group, saving unique EIDs in a variable that lives in 
    # the original caller's stack frame.
    upvar 1 $varName collected
    lappend collected $key
    foreach n [dict get $connectionsdict $key] {
        if {$n ni $collected} {
            recursivelycollect $n $connectionsdict collected
        }
    }
}
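Because recursivelycollect nests one call deeper for each newly found EID along a chain, a bunch with a very long chain of connections could in principle run into Tcl's recursion limit (see `interp recursionlimit`). Below is a minimal sketch of an iterative alternative that uses an explicit worklist. `collectiterative` is a hypothetical name. It assumes the same connections-dictionary layout and returns the list instead of writing to a caller's variable through `upvar`:

```tcl
# Sketch: collect every EID reachable from $start without recursion.
# Assumes connectionsdict maps each EID to the EIDs it is directly
# connected to (itself included).
proc collectiterative {start connectionsdict} {
    set collected [list $start]
    set pending [list $start]
    while {[llength $pending] > 0} {
        # Pop the last EID off the worklist (depth-first order).
        set key [lindex $pending end]
        set pending [lrange $pending 0 end-1]
        foreach n [dict get $connectionsdict $key] {
            if {$n ni $collected} {
                lappend collected $n
                lappend pending $n
            }
        }
    }
    return $collected
}

# Connections dictionary as built from the OP's data.
set connectionsdict {50 {50 78} 34 {34 40} 78 {39 50 78} 45 {40 45} 39 {39 78} 40 {34 40 45}}
puts [lsort [collectiterative 50 $connectionsdict]]
```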

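This is the command that sets up the connections dictionary. It is quite simple: for each key, it builds a list that is the union of all the connection lists the key appears in. Then it reduces each resulting list to its unique members.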

proc makeconnectionsdict {keys connections} {
    # Given a set of keys which are EID tokens, and a list of lists 
    # containing directly connected EIDs, this command constructs a 
    # dictionary with the EID tokens as keys and the lists of every 
    # direct connection set that the EID appears in as values. Note 
    # that it's very likely that
    #   [dict values $connections] != [dict values $result]
    # since the list of connections has lists of EIDs connected by a
    # single node, while the result list here has EIDs connected by 
    # one or more nodes.
    set result [dict create]
    foreach key $keys {
        foreach connection $connections {
            if {$key in $connection} {
                dict lappend result $key {*}$connection
            }
        }
        dict set result $key [lsort -unique [dict get $result $key]]
    }
    set result
}
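A sketch of a variant that builds the same dictionary in a single pass over the connection lists, instead of scanning every connection once per key. It assumes every EID has at least one node and therefore appears in some connection. Its key order follows the connection lists rather than the original data, which can change the numbering that makebunchdict produces. `makeconnectionsdict1pass` is a hypothetical name:

```tcl
# Sketch: one pass over the connections; every EID in a connection
# is directly connected to all members of that connection.
proc makeconnectionsdict1pass connections {
    set result [dict create]
    foreach connection $connections {
        foreach key $connection {
            dict lappend result $key {*}$connection
        }
    }
    # Reduce each value list to unique, sorted members.
    foreach key [dict keys $result] {
        dict set result $key [lsort -unique [dict get $result $key]]
    }
    return $result
}

# Connection lists that findconnections produces for the OP's data.
set connections {34 {34 40} 39 45 {45 40} 50 {50 78} {78 39}}
set cd [makeconnectionsdict1pass $connections]
puts [dict get $cd 78]
```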

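This is the command that works out which EIDs are connected to each other. It is quite straightforward: it is basically just an inversion of the input dictionary. At the end I remove the most obvious duplicates.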

proc findconnections data {
    # This command discovers direct connections between keys in the 
    # dictionary which is passed to it. A direct connection exists 
    # between two keys if they share any members of their value lists. 
    # E.g. 
    #   a {b c}  and  d {e c}  are directly connected, but
    #   a {b c}  and  f {g h}  are not.
    #
    # The result is a list of lists, where each sublist either contains 
    #  * two or more keys: these keys are connected to each other by a 
    #    single value list member, or
    #  * a single key: that value list member is not shared with any
    #    other key (the key may still be connected through another member).
    set result [dict create]
    dict for {key value} $data {
        foreach val $value {
            dict lappend result $val $key
        }
    }
    # Return only the values from the result dictionary, and only 
    # trivially unique values at that.
    lsort -unique [dict values $result]
}
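To make the inversion concrete, here is the same loop run on the OP's data. The node-keyed intermediate dictionary (`bynode`, a name chosen for this illustration) is printed before only its de-duplicated values are kept:

```tcl
set data {50 {23 25} 34 {6 11} 78 {25 9} 45 {2 45} 39 {12 9} 40 {6 2}}
set bynode [dict create]
dict for {key value} $data {
    foreach val $value {
        # Record that EID $key uses node $val.
        dict lappend bynode $val $key
    }
}
dict for {node eids} $bynode {
    puts "node $node: $eids"
}
# findconnections returns only the de-duplicated values:
puts [lsort -unique [dict values $bynode]]
```

With this data, node 25 maps to `50 78` and node 9 to `78 39`. Nodes such as 23 and 11 map to a single EID each, and those one-element lists become the single-key sublists in the result, even though EIDs 50 and 34 are connected through their other nodes.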

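This is the command that simply converts the EID/node/node data table into a dictionary. It is just a convenience command that lets me define the input in a more workable format.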

proc makedatadictionary table {
    # Convert a N x 3 table to a dictionary of N items where 
    # the key is the value in column 1 and its value is the 
    # list of the values in column 2 and 3.
    set data [dict create]
    foreach {col1 col2 col3} $table {
        dict set data $col1 [list $col2 $col3]
    }
    set data
}
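The OP's `MyList` is already dictionary-shaped (an EID followed by its two-element node list). One way to produce the flat N x 3 table that makedatadictionary expects is to flatten it one level with `concat`. A minimal sketch:

```tcl
set MyList [list 50 {23 25} 34 {6 11} 78 {25 9} 45 {2 45} 39 {12 9} 40 {6 2}]
# Expand the list into separate arguments; concat joins them with
# spaces, removing one level of list nesting.
set table [concat {*}$MyList]
puts $table
```

`$table` can then be passed to main (or makedatadictionary) as the table argument.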

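This is how you get it started. The argument consists of data with EID tokens in the first column and node numbers in the second and third columns. The actual values don't affect how this code works, but none of the values should be lists.

(In this example, EIDs 50 to 40 come from the OP and are presumably real data; the rest I made up to test the solution.)

(Note: the "Hoodiecrow" mentioned in the comments is me; I used that nickname before.)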

Answer 2

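I'm not sure I understand your question, but this might help. I basically use the "nodes" (from the SQLite question referenced in your post) as array keys and append every EID to the array element named by its node, given input of the form [list EID1 {node1 node2} EID2 {node3 node4} ...].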

set l [ list 50 { 23 25 } 34 { 6 11 } 78 { 25 9 } 45 { 2 45 } 39 { 12 9 } 40 { 6 2 }]
puts $l
foreach {item nodes} $l {
    foreach node $nodes {
        lappend n($node) $item
    }
}
foreach {group items} [array get n] {
    puts "Group: $group Items: $items"
}

50 { 23 25 } 34 { 6 11 } 78 { 25 9 } 45 { 2 45 } 39 { 12 9 } 40 { 6 2 }
Group: 45 Items: 45
Group: 9 Items: 78 39
Group: 23 Items: 50
Group: 2 Items: 45 40
Group: 11 Items: 34
Group: 6 Items: 34 40
Group: 12 Items: 39
Group: 25 Items: 50 78
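These groups are per node, so EIDs that are connected only indirectly (for example 50 and 39, linked through 78) still end up in different groups. The merge step below is not part of the answer above. It is a minimal sketch, assuming the `n` array is built the same way: merge any groups that share an EID until no two overlap.

```tcl
set l [list 50 {23 25} 34 {6 11} 78 {25 9} 45 {2 45} 39 {12 9} 40 {6 2}]
foreach {item nodes} $l {
    foreach node $nodes {
        lappend n($node) $item
    }
}

# Invariant: the lists in $bunches are pairwise disjoint.
set bunches [list]
foreach {node items} [array get n] {
    set merged $items
    set rest [list]
    # Absorb every existing bunch that shares at least one EID.
    foreach bunch $bunches {
        set overlap 0
        foreach item $items {
            if {$item in $bunch} {set overlap 1; break}
        }
        if {$overlap} {
            lappend merged {*}$bunch
        } else {
            lappend rest $bunch
        }
    }
    lappend rest [lsort -unique $merged]
    set bunches $rest
}
puts [lsort $bunches]
```

For the OP's data this prints `{34 40 45} {39 50 78}`, whatever order `array get` returns the nodes in.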

Grouping dictionary keys where the values (sublists) have entries in common
