With reference to this question: Finding groups of values from two colums which have entries in common using SQLite
I tried it in Tcl, but I got lost somewhere in the loops:
set MyList [ list 50 { 23 25 } 34 { 6 11 } 78 { 25 9 } 45 { 2 45 } 39 { 12 9 } 40 { 6 2 }]
set AllGroups [list]
dict for {k v} $MyList {
    set Group $k
    foreach {N1 N2} $v {break}
    dict for {k2 v2} $MyList {
        foreach {N1_2 N2_2} $v2 {break}
        if { $N1 == $N1_2 } {
            append Group $k2
        }
        if { $N1 == $N2_2 } {
            append Group $k2
        }
    }
    lappend AllGroups $Group
}
The output is:
50 3440 78 4540 39 4040
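This looks like the beginning of a promising solution. I think the loops look right, so where is my mistake? Any help is appreciated. Maybe I should be using a struct?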
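One thing that may explain run-together values such as 3440 in the output above: in Tcl, `append` concatenates strings, while `lappend` adds a list element. A minimal sketch of the difference, with variable names that are purely illustrative:

```tcl
# Hypothetical variable names, for illustration only.
set asString 34
append asString 40        ;# string concatenation
puts $asString            ;# prints 3440

set asList 34
lappend asList 40         ;# adds a separate list element
puts $asList              ;# prints 34 40
```

This only addresses how the keys are joined; it does not by itself follow indirect connections between groups.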
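Answer 0: (score: 1)
This problem looks simple but is actually quite hard to do well, hence the fairly long solution. This particular problem has been studied a lot, and algorithms can be found online, but of course I had to do it the hard way and come up with my own implementation. This means that while it works for the data I've tried it with, it is probably inefficient and may still contain bugs. In the CS sense, I think it's fair to call this a rather "naive" solution.
(While working on this solution I discovered that I have fallen out of touch with current computer science terminology (I haven't really been in CS for over twenty years), which didn't help. I picked up the term "maximum common subgraph" to describe what I was looking for, but it now seems that this is actually something subtly different. Well, as I said, I gave up trying to use an established algorithm and rolled my own anyway.)
The problem has a set of EIDs (CS-speak: vertices), each of which has two nodes; a node shared between EIDs forms a direct connection (an edge) between them, and the object is to find bunches (CS-speak: not cliques, probably not maximum common subgraphs, possibly transitive closures) of EIDs that have direct and indirect connections.
To make the solution tractable, I divided the process into several steps: converting the table into a data dictionary, finding the direct connections, building a connections dictionary, and collecting the bunches.
I describe each step next to the command that performs it.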
proc main table {
    # This command puts all the processing steps together. The table
    # is set up at the bottom of the page.
    puts [set data [makedatadictionary $table]]
    puts [set connections [findconnections $data]]
    puts [set connectionsdict \
        [makeconnectionsdict [dict keys $data] $connections]]
    set bunchdict [makebunchdict $connectionsdict]
    puts "\nCF EIDs\n-----------"
    dict for {cf EIDs} $bunchdict {
        puts "$cf $EIDs"
    }
}
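This is the command that constructs the bunch dictionary. It processes each key in the input dictionary and collects the EIDs that are directly or indirectly connected to it by recursively looking at each EID in its value list. The (very, very obvious) pitfall here is that every EID in a subgraph yields the same list of collected EIDs (though possibly in a different sort order), so before adding a list we have to check whether that subgraph is already in the dictionary.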
proc makebunchdict connectionsdict {
    # Given a connections dictionary containing EID keys and EID
    # tokens representing directly connected EIDs, this command
    # picks out bunches of EIDs, directly or indirectly connected.
    set result [dict create]
    set n 0
    dict for {key -} $connectionsdict {
        set collected [list]
        recursivelycollect $key $connectionsdict collected
        set collected [lsort $collected]
        if {$collected ni [dict values $result]} {
            dict set result [incr n] $collected
        }
    }
    set result
}
This is the command that recursively visits each EID key. It stops when every EID it finds is already in the list of collected EIDs.
proc recursivelycollect {key connectionsdict varName} {
    # Recursively visits every EID in a directly connected
    # group, saving unique EIDs in a variable that lives in
    # the original caller's stack frame.
    upvar 1 $varName collected
    lappend collected $key
    foreach n [dict get $connectionsdict $key] {
        if {$n ni $collected} {
            recursivelycollect $n $connectionsdict collected
        }
    }
}
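For very large bunches, deep recursion could become a concern. As a sketch only, and not the approach used in this answer, the same traversal can be written with an explicit worklist. The proc name `collectiteratively` and the tiny `demo` dictionary are made up for illustration, and the result is returned rather than stored through `upvar`:

```tcl
# Hypothetical alternative to recursivelycollect: the collected list is
# returned instead of being stored in the caller's variable.
proc collectiteratively {key connectionsdict} {
    set collected [list]
    set pending [list $key]
    while {[llength $pending] > 0} {
        # Take the next EID off the front of the worklist.
        set pending [lassign $pending current]
        if {$current in $collected} continue
        lappend collected $current
        foreach n [dict get $connectionsdict $current] {
            if {$n ni $collected} {
                lappend pending $n
            }
        }
    }
    return $collected
}

# Tiny hand-made connections dictionary (made-up data).
set demo [dict create a {a b} b {a b c} c {b c} d {d}]
puts [lsort [collectiteratively a $demo]]   ;# prints a b c
puts [collectiteratively d $demo]           ;# prints d
```

As in the recursive version, an EID is added to the collected list at most once, so the loop ends once every reachable EID has been visited.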
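This is the command that sets up the connections dictionary. It's quite simple: for each key, it builds a list that is the union of all the lists the key appears in, and then reduces each resulting list to its unique members.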
proc makeconnectionsdict {keys connections} {
    # Given a set of keys which are EID tokens, and a list of lists
    # containing directly connected EIDs, this command constructs a
    # dictionary with the EID tokens as keys and the lists of every
    # direct connection set that the EID appears in as values. Note
    # that it's very likely that
    # [dict values $connections] != [dict values $result]
    # since the list of connections has lists of EIDs connected by a
    # single node, while the result list here has EIDs connected by
    # one or more nodes.
    set result [dict create]
    foreach key $keys {
        foreach connection $connections {
            if {$key in $connection} {
                dict lappend result $key {*}$connection
            }
        }
        dict set result $key [lsort -unique [dict get $result $key]]
    }
    set result
}
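This is the command that finds out which EIDs are connected to each other. It's quite straightforward: it is basically just an inversion of the input dictionary. At the end I remove the most obvious duplicates.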
proc findconnections data {
    # This command discovers direct connections between keys in the
    # dictionary which is passed to it. A direct connection exists
    # between two keys if they share any members of their value lists.
    # E.g.
    #   a {b c} and d {e c} are directly connected, but
    #   a {b c} and f {g h} are not.
    #
    # The result is a list of lists, where each sublist either contains
    # * two or more keys: these keys are connected to each other by a
    #   single value list member, or
    # * a single key: the value list member behind this sublist is
    #   not shared with any other key.
    set result [dict create]
    dict for {key value} $data {
        foreach val $value {
            dict lappend result $val $key
        }
    }
    # Return only the values from the result dictionary, and only
    # trivially unique values at that.
    lsort -unique [dict values $result]
}
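To make the inversion concrete, here is a minimal sketch that runs the same loop inline on the example from the comment above (a, d and f as keys). The variable names `data` and `inverted` are only for illustration:

```tcl
# Example data taken from the comment in findconnections.
set data [dict create a {b c} d {e c} f {g h}]

# Invert: each value-list member becomes a key that lists the
# original keys it appeared under.
set inverted [dict create]
dict for {key value} $data {
    foreach val $value {
        dict lappend inverted $val $key
    }
}
puts $inverted                                  ;# prints b a c {a d} e d g f h f
puts [lsort -unique [dict values $inverted]]    ;# prints a {a d} d f
```

Note that a and d also show up as single-key sublists (from the unshared members b and e) even though they are connected through c; the later steps are what merge these.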
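This is a command that simply converts the EID/node/node data table to a dictionary. It is just a convenience command that lets me define the input in a more workable format.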
proc makedatadictionary table {
    # Convert a N x 3 table to a dictionary of N items where
    # the key is the value in column 1 and its value is the
    # list of the values in column 2 and 3.
    set data [dict create]
    foreach {col1 col2 col3} $table {
        dict set data $col1 [list $col2 $col3]
    }
    set data
}
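And this is how you start it all. The argument consists of data representing EID tokens in the first column and node numbers in the second and third columns. The actual values don't affect how this code works, but none of the values should be lists.
(In this example, EIDs 50 through 40 are from the OP and presumably real data; the rest were made up by me to test the solution.)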
main {
    50 23 25
    34 6 11
    78 25 9
    45 2 45
    39 12 9
    40 6 2
    99 1 3
    98 4 5
    97 4 7
}
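As a cross-check on the bunches this table should produce, here is a sketch of a different, standard technique (union-find over EIDs, which finds what graph theory usually calls connected components). It is not the method used in this answer, and the proc names `findroot` and `groupeids` are hypothetical:

```tcl
# Sketch only: union-find over EIDs; proc names are hypothetical.
proc findroot {parentVar x} {
    upvar 1 $parentVar parent
    # Follow parent links until an EID that is its own parent is found.
    while {[dict get $parent $x] ne $x} {
        set x [dict get $parent $x]
    }
    return $x
}

proc groupeids table {
    set parent [dict create]   ;# EID  -> parent EID in its tree
    set owner [dict create]    ;# node -> first EID seen with that node
    foreach {eid n1 n2} $table {
        dict set parent $eid $eid
        foreach node [list $n1 $n2] {
            if {[dict exists $owner $node]} {
                # Shared node: merge the two trees.
                set a [findroot parent $eid]
                set b [findroot parent [dict get $owner $node]]
                dict set parent $a $b
            } else {
                dict set owner $node $eid
            }
        }
    }
    # Collect the EIDs under their root.
    set groups [dict create]
    dict for {eid -} $parent {
        dict lappend groups [findroot parent $eid] $eid
    }
    return [dict values $groups]
}

foreach group [groupeids {
    50 23 25
    34 6 11
    78 25 9
    45 2 45
    39 12 9
    40 6 2
    99 1 3
    98 4 5
    97 4 7
}] {
    puts $group
}
```

For this table the sketch groups the EIDs as 50 78 39, then 34 45 40, then 99, then 98 97. It skips path compression and union by rank, so it is meant for checking results rather than for speed.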
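(Note: the 'Hoodiecrow' mentioned in the comments is me; I used that nickname earlier.)
Answer 1: (score: 0)
I'm not sure I understand your problem, but this might help. I basically use each "node" (from the SQLite question mentioned in your post) as an array key and append every EID to the array element named by that node, for input of the form [list EID1 {node1 node2} EID2 {node3 node4}] and so on.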
set l [ list 50 { 23 25 } 34 { 6 11 } 78 { 25 9 } 45 { 2 45 } 39 { 12 9 } 40 { 6 2 }]
puts $l
foreach {item nodes} $l {
    foreach node $nodes {
        lappend n($node) $item
    }
}
foreach {group items} [array get n] {
    puts "Group: $group Items: $items"
}
50 { 23 25 } 34 { 6 11 } 78 { 25 9 } 45 { 2 45 } 39 { 12 9 } 40 { 6 2 }
Group: 45 Items: 45
Group: 9 Items: 78 39
Group: 23 Items: 50
Group: 2 Items: 45 40
Group: 11 Items: 34
Group: 6 Items: 34 40
Group: 12 Items: 39
Group: 25 Items: 50 78
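The order of the groups above is whatever order the array happens to return, since `array get` does not promise any particular ordering. A minimal sketch, assuming the same input list, that prints the groups in numeric node order instead:

```tcl
set l [list 50 {23 25} 34 {6 11} 78 {25 9} 45 {2 45} 39 {12 9} 40 {6 2}]
foreach {item nodes} $l {
    foreach node $nodes {
        lappend n($node) $item
    }
}
# Sort the node names numerically so the output order is stable.
foreach group [lsort -integer [array names n]] {
    puts "Group: $group Items: $n($group)"
}
```

With this input the first line printed is "Group: 2 Items: 45 40" and the last is "Group: 45 Items: 45". As before, each group only lists EIDs that share one node directly.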