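I have a small grammar represented as a variant type, term, whose strings are tokens or parts of tokens (see type term below).
Given an expression from the grammar, I collect all the strings in the expression and pack them into a set (function vars). Finally, I want to build a graph that uses these sets as vertices (the G.create and G.add_edge calls near the end of the example).
For some reason, the graph built this way does not recognise sets that contain the same variables, and it creates multiple vertices with the same content. I don't understand why this happens.
Here is a minimal working example of the behaviour: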
(* demo.ml *)
type term =
  | Var of string
  | List of term list * string option
  | Tuple of term list

module SSet = Set.Make(
  struct
    let compare = String.compare
    type t = string
  end)

let rec vars = function
  | Var v -> SSet.singleton v
  | List (x, tail) ->
    let tl = match tail with
      | None -> SSet.empty
      | Some var -> SSet.singleton var in
    SSet.union tl (List.fold_left SSet.union SSet.empty (List.map vars x))
  | Tuple x -> List.fold_left SSet.union SSet.empty (List.map vars x)

module Node = struct
  type t = SSet.t
  let compare = SSet.compare
  let equal = SSet.equal
  let hash = Hashtbl.hash
end

module G = Graph.Imperative.Digraph.ConcreteBidirectional(Node)

(* dot output for the graph for illustration purposes *)
module Dot = Graph.Graphviz.Dot(struct
    include G
    let edge_attributes _ = []
    let default_edge_attributes _ = []
    let get_subgraph _ = None
    let vertex_attributes _ = []
    let vertex_name v = Printf.sprintf "{%s}" (String.concat ", " (SSet.elements v))
    let default_vertex_attributes _ = []
    let graph_attributes _ = []
  end)

let _ =
  (* creation of two terms *)
  let a, b = List ([Var "a"], Some "b"), Tuple [Var "a"; Var "b"] in
  (* get strings from terms packed into sets *)
  let avars, bvars = vars a, vars b in
  let g = G.create () in
  G.add_edge g avars bvars;
  Printf.printf "The content is the same: [%s] [%s]\n"
    (String.concat ", " (SSet.elements avars))
    (String.concat ", " (SSet.elements bvars));
  Printf.printf "compare/equal output: %d %b\n"
    (SSet.compare avars bvars)
    (SSet.equal avars bvars);
  Printf.printf "Hash values are different: %d %d\n"
    (Hashtbl.hash avars) (Hashtbl.hash bvars);
  Dot.fprint_graph Format.str_formatter g;
  Printf.printf "Graph representation:\n%s" (Format.flush_str_formatter ())
To compile, run ocamlc -c -I +ocamlgraph demo.ml; ocamlc -I +ocamlgraph graph.cma demo.cmo
Running the program produces this output:
The content is the same: [a, b] [a, b]
compare/equal output: 0 true
Hash values are different: 814436103 1017954833
Graph representation:
digraph G {
{a, b};
{a, b};
{a, b} -> {a, b};
{a, b} -> {a, b};
}
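To sum up: why do the two sets have different hash values, and why does the graph end up with two identical vertices, even though the sets are the same by every other measure (content, compare, and equal)?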
Answer 0 (score: 4)
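I suspect the general answer is that OCaml's built-in hashing works on the concrete representation of a value, whereas set equality is a more abstract notion. When sets are represented as ordered binary trees, as the ones produced by Set.Make are, many differently shaped trees can represent the same set. Such trees are equal as sets, but they can easily hash to different values.
If you want hashing to work for sets, you will probably need to supply your own hash function.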
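To make the point concrete, here is a minimal sketch using only the standard library. The sets s1 and s2 and the helper hash_elements are names invented for this illustration. The two sets hold the same elements but are built in different insertion orders, so their internal trees can have different shapes; whether the structural hashes actually differ depends on the OCaml version, so the sketch prints them rather than assuming a result.

```ocaml
module SSet = Set.Make (String)

(* Same elements, different insertion order: the underlying balanced
   trees may be shaped differently even though the sets are equal. *)
let s1 = SSet.add "b" (SSet.singleton "a")
let s2 = SSet.add "a" (SSet.singleton "b")

(* Hypothetical helper: hash the sorted element list instead of the tree.
   SSet.elements returns the elements in increasing order, so equal sets
   always yield the same list and therefore the same hash. *)
let hash_elements s = Hashtbl.hash (SSet.elements s)

let () =
  Printf.printf "equal as sets: %b\n" (SSet.equal s1 s2);
  (* These two values are allowed to differ. *)
  Printf.printf "structural hashes: %d %d\n" (Hashtbl.hash s1) (Hashtbl.hash s2);
  (* These two values are always the same. *)
  Printf.printf "element-list hashes: %d %d\n" (hash_elements s1) (hash_elements s2)
```

The only property a hash function must satisfy here is that values considered equal by equal also hash equally; hashing a canonical form such as the sorted element list guarantees that.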
Answer 1 (score: 1)
As Jeffrey pointed out, the problem appears to be the definition of the hash function in the Node module.
Changing it to let hash x = Hashtbl.hash (SSet.elements x) solved the problem.
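A small sketch of the corrected Node module follows. It does not use ocamlgraph: NodeTbl and tbl are stand-ins invented here, assuming the graph keeps its vertices in a hash table keyed by Node, which is what the observed behaviour suggests. With the element-based hash, two equal sets built in different ways land on the same key.

```ocaml
module SSet = Set.Make (String)

module Node = struct
  type t = SSet.t
  let compare = SSet.compare
  let equal = SSet.equal
  (* Hash the sorted element list so that equal sets hash equally. *)
  let hash x = Hashtbl.hash (SSet.elements x)
end

(* Assumption: a stdlib hash table keyed by Node stands in for the
   graph's internal vertex storage. *)
module NodeTbl = Hashtbl.Make (Node)

let tbl : unit NodeTbl.t = NodeTbl.create 16

(* Two equal sets constructed along different paths. *)
let avars = SSet.union (SSet.singleton "b") (SSet.singleton "a")
let bvars = SSet.of_list ["a"; "b"]

let () =
  NodeTbl.replace tbl avars ();
  NodeTbl.replace tbl bvars ();
  (* Both insertions hit the same key, so only one entry exists. *)
  Printf.printf "distinct keys: %d\n" (NodeTbl.length tbl)
```

Note that Hashtbl.hash inspects only a bounded part of its argument, so very large sets that share their smallest elements may collide more often. That affects lookup speed only; equality still decides whether two vertices are the same.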