Question

I am comparing string similarity across a group of words, and I have come up with a list of high-scoring (similar) pairs. I first need to merge the pairs that are connected. For example, the pairs would be merged into [[(1,2),(2,4),(7,8)]. Then I want to find the best representative of each group, so I am thinking of picking a representative that sits at the center of each group/cluster.

Can I use networkX for this? If each edge carries a score measuring how similar its two nodes are, how do I find the center of the graph? How do I add the scores to the edges? Is there any example code?

Answer 1

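Here is how I would approach the problem as I understand it. It seems you first want to find the connected components from the list of edges. networkX has a dedicated function for that.

Let's consider the following example: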

l = [(1,2),(1,2),(1,4),(2,4),(2,5),(2,6),(7,8),(9,7),(1,2)]

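Let's build a network from the list above. To keep things general, I have included weights. The weight of an edge is the number of times that edge appears in the list: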

import networkx as nx
from collections import Counter
from operator import itemgetter

G = nx.Graph()
weighted_edges = [(*i,j) for i,j in Counter(l).items()]
# [(1, 2, 3), (1, 4, 1), (2, 4, 1), (2, 5, 1), (2, 6, 1), (7, 8, 1), (9, 7, 1)]
G.add_weighted_edges_from(weighted_edges)
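The question asks how to attach similarity scores to edges rather than counts. A minimal sketch of that, assuming the scores come from the standard library's difflib.SequenceMatcher; the word list, the 0.7 cutoff, and the names word_graph and components are hypothetical and only for illustration:

```python
from difflib import SequenceMatcher
from itertools import combinations

import networkx as nx

# Hypothetical word list and cutoff, chosen only for illustration.
words = ["color", "colour", "colors", "table", "cable"]
threshold = 0.7

word_graph = nx.Graph()
word_graph.add_nodes_from(words)
for a, b in combinations(words, 2):
    # ratio() returns a similarity score between 0 and 1.
    score = SequenceMatcher(None, a, b).ratio()
    if score >= threshold:
        # Store the similarity score as the edge weight.
        word_graph.add_edge(a, b, weight=score)

# Words linked directly or indirectly by a high-scoring pair share a component.
components = list(nx.connected_components(word_graph))
print(components)
```

Any other similarity measure could be swapped in for SequenceMatcher; the only requirement is that the score ends up in the edge's weight attribute.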

Now we can get the connected components with nx.connected_components:

cc = nx.connected_components(G)
print(list(cc))
# [{1, 2, 4, 5, 6}, {7, 8, 9}]

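Given that we want a measure of how representative a node is within a given component, one option is to look at the node's degree, where:

The node degree is the number of edges adjacent to the node

So what we can do is iterate over the connected components and, in each one, look for the node with the highest degree. Here is one way to do it: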

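degree_cen = G.degree()
out = []
# nx.connected_components returns a generator, and the one stored in cc was
# already consumed by list(cc) above, so compute the components again here.
for component in nx.connected_components(G):
    # Degree of every node in this component
    component_cen = {k: degree_cen[k] for k in component}
    # The node with the highest degree is taken as the center
    center_node = max(component_cen.items(), key=itemgetter(1))[0]
    out.append({'component': component, 'center_node': center_node})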

Which yields:

print(out)
# [{'component': {1, 2, 4, 5, 6}, 'center_node': 2}, 
#  {'component': {7, 8, 9}, 'center_node': 7}]
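Note that G.degree() counts edges and ignores their weights. If the edge scores should influence which node is picked, one option is the weighted degree, which sums the weights of the edges incident to a node. A self-contained sketch on the same example list; the names weighted_degree and weighted_out are introduced here for illustration:

```python
from collections import Counter

import networkx as nx

l = [(1,2),(1,2),(1,4),(2,4),(2,5),(2,6),(7,8),(9,7),(1,2)]

G = nx.Graph()
# Same construction as above: the weight is the number of times an edge appears.
G.add_weighted_edges_from((u, v, w) for (u, v), w in Counter(l).items())

# Weighted degree: sum of the weights of the edges incident to each node.
weighted_degree = dict(G.degree(weight="weight"))

weighted_out = []
for component in nx.connected_components(G):
    # Pick the node whose incident edge weights sum to the largest value.
    center_node = max(component, key=weighted_degree.get)
    weighted_out.append({"component": component, "center_node": center_node})

print(weighted_out)
```

With similarity scores as weights, this favors the word that is most strongly similar to the rest of its group rather than the one with the most neighbors.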

How to add edges with string similarity scores and find the center of a graph in python networkX
