Question

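I scraped some data with beautifulsoup and saved it as a .txt file. The data are movie reviews from IMDB.com. I found a good word-counting Python script, so I was able to build a word frequency table in Excel. However, I can't draw a graph from the frequency table.

I want to draw a semantic network graph with UCINET (node size should be based on betweenness centrality).

My question is how to convert the text file into adjacency matrix data for drawing a UCINET graph, like this: http://www.umasocialmedia.com/socialnetworks/wp-content/uploads/2012/09/senatorsxsenators1.png I want to draw the network graph from the words the reviewers used.

(If two words appear in the same sentence, the count goes in the cell where the row of one word meets the column of the other.)

Or, could you tell me how to draw the network graph in Python code (using betweenness centrality)?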

Answer 1

Create a 2D 20x20 array, iterate over each input string, and update the matrix with that string:

import re

# 20x20 co-occurrence matrix, one row and one column per tracked word.
adjacency_matrix = [[0 for _ in range(20)] for _ in range(20)]


def get_lines(filename):
    """Return the lines in the file."""
    with open(filename, 'r') as fp:
        return fp.readlines()


def update_matrix(matrix, mapping, string):
    """Update the given adjacency matrix using the given string."""
    # Keep only the words that have a row/column in the matrix.
    words = [word for word in re.split(r"\s+", string) if word in mapping]
    # Count every ordered pair, so the diagonal holds each word's
    # co-occurrence with itself.
    for word_1 in words:
        for word_2 in words:
            matrix[mapping[word_1]][mapping[word_2]] += 1


if __name__ == "__main__":
    words_in_matrix = ["X-men", "awesome", "good", "bad"]  # ... 16 more ...
    mapping = {word: index for index, word in enumerate(words_in_matrix)}

    for line in get_lines("ibdb.txt"):
        update_matrix(adjacency_matrix, mapping, line)
    print(adjacency_matrix)

A function like update_matrix may be useful: matrix is the adjacency matrix, mapping maps each word to its index in the adjacency matrix, and string is a sample review.
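
To make the three arguments concrete, here is a minimal sketch that calls update_matrix on one made-up review. The three-word vocabulary and the review text are assumptions for illustration only, not data from the question.

```python
import re


def update_matrix(matrix, mapping, string):
    """Add one count for every ordered pair of tracked words in `string`."""
    words = [word for word in re.split(r"\s+", string) if word in mapping]
    for word_1 in words:
        for word_2 in words:
            matrix[mapping[word_1]][mapping[word_2]] += 1


# Hypothetical vocabulary and review, used only to show the bookkeeping.
vocabulary = ["good", "bad", "plot"]
mapping = {word: index for index, word in enumerate(vocabulary)}
matrix = [[0] * len(vocabulary) for _ in vocabulary]

update_matrix(matrix, mapping, "good plot good cast")

for word, row in zip(vocabulary, matrix):
    print(word, row)
# good [4, 0, 2]
# bad [0, 0, 0]
# plot [2, 0, 1]
```

Note that the diagonal is filled in as well, because every word is paired with itself. If you only want links between different words, skip the case where word_1 == word_2.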

You will need to adapt this to your needs. The input may contain periods or other noise characters, which need to be removed.
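
One possible way to do that cleanup, as a rough sketch: split each review into sentences (the question counts co-occurrence per sentence) and strip punctuation from each word. The helper names split_sentences and tokenize are hypothetical, and the sentence splitter is deliberately naive, so it will mis-split abbreviations such as "Mr.".

```python
import re


def split_sentences(review):
    """Naive sentence splitter: break on runs of '.', '!' or '?'."""
    parts = re.split(r"[.!?]+", review)
    return [part.strip() for part in parts if part.strip()]


def tokenize(sentence):
    """Lowercase a sentence and return its words without punctuation."""
    # Keep letters, digits, apostrophes and hyphens so "x-men" survives.
    return re.findall(r"[a-z0-9'-]+", sentence.lower())


# Made-up review text, for illustration only.
review = "X-men was awesome! The plot, however, was bad."
for sentence in split_sentences(review):
    print(tokenize(sentence))
# ['x-men', 'was', 'awesome']
# ['the', 'plot', 'however', 'was', 'bad']
```

Because tokenize lowercases everything, the keys in mapping would have to be lowercase too ("x-men" rather than "X-men"). You could then pass " ".join(tokenize(sentence)) to update_matrix for each sentence instead of passing raw lines.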

Python code for drawing a UCINET network graph
