Creating a graph with too many edges

Time: 2016-10-10 22:18:02

Tags: python graph networkx

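I have 100 nodes and 4950 edges. What is the fastest way to create a graph in Python (I have no plans to visualize or draw it at all) so that I can access the node information, i.e. so that I can tell what each entry of a 2D matrix would mean, for example what connects node 1 to node 3? (I also don't need to save it as a matrix.)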

import gensim
import nltk
from gensim.models import word2vec
from nltk.corpus import stopwords
import logging
import re
import itertools
import glob
from collections import defaultdict
import networkx as nx


logging.basicConfig(format='%(asctime)s : %(levelname)s : %(message)s',
                    level=logging.INFO)

sentences = word2vec.Text8Corpus("/home/mona/mscoco/text8")
model = word2vec.Word2Vec(sentences, workers = 16)
#model.init_sims(replace = True)
model_name = "text8_data"
model.save(model_name)

# A set makes the per-word membership tests below O(1) instead of a list scan.
stopwords = set(nltk.corpus.stopwords.words('english'))

path = "/home/mona/mscoco/caption_files/*.txt"
files = glob.glob(path)
adj_list = defaultdict(lambda: defaultdict(lambda: 0))


# Undirected graph; each caption file becomes a node.
g = nx.Graph()

for file in files:
        g.add_node(file)

for file1, file2 in itertools.combinations(files, 2):
        # The with-statement closes each file on exit, so no explicit close() is needed.
        with open(file1) as f1:
                f1_text = f1.read()
                f1_words = re.sub("[^a-zA-Z]", ' ', f1_text).lower().split()
                f1_words = [w for w in f1_words if w not in stopwords]
                print(f1_text)
        with open(file2) as f2:
                f2_text = f2.read()
                f2_words = re.sub("[^a-zA-Z]", ' ', f2_text).lower().split()
                f2_words = [w for w in f2_words if w not in stopwords]
                print(f2_text)
        # Compute the Word Mover's Distance once and store it as the edge weight
        # (newer gensim releases expose this as model.wv.wmdistance).
        distance = model.wmdistance(f1_words, f2_words)
        print('{0}: {1}: {2}'.format(file1, file2, distance))
        g.add_edge(file1, file2, weight=distance)



print(g.number_of_nodes())
print(g.number_of_edges())


nx.write_gml(g, "gensim.gml")
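
For comparison, here is a minimal sketch of the same pairwise construction in which every document is tokenized only once and the edge weight is then read back the way a 2D matrix entry would be. The `docs` dictionary, the file names in it, and `toy_distance` are hypothetical stand-ins (the latter for `model.wmdistance`), used only so the example runs without the text8 model or the caption files.

```python
import itertools

import networkx as nx

# Hypothetical stand-in data: document name -> words that were already
# tokenized and stopword-filtered once, instead of once per pair.
docs = {
    "a.txt": ["dog", "runs", "park"],
    "b.txt": ["dog", "sleeps", "sofa"],
    "c.txt": ["cat", "sleeps", "sofa"],
}


def toy_distance(words1, words2):
    """Jaccard distance between two word lists (placeholder for model.wmdistance)."""
    s1, s2 = set(words1), set(words2)
    return 1.0 - len(s1 & s2) / len(s1 | s2)


g = nx.Graph()
g.add_nodes_from(docs)

# One distance call per unordered pair; the result is stored as the edge weight.
for name1, name2 in itertools.combinations(docs, 2):
    g.add_edge(name1, name2, weight=toy_distance(docs[name1], docs[name2]))

# Matrix-style lookup: the graph is undirected, so both orders return the same value.
print(g["a.txt"]["c.txt"]["weight"])
print(g["c.txt"]["a.txt"]["weight"])
```

With 100 nodes this still creates all 4950 edges, but each file is read and cleaned 1 time rather than 99 times.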

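If you have better suggestions for my current code, please let me know. I will eventually have 20 nodes and 190 edges. I am mainly looking for something whose output can be processed by other programs such as MATLAB. I am not sure whether a .gml file is easy to work with in MATLAB.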

1 answer:

Answer 0 (score: 1)

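I think generating a GML file is probably fine for the specific purpose of reusing the graph in MATLAB. This question has more relevant information: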

Convert GML file to adjacency matrix in matlab
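
If the GML route turns out to be awkward, one possible alternative is to export the weights as a plain adjacency matrix in CSV form. The sketch below assumes NetworkX 2.0 or later (for `nx.to_numpy_array`) and NumPy; the three-node graph and the file name `distances.csv` are made up for illustration.

```python
import networkx as nx
import numpy as np

# Hypothetical small graph standing in for the caption-distance graph.
g = nx.Graph()
g.add_edge("a.txt", "b.txt", weight=0.8)
g.add_edge("a.txt", "c.txt", weight=1.0)
g.add_edge("b.txt", "c.txt", weight=0.5)

# Fix the node order explicitly so that row/column i can be mapped back to a file.
nodes = sorted(g.nodes())
matrix = nx.to_numpy_array(g, nodelist=nodes, weight="weight")

# Plain comma-separated numbers, one matrix row per line.
np.savetxt("distances.csv", matrix, delimiter=",")

for index, name in enumerate(nodes):
    print(index, name)
```

MATLAB should then be able to load the file with `readmatrix('distances.csv')`, or with `csvread` in older releases. Note that pairs without an edge come out as 0 in the matrix, which is indistinguishable from a distance of 0.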