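I have 100 nodes and 4950 edges. What is the fastest way to create a graph in Python (I have no plans to visualize or draw it) so that I can access the node information and tell what each entry of the 2D matrix means, for example the connection from node 1 to node 3? (I also don't need to store it as a matrix.)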
import gensim
import nltk
from gensim.models import word2vec
from nltk.corpus import stopwords
import logging
import re
import itertools
import glob
from collections import defaultdict
import networkx as nx
logging.basicConfig(format='%(asctime)s : %(levelname)s : %(message)s',
                    level=logging.INFO)
sentences = word2vec.Text8Corpus("/home/mona/mscoco/text8")
model = word2vec.Word2Vec(sentences, workers = 16)
#model.init_sims(replace = True)
model_name = "text8_data"
model.save(model_name)
stopwords = nltk.corpus.stopwords.words('english')
path = "/home/mona/mscoco/caption_files/*.txt"
files = glob.glob(path)
adj_list = defaultdict(lambda: defaultdict(lambda: 0))
# Undirected graph: one node per caption file, one weighted edge per file pair.
g = nx.Graph()
for file in files:
    g.add_node(file)
for file1, file2 in itertools.combinations(files, 2):
    # The with-statement closes each file, so no explicit close() is needed.
    with open(file1) as f1:
        f1_text = f1.read()
    f1_words = re.sub("[^a-zA-Z]", ' ', f1_text).lower().split()
    f1_words = [w for w in f1_words if w not in stopwords]
    print(f1_text)
    with open(file2) as f2:
        f2_text = f2.read()
    f2_words = re.sub("[^a-zA-Z]", ' ', f2_text).lower().split()
    f2_words = [w for w in f2_words if w not in stopwords]
    print(f2_text)
    # Compute the Word Mover's Distance once and reuse it.
    distance = model.wmdistance(f1_words, f2_words)
    print('{0}: {1}: {2}'.format(file1, file2, distance))
    # Store the distance as the edge's 'weight' attribute.
    g.add_edge(file1, file2, weight=distance)
    print(g.number_of_edges())
print(g.number_of_edges())
nx.write_gml(g, "gensim.gml")
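If you have better suggestions for my current code, please let me know. I will eventually have 20 nodes and 190 edges. I am mainly looking for something whose output can be processed by other programs, such as MATLAB. I am not sure whether .gml files are easy to work with in MATLAB.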
Answer 0 (score: 1)
I think generating a GML file for the specific purpose of reusing it in MATLAB is probably fine. This question has more related information.
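If GML turns out to be awkward on the MATLAB side, one alternative (not what the question's code does) is to keep the distances in a plain dict-of-dicts and export a CSV edge list. The sketch below uses only the standard library. The node names and `toy_distance` are hypothetical stand-ins for the caption files and `model.wmdistance`, and the `source,target,weight` column names are an assumption, not a standard.

```python
import csv
import io
import itertools
from collections import defaultdict


def build_adjacency(nodes, distance):
    """Return a symmetric dict-of-dicts holding distance(u, v) for every pair."""
    adj = defaultdict(dict)
    for u, v in itertools.combinations(nodes, 2):
        d = distance(u, v)
        adj[u][v] = d
        adj[v][u] = d  # undirected: store both directions
    return adj


def write_edge_list(adj, handle):
    """Write each undirected edge once as a 'source,target,weight' CSV row."""
    writer = csv.writer(handle)
    writer.writerow(["source", "target", "weight"])
    count = 0
    for u in sorted(adj):
        for v in sorted(adj[u]):
            if u < v:  # skip the mirrored copy of each edge
                writer.writerow([u, v, adj[u][v]])
                count += 1
    return count


# Hypothetical stand-ins for the caption files and for model.wmdistance.
nodes = ["node1", "node2", "node3", "node4"]


def toy_distance(u, v):
    return abs(int(u[-1]) - int(v[-1]))


adj = build_adjacency(nodes, toy_distance)

# Look up the connection between node 1 and node 3 directly by name.
print(adj["node1"]["node3"])

# Write to an in-memory buffer here; pass an open file to write to disk.
buffer = io.StringIO()
edge_count = write_edge_list(adj, buffer)
print(edge_count)
```

With 100 nodes this stores all 4950 pairwise distances, and looking up any pair is a constant-time dictionary access. A CSV in this shape should be readable in MATLAB with `readtable`, though I have not checked how it compares with importing GML there.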