Making a function that builds lists of tuples as efficient as possible in Python

Asked: 2016-10-27 23:52:32

Tags: python list dictionary optimization tuples

I have written a helper function named get_document_topics_for_corpus that returns lists of tuples, but it is not very efficient. It is called like this:

topics = lda.get_document_topics(corpus, per_word_topics=True)
doc_topics, word_topics, word_phis = get_document_topics_for_corpus(topics)

print "Document topics: ", doc_topics
print "Word topics: ", word_topics
print "Word phis:", word_phis

The result it returns is:

Document topics:  [(96, 0.75250000000000006), (34, 0.80200000000000227), (70, 0.80200000000000093), (60, 0.75250000000000161), (80, 0.85857142857136792), (58, 0.7525000000000015), (91, 0.75250000000000017), (28, 0.50499999999999268), (62, 0.66999998118978443)]

Word topics:  [(0, [96, 70]), (1, [96, 80]), (2, [96, 34]), (3, [80, 58]), (4, [80, 58]), (5, [80, 91]), (6, [80, 70, 34]), (7, [80, 70, 58]), (8, [70, 34]), (9, [28, 62, 60]), (10, [62, 60, 91]), (11, [60, 91])]

Word phis: [(0, [(96, 0.99999999999999989), (70, 0.99999999999999989)]), (1, [(96, 0.99999999999999989), (80, 1.0)]), (2, [(96, 0.99999999999999989), (34, 1.0)]), (3, [(80, 1.0), (58, 1.0)]), (4, [(80, 1.0), (58, 1.0000000000000002)]), (5, [(80, 1.0), (91, 1.0)]), (6, [(80, 1.0), (70, 1.0), (34, 2.0)]), (7, [(80, 1.0), (70, 1.0), (58, 1.0000000000000002)]), (8, [(70, 1.0), (34, 1.0)]), (9, [(28, 1.0), (62, 1.0), (60, 1.0)]), (10, [(62, 1.0), (60, 1.0), (91, 0.99999999999999989)]), (11, [(60, 1.0), (91, 1.0)])]

The helper function I wrote to perform this task is as follows:

def get_document_topics_for_corpus(topics):
    document_topics = dict()
    word_topics = dict()
    word_phis = dict()

    doc_topics = list()
    word_top = list()
    word_ph = list()

    for doc_topic, word_topic, word_phi in topics:

        #Document_topics aggregation
        key_doc = doc_topic[0][0]
        value_doc = doc_topic[0][1]
        document_topics.setdefault(key_doc, value_doc)

        #Word_topics aggregation
        for key in  word_topic:
            word_topics.setdefault(key[0], [])
            word_topics[key[0]].append(key_doc)

        #Word_phis aggregation
        for key in word_phi:
            word_phis.setdefault(key[0], [])
            word_phis[key[0]].append(key[1][0])

    for key, value in document_topics.iteritems():
        temp = (key, value)
        doc_topics.append(temp)

    for key, value in word_topics.iteritems():
        temp = (key, value)
        word_top.append(temp)

    for key, value in word_phis.iteritems():
        temp = (key, value)
        word_ph.append(temp)


    return (doc_topics, word_top, word_ph)
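
One possible tightening of the helper above, offered as an unbenchmarked sketch rather than a confirmed speed-up: `collections.defaultdict` replaces the paired `setdefault` and lookup calls, tuple unpacking replaces repeated indexing, and `list(d.items())` replaces the three copy loops. The name `get_document_topics_for_corpus_compact` is hypothetical. The sketch assumes, as the original does, that every element of `topics` is a `(doc_topic, word_topic, word_phi)` triple with a non-empty `doc_topic` and a non-empty phi list for each word.

```python
from collections import defaultdict


def get_document_topics_for_corpus_compact(topics):
    """Aggregate per-document LDA output into three lists of tuples."""
    document_topics = {}
    word_topics = defaultdict(list)
    word_phis = defaultdict(list)

    for doc_topic, word_topic, word_phi in topics:
        # Only the first (topic_id, probability) pair of each document is used,
        # and the first value seen for a topic id wins, as in the original.
        key_doc, value_doc = doc_topic[0]
        document_topics.setdefault(key_doc, value_doc)

        # Record the document's topic against every word id in the document.
        for word_id, _ in word_topic:
            word_topics[word_id].append(key_doc)

        # Keep only the first (topic_id, phi) pair listed for each word.
        for word_id, phis in word_phi:
            word_phis[word_id].append(phis[0])

    # items() exists on Python 2 and 3; list() makes the Python 3 view concrete.
    return (list(document_topics.items()),
            list(word_topics.items()),
            list(word_phis.items()))
```

The order of the tuples in each returned list follows dictionary iteration order, just as it does with `iteritems()` in the original, so results are best compared as dictionaries rather than position by position.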

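I am aggregating this result from a list of topics, where each element (one per document) is a tuple made up of the document topics, word topics, and word phis. To illustrate, the topics look like this, with each one separated by '-------':
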
new doc
Document topics: [(79, 0.75250000000000072)]
Word topics: [(0, [79]), (1, [79]), (2, [79])]
Word phis: [(0, [(79, 1.0)]), (1, [(79, 1.0)]), (2, [(79, 1.0)])]
--------------
new doc
Document topics: [(23, 0.85857142857143054)]
Word topics: [(1, [23]), (3, [23]), (4, [23]), (5, [23]), (6, [23]), (7, [23])]
Word phis: [(1, [(23, 1.0)]), (3, [(23, 1.0)]), (4, [(23, 1.0)]), (5, [(23, 1.0)]), (6, [(23, 1.0)]), (7, [(23, 1.0)])]
--------------
new doc
Document topics: [(28, 0.80199999993851401)]
Word topics: [(0, [28]), (6, [28]), (7, [28]), (8, [28])]
Word phis: [(0, [(28, 1.0)]), (6, [(28, 1.0)]), (7, [(28, 1.0000000000000002)]), (8, [(28, 1.0)])]
--------------
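
To make the sample reproducible without a trained model, the three documents above can be written out as a Python literal. The names `SAMPLE_TOPICS` and `as_dicts` are illustrative only and not part of any library. `as_dicts` converts a returned triple into dictionaries so that two implementations can be checked for the same output without depending on dictionary iteration order.

```python
# The three "new doc" entries above, as (doc_topics, word_topics, word_phis) triples.
SAMPLE_TOPICS = [
    ([(79, 0.75250000000000072)],
     [(0, [79]), (1, [79]), (2, [79])],
     [(0, [(79, 1.0)]), (1, [(79, 1.0)]), (2, [(79, 1.0)])]),
    ([(23, 0.85857142857143054)],
     [(1, [23]), (3, [23]), (4, [23]), (5, [23]), (6, [23]), (7, [23])],
     [(1, [(23, 1.0)]), (3, [(23, 1.0)]), (4, [(23, 1.0)]),
      (5, [(23, 1.0)]), (6, [(23, 1.0)]), (7, [(23, 1.0)])]),
    ([(28, 0.80199999993851401)],
     [(0, [28]), (6, [28]), (7, [28]), (8, [28])],
     [(0, [(28, 1.0)]), (6, [(28, 1.0)]),
      (7, [(28, 1.0000000000000002)]), (8, [(28, 1.0)])]),
]


def as_dicts(result):
    """Turn a (doc_topics, word_topics, word_phis) triple into three dicts."""
    doc_topics, word_topics, word_phis = result
    return dict(doc_topics), dict(word_topics), dict(word_phis)
```

A candidate rewrite `f` (a placeholder name) then matches the original on this sample when `as_dicts(f(SAMPLE_TOPICS)) == as_dicts(get_document_topics_for_corpus(SAMPLE_TOPICS))`.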

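Can anyone help rework this function so that it is better optimized and as fast as possible, while still producing the same output? That would be very helpful. Thanks.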

0 answers:

No answers