I created a list from the output of an LDA model:
lda_model.get_document_topics(bag_of_words)
The model has 7 topics, and a list comprehension gives me this result:
[v for v in lda_model_bigram.get_document_topics(bow_corpus_bigram)]
DataFrame built from the list:
df = pd.DataFrame([[(0, 0.23410834), (1, 0.010244273), (2, 0.010266962), (3, 0.31661528), (4, 0.010282155), (5, 0.010329775), (6, 0.4081532)],
[(0, 0.24538451), (3, 0.1353473), (6, 0.58342004)],
[(0, 0.21097288), (1, 0.2306254), (3, 0.5263941)],
[(0, 0.020534758), (1, 0.02050926), (2, 0.020555891), (3, 0.020502212), (4, 0.57683885), (5, 0.020568976), (6, 0.3204901)],
[(2, 0.37945262), (4, 0.12737828), (6, 0.47517183)],
])
My question is how to align the values by the first element of each tuple, so that it looks like the following:
Answer 0 (score: 2)
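Use a list comprehension with a nested dictionary comprehension, keyed by the first value of each tuple, so that the DataFrame constructor aligns the values correctly: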
import pandas as pd
L = [[(0, 0.23410834), (1, 0.010244273), (2, 0.010266962), (3, 0.31661528), (4, 0.010282155), (5, 0.010329775), (6, 0.4081532)],
[(0, 0.24538451), (3, 0.1353473), (6, 0.58342004)],
[(0, 0.21097288), (1, 0.2306254), (3, 0.5263941)],
[(0, 0.020534758), (1, 0.02050926), (2, 0.020555891), (3, 0.020502212), (4, 0.57683885), (5, 0.020568976), (6, 0.3204901)],
[(2, 0.37945262), (4, 0.12737828), (6, 0.47517183)]]
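# Map each topic id to its (topic id, probability) tuple, one dict per document
rows = [{topic: (topic, prob) for topic, prob in doc} for doc in L]
# The DataFrame constructor aligns dict keys into columns; absent topics become NaN, then 0
df = pd.DataFrame(rows).fillna(0)
print(df)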
                  0                 1                 2                 3  \
0   (0, 0.23410834)  (1, 0.010244273)  (2, 0.010266962)   (3, 0.31661528)
1   (0, 0.24538451)                 0                 0    (3, 0.1353473)
2   (0, 0.21097288)    (1, 0.2306254)                 0    (3, 0.5263941)
3  (0, 0.020534758)   (1, 0.02050926)  (2, 0.020555891)  (3, 0.020502212)
4                 0                 0   (2, 0.37945262)                 0

                  4                 5                6
0  (4, 0.010282155)  (5, 0.010329775)   (6, 0.4081532)
1                 0                 0  (6, 0.58342004)
2                 0                 0                0
3   (4, 0.57683885)  (5, 0.020568976)   (6, 0.3204901)
4   (4, 0.12737828)                 0  (6, 0.47517183)
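Alternatively, build a list of dictionaries that hold only the probabilities, so the final DataFrame is filled with scalars (if necessary):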
# Keep only the probability as the value so every cell is a scalar
rows = [{topic: prob for topic, prob in doc} for doc in L]
df = pd.DataFrame(rows).fillna(0)
print(df)
          0         1         2         3         4         5         6
0  0.234108  0.010244  0.010267  0.316615  0.010282  0.010330  0.408153
1  0.245385  0.000000  0.000000  0.135347  0.000000  0.000000  0.583420
2  0.210973  0.230625  0.000000  0.526394  0.000000  0.000000  0.000000
3  0.020535  0.020509  0.020556  0.020502  0.576839  0.020569  0.320490
4  0.000000  0.000000  0.379453  0.000000  0.127378  0.000000  0.475172
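One caveat with the dictionary approach: columns are created only for topic ids that actually occur, in the order they are first seen. In the data above the first document lists all 7 topics, so the columns come out as 0 to 6, but that is not guaranteed in general. The following is a minimal sketch of one way to force a fixed set of columns. The helper name `topics_to_frame` and its `num_topics` parameter are hypothetical, not part of gensim or pandas, and the sample reuses three documents from the question in which topic 5 never appears.

```python
import pandas as pd


def topics_to_frame(doc_topics, num_topics):
    """Build a documents x topics table of probabilities.

    doc_topics: one [(topic_id, probability), ...] list per document.
    num_topics: total number of topics (hypothetical parameter; 7 in the question).
    """
    # dict() turns each list of (topic_id, probability) pairs into a mapping
    rows = [dict(pairs) for pairs in doc_topics]
    df = pd.DataFrame(rows)
    # reindex adds any topic column that never occurred and puts columns in order;
    # fillna then replaces every missing probability with 0.0
    return df.reindex(columns=range(num_topics)).fillna(0.0)


# Three documents from the question; none of them mentions topic 5
sample = [
    [(0, 0.24538451), (3, 0.1353473), (6, 0.58342004)],
    [(0, 0.21097288), (1, 0.2306254), (3, 0.5263941)],
    [(2, 0.37945262), (4, 0.12737828), (6, 0.47517183)],
]
df = topics_to_frame(sample, num_topics=7)
print(df)
```

Without the `reindex` step, this sample would produce only six columns, ordered 0, 3, 6, 1, 2, 4, because that is the order in which the topic ids first appear.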