Move the elements of a list into the desired columns

Time: 2019-06-17 07:12:13

Tags: python python-3.x pandas lda

I created a list from the output of an LDA model: lda_model.get_document_topics(bag_of_words)

The model consists of 7 topics, and a list comprehension gives me this result:

[v for v in lda_model_bigram.get_document_topics(bow_corpus_bigram)]

DataFrame built from the list:

import pandas as pd

df = pd.DataFrame([[(0, 0.23410834), (1, 0.010244273), (2, 0.010266962), (3, 0.31661528), (4, 0.010282155), (5, 0.010329775), (6, 0.4081532)],
 [(0, 0.24538451), (3, 0.1353473), (6, 0.58342004)],
 [(0, 0.21097288), (1, 0.2306254), (3, 0.5263941)],
 [(0, 0.020534758), (1, 0.02050926), (2, 0.020555891), (3, 0.020502212), (4, 0.57683885), (5, 0.020568976), (6, 0.3204901)],
 [(2, 0.37945262), (4, 0.12737828), (6, 0.47517183)],
])

It looks like this: Below

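My question is how to align the values by the first element of each tuple (the topic id), so that the result looks like the following: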

After

1 answer:

Answer 0: (score: 2)

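Use a list comprehension with a nested dictionary comprehension whose keys are the first value of each tuple. The DataFrame constructor then uses those keys as column labels, so the values are aligned correctly: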

L = [[(0, 0.23410834), (1, 0.010244273), (2, 0.010266962), (3, 0.31661528), (4, 0.010282155), (5, 0.010329775), (6, 0.4081532)],
 [(0, 0.24538451), (3, 0.1353473), (6, 0.58342004)],
 [(0, 0.21097288), (1, 0.2306254), (3, 0.5263941)],
 [(0, 0.020534758), (1, 0.02050926), (2, 0.020555891), (3, 0.020502212), (4, 0.57683885), (5, 0.020568976), (6, 0.3204901)],
 [(2, 0.37945262), (4, 0.12737828), (6, 0.47517183)]]

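import pandas as pd

# Key each (topic, probability) tuple by its topic id so that pandas aligns it to that column
b = [{topic: (topic, prob) for topic, prob in row} for row in L]

df = pd.DataFrame(b).fillna(0)
print(df)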
                  0                 1                 2                 3  \
0   (0, 0.23410834)  (1, 0.010244273)  (2, 0.010266962)   (3, 0.31661528)   
1   (0, 0.24538451)                 0                 0    (3, 0.1353473)   
2   (0, 0.21097288)    (1, 0.2306254)                 0    (3, 0.5263941)   
3  (0, 0.020534758)   (1, 0.02050926)  (2, 0.020555891)  (3, 0.020502212)   
4                 0                 0   (2, 0.37945262)                 0   

                  4                 5                6  
0  (4, 0.010282155)  (5, 0.010329775)   (6, 0.4081532)  
1                 0                 0  (6, 0.58342004)  
2                 0                 0                0  
3   (4, 0.57683885)  (5, 0.020568976)   (6, 0.3204901)  
4   (4, 0.12737828)                 0  (6, 0.47517183)  

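Alternatively, build a list of dictionaries that map each topic id to its probability only, so that the final DataFrame is filled with scalars (if necessary):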

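# Map topic id -> probability; topics missing from a row become NaN and are then filled with 0
b = [{topic: prob for topic, prob in row} for row in L]
df = pd.DataFrame(b).fillna(0)
print(df)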
          0         1         2         3         4         5         6
0  0.234108  0.010244  0.010267  0.316615  0.010282  0.010330  0.408153
1  0.245385  0.000000  0.000000  0.135347  0.000000  0.000000  0.583420
2  0.210973  0.230625  0.000000  0.526394  0.000000  0.000000  0.000000
3  0.020535  0.020509  0.020556  0.020502  0.576839  0.020569  0.320490
4  0.000000  0.000000  0.379453  0.000000  0.127378  0.000000  0.475172
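
One case the approach above does not cover by itself: if a topic never appears in any row, no column is created for it, and the column order can depend on the order in which keys are first seen. A minimal sketch of one way to handle that, assuming the number of topics is known in advance (the sample data `docs` and the name `num_topics` are made up for illustration and are not from the question):

```python
import pandas as pd

# Hypothetical sample in the same shape as the question's data:
# one list of (topic_id, probability) tuples per document.
docs = [
    [(0, 0.25), (3, 0.75)],
    [(1, 0.5), (3, 0.5)],
]

num_topics = 5  # assumption: known from how the model was configured

# dict() turns each list of 2-tuples into {topic_id: probability}
rows = [dict(doc) for doc in docs]

df = (
    pd.DataFrame(rows)
    .reindex(columns=list(range(num_topics)))  # force columns 0..num_topics-1, in order
    .fillna(0)                                 # topics absent from a document get 0
)
print(df)
```

Here `reindex` adds the columns for topics 2 and 4, which no document mentions, so every document ends up with the same fixed-width row.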