Efficiently converting gensim TransformedCorpus data to an array

Time: 2018-01-20 16:03:39

Tags: python numpy gensim lda

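Is there a more direct or efficient way to convert the topic-probability data of a gensim.interfaces.TransformedCorpus object into a numpy array (or a pandas DataFrame) than building it row by row?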

1 Answer:

Answer 0: (score: 3)

This may be too late, but gensim has helper functions for converting to and from numpy / scipy arrays.

What you are looking for is:

gensim.matutils.corpus2csc

It returns a scipy sparse matrix, which you can then convert to a numpy array or a pandas DataFrame as needed.

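import gensim

# all_topics is the TransformedCorpus from the question.
# corpus2csc returns a scipy.sparse CSC matrix with one row per topic and one column per document.
all_topics_csc = gensim.matutils.corpus2csc(all_topics)
# Transpose so that each row is a document, then convert to a dense numpy array.
all_topics_numpy = all_topics_csc.T.toarray()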