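Is there a more direct or efficient way to convert the topic-probability data of a gensim.interfaces.TransformedCorpus object into a numpy array (or a pandas DataFrame) than iterating over it row by row?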
Answer 0 (score: 3)
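This may be late, but gensim has helper functions in `gensim.matutils` for converting to and from numpy/scipy arrays.
The one you're looking for is `corpus2csc`, which returns a `scipy.sparse` CSC matrix with one row per topic and one column per document:

```python
import gensim
import numpy as np

# all_topics is your TransformedCorpus, e.g. lda_model[corpus] (hypothetical name)
all_topics_csc = gensim.matutils.corpus2csc(all_topics)  # shape (num_topics, num_docs)
# transpose to one row per document, then densify into a numpy array
all_topics_numpy = all_topics_csc.T.toarray()
```

You can then convert the output to a numpy array or a pandas DataFrame as needed.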