Question

Thanks for stopping by! I'd like some help creating a CSV from a pandas DataFrame. Here is my code:

import pandas as pd

a = ldamallet[bow_corpus_new[:21]]  # topic distributions for the first 21 documents
b = data_text_new

print(a)
print("\n")
print(b)

d = {'Preprocessed Document': b['Preprocessed Document'].tolist(), 
     'topic_0': a[0][1], 
     'topic_1': a[1][1], 
     'topic_2': a[2][1], 
     'topic_3': a[3][1], 
     'topic_4': a[4][1], 
     'topic_5': a[5][1], 
     'topic_6': a[6][1], 
     'topic_7': a[7][1], 
     'topic_8': a[8][1], 
     'topic_9': a[9][1], 
     'topic_10': a[10][1],
     'topic_11': a[11][1], 
     'topic_12': a[12][1],
     'topic_13': a[13][1],
     'topic_14': a[14][1],
     'topic_15': a[15][1],
     'topic_16': a[16][1],
     'topic_17': a[17][1],
     'topic_18': a[18][1],
     'topic_19': a[19][1]}

print(d)

df = pd.DataFrame(data=d)
df.to_csv("test.csv", index=False)

Data:

print(a): the format is tuples

[[(topic number: 0, topic percentage), ... (19, #)], [(topic distribution for the next row, #) ... (19, .819438), ... (#, #)], ...]
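To make that structure concrete, here is a small sketch with made-up numbers (3 documents and 3 topics instead of 21 and 20); the values in `a` and `docs` are hypothetical. With this layout, `a[0][1]` is not a column of probabilities. It is the single `(topic_id, probability)` tuple for topic 1 of the first document, so the dictionary passed to pandas ends up with a 3-item list next to 2-item tuples, which pandas rejects with a `ValueError`.

```python
import pandas as pd

# Hypothetical stand-in for ldamallet[bow_corpus_new[:21]]:
# one list per document, each holding (topic_id, probability) tuples.
a = [
    [(0, 0.10), (1, 0.70), (2, 0.20)],
    [(0, 0.50), (1, 0.25), (2, 0.25)],
    [(0, 0.05), (1, 0.15), (2, 0.80)],
]
docs = ["doc one", "doc two", "doc three"]

print(a[0][1])     # (1, 0.7): document 0, topic 1
print(a[0][1][1])  # 0.7: a single probability, not a whole column

try:
    # Same pattern as the question: each "topic column" is really a 2-element tuple
    pd.DataFrame({"Preprocessed Document": docs,
                  "topic_0": a[0][1],
                  "topic_1": a[1][1]})
    error = None
except ValueError as exc:
    error = exc

print(error)
```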

print(b)

Here is my error:

Here is the size of the DataFrame:

This is what I want it to look like:

Any help would be greatly appreciated :)

Answer 1

It's probably easiest to collect the second value of each tuple, across all rows, into its own list per topic. Like this:

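# One list per topic; each will hold that topic's probability for every document (row)
topic_0 = []
topic_1 = []
topic_2 = []
# ...and so on, up to topic_19
for i in a:  # i is one document's list of (topic_id, probability) tuples
    topic_0.append(i[0][1])
    topic_1.append(i[1][1])
    topic_2.append(i[2][1])
    # ...and so on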
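Rather than writing out twenty lists and twenty `append` calls, the same per-topic lists can be built in a loop. This is a sketch assuming `a` has the structure shown in the question; the sample data, `num_topics`, and `topic_columns` are hypothetical names for illustration. Looking each probability up by topic id through `dict(row)` instead of by position is a defensive choice: the columns still line up if some document's list ever omits a topic, and filling such gaps with `0.0` is an assumption.

```python
# Hypothetical stand-in for `a`: one list of (topic_id, probability) tuples per document
a = [
    [(0, 0.10), (1, 0.70), (2, 0.20)],
    [(0, 0.50), (1, 0.25), (2, 0.25)],
]
num_topics = 3  # 20 in the question

# topic_columns["topic_k"] holds topic k's probability for every document, in row order
topic_columns = {
    f"topic_{k}": [dict(row).get(k, 0.0) for row in a]
    for k in range(num_topics)
}
print(topic_columns)
# {'topic_0': [0.1, 0.5], 'topic_1': [0.7, 0.25], 'topic_2': [0.2, 0.25]}
```

Each list has exactly one entry per document, so every column has the same length as `b['Preprocessed Document'].tolist()` as long as `a` and `b` cover the same rows.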

Then you can build the dictionary like this:

d = {'Preprocessed Document': b['Preprocessed Document'].tolist(),
     'topic_0': topic_0,
     'topic_1': topic_1,
     # etc.
     }
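An alternative that skips the per-topic lists is to build one record per document and let pandas assemble the columns before writing the CSV. This is a sketch with hypothetical stand-ins for `a` and `b`; `records` is a made-up name. Note that `zip` silently stops at the shorter input, so `a` and `b` must cover the same documents, and any topic missing from a document's list would show up as `NaN` in that cell.

```python
import pandas as pd

# Hypothetical stand-ins for `a` (topic distributions) and `b` (documents)
a = [
    [(0, 0.10), (1, 0.70), (2, 0.20)],
    [(0, 0.50), (1, 0.25), (2, 0.25)],
]
b = pd.DataFrame({"Preprocessed Document": ["doc one", "doc two"]})

# One record (dict) per document: its text plus one key per (topic_id, probability) tuple
records = [
    {"Preprocessed Document": doc, **{f"topic_{t}": p for t, p in topics}}
    for doc, topics in zip(b["Preprocessed Document"], a)
]

df = pd.DataFrame(records)
df.to_csv("test.csv", index=False)
print(df)
```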

Answer 2

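I took @mattcremeens' advice and it worked. I've posted the full code below. He correctly solved the tuple problem; my previous code wasn't iterating over the rows and was only printing the first row.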


ValueError: arrays must all be same length - printing a DataFrame to CSV
