Question

I tried following this tutorial (https://shuaiw.github.io/2016/12/22/topic-modeling-and-tsne-visualzation.html) to visualize an LDA model with t-SNE and Bokeh, but I ran into a problem. When I try to run the following code:

    plot_lda.scatter(x=tsne_lda[:, 0], y=tsne_lda[:, 1],
             color=colormap[_lda_keys][:num_example],
             source=bp.ColumnDataSource({
               "content": text[:num_example],
               "topic_key": _lda_keys[:num_example]
               }))

Note: in the tutorial the content is called news; in my code it is called text.

I get this error:

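    Supplying a user-defined data source AND iterable values to glyph methods is not possible. Either:

    Pass all data directly as literals:

        p.circe(x=a_list, y=an_array, ...)

    Or, put all data in a ColumnDataSource and pass column names:

        source = ColumnDataSource(data=dict(x=a_list, y=an_array))
        p.circe(x='x', y='x', source=source, ...)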

This doesn't make much sense to me, and I haven't been able to find an answer to it here, on GitHub, or anywhere else. Hopefully someone can help. Best, Niels

Answer 1

I had been struggling with this code as well, and I found two problems with it.

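First, as the error says, when you pass a source to the scatter function, all of the data has to be in that dictionary: the x and y coordinates, the colors, the labels, and any other information you want to show in the tooltip.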
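The two options in the error message can be sketched with toy data. The call in the question mixes them (arrays for `x`, `y` and `color` plus a `source=`), which is what triggers the error. This is a minimal sketch assuming a reasonably recent Bokeh; the arrays and their values are made up for illustration.

```python
import numpy as np
from bokeh.models import ColumnDataSource
from bokeh.plotting import figure

# Hypothetical toy data standing in for tsne_lda, the colors and the text
xs = np.array([0.1, 0.5, 0.9])
ys = np.array([1.0, 0.2, 0.7])
colors = ["#1f77b4", "#ff7f0e", "#2ca02c"]
content = ["doc a", "doc b", "doc c"]

# Option 1: pass literal arrays only and no source argument
p1 = figure(title="literals")
r1 = p1.scatter(x=xs, y=ys, color=colors)

# Option 2: put every array in one ColumnDataSource and pass column names
source = ColumnDataSource(data={"x": xs, "y": ys, "colors": colors, "content": content})
p2 = figure(title="source")
r2 = p2.scatter("x", "y", color="colors", source=source)
```

In option 2 the string `"colors"` is not a named color, so Bokeh reads it as a column name in `source`.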

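Second, the x and y arrays do not have the same length as the data passed to the tooltip, so you also have to slice both axis arrays with the num_example variable.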
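The length mismatch can be reproduced with pandas alone. This sketch assumes, as in the tutorial, that t-SNE was run on all documents while only the first `num_example` texts are used for the tooltip; the sizes and random data are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical sizes: t-SNE covers all documents, the tooltip text is truncated
n_docs, num_example = 10, 4
rng = np.random.default_rng(0)
tsne_lda = rng.random((n_docs, 2))
text = [f"document {i}" for i in range(n_docs)]
_lda_keys = rng.integers(0, 3, size=n_docs)

# Unsliced axes (10 rows) next to sliced tooltip text (4 rows) cannot form one table
bad = {"x": tsne_lda[:, 0], "y": tsne_lda[:, 1], "content": text[:num_example]}
try:
    pd.DataFrame.from_dict(bad)
    mismatch_error = None
except ValueError as exc:
    mismatch_error = exc

# Slicing every column with num_example gives columns of equal length
good = {
    "x": tsne_lda[:num_example, 0],
    "y": tsne_lda[:num_example, 1],
    "content": text[:num_example],
    "topic_key": _lda_keys[:num_example],
}
plot_df = pd.DataFrame.from_dict(good)
```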

The following code got it working for me:

    import bokeh.plotting as bp
    import pandas as pd

    # tsne_lda, colormap, _lda_keys, text and num_example come from the tutorial's earlier steps

    # create the dictionary with all the information
    plot_dict = {
        'x': tsne_lda[:num_example, 0],
        'y': tsne_lda[:num_example, 1],
        'colors': colormap[_lda_keys][:num_example],
        'content': text[:num_example],
        'topic_key': _lda_keys[:num_example]
    }

    # create the dataframe from the dictionary (all columns must have the same length)
    plot_df = pd.DataFrame.from_dict(plot_dict)

    # declare the source
    source = bp.ColumnDataSource(data=plot_df)
    title = 'LDA viz'

    # initialize bokeh plot
    plot_lda = bp.figure(plot_width=1400, plot_height=1100,
                         title=title,
                         tools="pan,wheel_zoom,box_zoom,reset,hover,previewsave",
                         x_axis_type=None, y_axis_type=None, min_border=1)

    # draw the scatter glyphs using column names from the source
    plot_lda.scatter('x', 'y', color='colors', source=source)
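The code above puts `content` and `topic_key` in the source but does not tell the hover tool to display them. One way to do that is sketched below with made-up data; it leaves `hover` out of the tool string and adds a configured `HoverTool` instead, so the plot does not end up with two hover tools. The column names match the answer; everything else is illustrative.

```python
import numpy as np
from bokeh.models import ColumnDataSource, HoverTool
from bokeh.plotting import figure

# Hypothetical two-point source with the same column names as plot_dict
source = ColumnDataSource(data={
    "x": np.array([0.1, 0.5]),
    "y": np.array([0.3, 0.8]),
    "colors": ["#1f77b4", "#ff7f0e"],
    "content": ["first document", "second document"],
    "topic_key": [0, 1],
})

plot_lda = figure(title="LDA viz", tools="pan,wheel_zoom,box_zoom,reset")
plot_lda.scatter("x", "y", color="colors", source=source)

# "@column" in a tooltip is replaced by that column's value for the hovered point
hover = HoverTool(tooltips=[("content", "@content"), ("topic", "@topic_key")])
plot_lda.add_tools(hover)
```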
