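In Bokeh

Time: 2018-05-04 09:25:06

Tags: python-3.x plot callback bokeh multiline

I have a use case with multiple line plots (with legends), and I need to update the line plots based on a column condition. Below is an example with two datasets: the column data source changes depending on the country. The problem I am facing is that the number of columns in the data source is not fixed, and even the types may differ. So when I update the data source in a callback after selecting a new country, I get this error: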

Error: attempted to retrieve property array for nonexistent field 'pay_conv_7d.content'. 

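I guess this is because the column pay_conv_7d.content does not exist in the new data source, while the lines that use it already exist in my plot. I have tried various ways to work around this (building a common set of columns for every country selection, adding the columns missing from the data source inside the callback), but I still run into problems.

Is there a clean way to update multiple line plots from a callback without resorting to a lot of hacks? Any insight or help would be greatly appreciated. Thanks a lot! :)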

def setup_multiline_plots(x_axis, y_axis, title_text, data_source, plot):
    num_categories = len(data_source.data['categories'])
    legends_list = list(data_source.data['categories'])
    colors_list = Spectral11[0:num_categories]
    # xs = [data_source.data['%s.'%x_axis].values] * num_categories
    # ys = [data_source.data[('%s.%s')%(y_axis,column)] for column in data_source.data['categories']]
    # data_source.data['x_series'] = xs
    # data_source.data['y_series'] = ys
    # plot.multi_line('x_series', 'y_series', line_color=colors_list,legend='categories', line_width=3, source=data_source)
    plot_list = []
    for (colr, leg, column) in zip(colors_list, legends_list, data_source.data['categories']):
        xs, ys = '%s.'%x_axis, ('%s.%s')%(y_axis,column)
        plot.line(xs,ys, source=data_source, color=colr, legend=leg, line_width=3, name=ys)
        plot_list.append(ys)
    data_source.data['plot_names'] = data_source.data.get('plot_names',[]) + plot_list
    plot.title.text = title_text

def update_plot(country, timeseries_df, timeseries_source,
                aggregate_df, aggregate_source, category,
                plot_pay_7d, plot_r_pay_90d):

    aggregate_metrics = aggregate_df.loc[aggregate_df.country == country]
    aggregate_metrics = aggregate_metrics.nlargest(10, 'cost')
    category_types = list(aggregate_metrics[category].unique())
    timeseries_df = timeseries_df[timeseries_df[category].isin(category_types)]
    timeseries_multi_line_metrics = get_multiline_column_datasource(timeseries_df, category, country)

    # len_series = len(timeseries_multi_line_metrics.data['time.'])
    # previous_legends = timeseries_source.data['plot_names']
    # current_legends = timeseries_multi_line_metrics.data.keys()
    # common_legends = list(set(previous_legends) & set(current_legends))
    # additional_legends_list = list(set(previous_legends) - set(current_legends))
    # for legend in additional_legends_list:
    #     zeros = pd.Series(np.array([0] * len_series), name=legend)
    #     timeseries_multi_line_metrics.add(zeros, legend)
    # timeseries_multi_line_metrics.data['plot_names'] = previous_legends

    timeseries_source.data = timeseries_multi_line_metrics.data
    aggregate_source.data = aggregate_source.from_df(aggregate_metrics)

def get_multiline_column_datasource(df, category, country):

    df_country = df[df.country == country]
    df_pivoted = pd.DataFrame(df_country.pivot_table(index='time', columns=category, aggfunc=np.sum).reset_index())
    df_pivoted.columns = df_pivoted.columns.to_series().str.join('.')
    categories = list(set([column.split('.')[1] for column in list(df_pivoted.columns)]))[1:]
    data_source = ColumnDataSource(df_pivoted)
    data_source.data['categories'] = categories
    return data_source

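1 Answer:

Answer 0 (score: 0)

I recently had to update the data of a MultiLine glyph. If you want to see my algorithm, see my question.

I think there are at least three ways to update a ColumnDataSource:

  1. You can build a DataFrame, instantiate a new CDS from it, and assign its data to the existing source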

    cds = ColumnDataSource(df_pivoted)
    data_source.data = cds.data
    
  2. You can build a dictionary and assign it directly to the data attribute

    d = {
        'xs0': [[7.0, 986.0], [17.0, 6.0], [7.0, 67.0]],
        'ys0': [[79.0, 69.0], [179.0, 169.0], [729.0, 69.0]],
        'xs1': [[17.0, 166.0], [17.0, 116.0], [17.0, 126.0]],
        'ys1': [[179.0, 169.0], [179.0, 1169.0], [1729.0, 169.0]],
        'xs2': [[27.0, 276.0], [27.0, 216.0], [27.0, 226.0]],
        'ys2': [[279.0, 269.0], [279.0, 2619.0], [2579.0, 2569.0]]
    }
    data_source.data = d
    

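    If you need columns of different sizes or empty columns, you can fill the gaps with NaN values so that every column keeps the same size. I think this is the solution to your problem: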

    import numpy as np
    d = {
        'xs0': [[7.0, 986.0], [17.0, 6.0], [7.0, 67.0]],
        'ys0': [[79.0, 69.0], [179.0, 169.0], [729.0, 69.0]],
        'xs1': [[17.0, 166.0], [np.nan], [np.nan]],
        'ys1': [[179.0, 169.0], [np.nan], [np.nan]],
        'xs2': [[np.nan], [np.nan], [np.nan]],
        'ys2': [[np.nan], [np.nan], [np.nan]]
    }
    data_source.data = d
    
  3. Or, if you only need to modify a few values, you can use the patch method. See the Bokeh documentation for ColumnDataSource.patch

      

    The following example shows how to patch entire column elements:

        source = ColumnDataSource(data=dict(foo=[10, 20, 30], bar=[100, 200, 300]))
        patches = {
            'foo' : [ (slice(2), [11, 12]) ],
            'bar' : [ (0, 101), (2, 301) ],
        }
        source.patch(patches)
    
      

    After this operation, the value of source.data will be:

        dict(foo=[11, 12, 30], bar=[101, 200, 301])
    
  4. Note: it is important to apply the update in a single assignment to avoid performance problems
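
To connect the NaN-filling idea to the error in the question, here is a minimal sketch in plain Python. The helper name `pad_missing_columns` and the sample column `pay_conv_7d.games` are made up for illustration. It assumes you keep a list of the column names that the existing line glyphs were created with (for example the `plot_names` list from the question) and that all real columns share one length.

```python
import math

def pad_missing_columns(new_data, required_columns):
    """Return a copy of new_data that contains every name in required_columns.

    Columns absent from new_data are filled with NaN, so glyphs that are
    still bound to them find a column of the right length and draw nothing.
    """
    # Take the common column length from the first column (0 if empty).
    length = len(next(iter(new_data.values()), []))
    padded = {name: list(values) for name, values in new_data.items()}
    for name in required_columns:
        if name not in padded:
            padded[name] = [math.nan] * length
    return padded

# Hypothetical data for a country that has no 'content' category.
new_data = {
    'time.': [1, 2, 3],
    'pay_conv_7d.games': [0.1, 0.2, 0.3],
}
# Columns the existing line glyphs were created with (assumed to be tracked).
required = ['time.', 'pay_conv_7d.games', 'pay_conv_7d.content']

padded = pad_missing_columns(new_data, required)
# With Bokeh available, assign once so the source is updated in one step:
# timeseries_source.data = padded
```

Building the full dictionary first and assigning it once also matches the note about updating in a single step. Legend entries for the NaN-only lines would still be shown, so this may need to be combined with hiding those renderers.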