Creating a bokeh vbar, but getting the error "(BAD_COLUMN_NAME): Glyph refers to nonexistent column name: value"

Time: 2018-07-18 05:03:27

Tags: python bokeh pandas-groupby

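Good day!

I am trying to create a vbar chart like the "Mean MPG by # Cylinders and Manufacturer" example in the bokeh documentation (https://bokeh.pydata.org/en/latest/docs/user_guide/categorical.html#userguide-categorical), but I keep getting the error "(BAD_COLUMN_NAME): Glyph refers to nonexistent column name: value".

Here is my CSV file: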

 id,name,value,vendorname
 1, contract abc, "2,000,500.00", company x
 2, contract bcd, "1,300,500.00", company y
 3, contract cde, "1,344,000.00", company x
 4, contract def, "400,000.00", company z
 5, contract efg, "566,000.00", company s

And so on...

The code is as follows:

 from bokeh.io import show, output_file
 from bokeh.palettes import Viridis256
 from bokeh.transform import factor_cmap
 import pandas as pd
 from bokeh.core.properties import value
 from bokeh.models import FactorRange, ColumnDataSource
 from bokeh.palettes import Spectral5
 from bokeh.plotting import figure, show
 from bokeh.embed import components  

 df = pd.read_csv('contract.csv')
 group = df.groupby(['vendorname', 'name'])
 index_cmap = factor_cmap('vendorname_name', palette=Viridis256, factors=sorted(df.vendorname.unique()), end=1)
 p = figure(plot_width=1000, plot_height=1500, title="Value Contract by # Contract and Vendor", x_range=group, toolbar_location=None, 
       tooltips=[("Value", "@value"), ("vendorname, name", "@vendorname_name")])
 p.vbar(x='vendorname_name', top='value', width=1, source=group, line_color="white", fill_color=index_cmap, )

 p.y_range.start = 0
 p.x_range.range_padding = 0.05
 p.xgrid.grid_line_color = None
 p.xaxis.axis_label = "Contract grouped by # Vendor"
 p.xaxis.major_label_orientation = 1.2
 p.outline_line_color = None
 output_file("contract.html")
 show(p)

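Any help is appreciated. Thanks.

1 Answer:

Answer 0: (score: 0)

When you do a groupby on a dataframe, only the aggregates for each group are available as columns, not the original columns themselves: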

In [3]: source = ColumnDataSource(group)

In [4]: source.data
Out[4]:
{'id_count': array([1, 1, 1], dtype=object),
 'id_unique': array([1, 1, 1], dtype=object),
 'id_top': array([' "2', ' "1', ' "1'], dtype=object),
 'id_freq': array([1, 1, 1], dtype=object),
 'value_count': array([1, 1, 1], dtype=object),
 'value_unique': array([1, 1, 1], dtype=object),
 'value_top': array(['500.00"', '000.00"', '500.00"'], dtype=object),
 'value_freq': array([1, 1, 1], dtype=object),
 'vendorname_name': array([(' company x', '000'), (' company x', '344'),
        (' company y', '300')], dtype=object)}

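Note, however, that these values are garbage, because your read_csv call is not working either. In your data, the value column ends up as an array of badly formatted numeric strings: the commas inside the numbers confuse pandas when you use the basic call above. You need to follow the advice at https://stackoverflow.com/a/22137890/3406693 to read that column in as actual numbers. In addition, your CSV is malformed: there are extra spaces after the commas, which also confuses pandas. I could not get pandas to read the dataframe correctly until I fixed the CSV by removing the spaces: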

id,name,value,vendorname
1,contract abc,"2,000,500.00",company x
2,contract bcd,"1,300,500.00",company y
3,contract cde,"1,344,000.00",company x
4,contract def,"400,000.00",company z
5,contract efg,"566,000.00",company s

Then this read_csv command

df = pd.read_csv('contract.csv', thousands=",", quotechar='"', quoting=1)

produces a sensible dataframe:

In [3]: df
Out[3]:
   id          name      value vendorname
0   1  contract abc  2000500.0  company x
1   2  contract bcd  1300500.0  company y
2   3  contract cde  1344000.0  company x
3   4  contract def   400000.0  company z
4   5  contract efg   566000.0  company s

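You may be able to work around the CSV problem by explicitly telling Pandas what the column types are.

In any case, once the data is read correctly you can plot using value_mean from the group:

 p.vbar(x='vendorname_name', top='value_mean', ...) # use value_mean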
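As an alternative to editing the file by hand, here is a minimal sketch that reads the original layout (with a space after each comma) directly. It assumes the pandas `skipinitialspace` option is acceptable for your data; that option is not part of the answer above, and the inline `RAW_CSV` string just stands in for `contract.csv`.

```python
import io

import pandas as pd

# Stand-in for contract.csv, keeping the space after each comma
# exactly as in the question's file.
RAW_CSV = '''id,name,value,vendorname
1, contract abc, "2,000,500.00", company x
2, contract bcd, "1,300,500.00", company y
3, contract cde, "1,344,000.00", company x
4, contract def, "400,000.00", company z
5, contract efg, "566,000.00", company s
'''

df = pd.read_csv(
    io.StringIO(RAW_CSV),
    skipinitialspace=True,  # drop the space after each delimiter so the quotes are recognized
    thousands=",",          # treat commas inside the numbers as thousands separators
)

print(df.dtypes)
print(df)
```

With the leading spaces skipped, the quoted numbers are parsed as single fields, and `thousands=","` lets pandas convert them to floats.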

This produces the expected bar chart.
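To see why `value_mean` exists while plain `value` does not, the sketch below reproduces the column naming with pandas alone. It assumes, as I understand Bokeh's behavior, that a `ColumnDataSource` built from a groupby uses `group.describe()` and joins the resulting two-level column names with an underscore; the dataframe is typed in by hand from the cleaned CSV above.

```python
import pandas as pd

# The cleaned data from the answer, entered directly.
df = pd.DataFrame({
    "id": [1, 2, 3, 4, 5],
    "name": ["contract abc", "contract bcd", "contract cde",
             "contract def", "contract efg"],
    "value": [2000500.0, 1300500.0, 1344000.0, 400000.0, 566000.0],
    "vendorname": ["company x", "company y", "company x",
                   "company z", "company s"],
})

group = df.groupby(["vendorname", "name"])

# describe() returns one row per group and a two-level column index
# such as ("value", "mean") or ("value", "count").
stats = group.describe()

# Assumption: this mirrors how Bokeh flattens the names.
flat_columns = ["_".join(col) for col in stats.columns]
index_name = "_".join(stats.index.names)

print(index_name)
print([c for c in flat_columns if c.startswith("value")])
```

Only the summary statistics appear as columns, so a glyph has to reference one of them (for example `value_mean`) rather than the original `value` column.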