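Good day!
I am trying to create a vbar chart like the "Mean MPG by # Cylinders and Manufacturer" example in the Bokeh documentation (https://bokeh.pydata.org/en/latest/docs/user_guide/categorical.html#userguide-categorical), but I keep getting this error: (BAD_COLUMN_NAME): Glyph refers to nonexistent column name: value
Here is my CSV file: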
id,name,value,vendorname
1, contract abc, "2,000,500.00", company x
2, contract bcd, "1,300,500.00", company y
3, contract cde, "1,344,000.00", company x
4, contract def, "400,000.00", company z
5, contract efg, "566,000.00", company s
and so on...
The code is as follows:
from bokeh.io import show, output_file
from bokeh.palettes import Viridis256
from bokeh.transform import factor_cmap
import pandas as pd
from bokeh.core.properties import value
from bokeh.models import FactorRange, ColumnDataSource
from bokeh.palettes import Spectral5
from bokeh.plotting import figure, show
from bokeh.embed import components
df = pd.read_csv('contract.csv')
group = df.groupby(['vendorname', 'name'])
index_cmap = factor_cmap('vendorname_name', palette=Viridis256, factors=sorted(df.vendorname.unique()), end=1)
p = figure(plot_width=1000, plot_height=1500, title="Value Contract by # Contract and Vendor", x_range=group, toolbar_location=None,
tooltips=[("Value", "@value"), ("vendorname, name", "@vendorname_name")])
p.vbar(x='vendorname_name', top='value', width=1, source=group, line_color="white", fill_color=index_cmap, )
p.y_range.start = 0
p.x_range.range_padding = 0.05
p.xgrid.grid_line_color = None
p.xaxis.axis_label = "Contract grouped by # Vendor"
p.xaxis.major_label_orientation = 1.2
p.outline_line_color = None
output_file("contract.html")
show(p)
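Thanks in advance to anyone who can help.
Answer 0 (score: 0)
When you pass a groupby on a dataframe as the source, only the aggregates for each group are available, so there is no plain value column for the glyph to refer to: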
In [3]: source = ColumnDataSource(group)
In [4]: source.data
Out[4]:
{'id_count': array([1, 1, 1], dtype=object),
'id_unique': array([1, 1, 1], dtype=object),
'id_top': array([' "2', ' "1', ' "1'], dtype=object),
'id_freq': array([1, 1, 1], dtype=object),
'value_count': array([1, 1, 1], dtype=object),
'value_unique': array([1, 1, 1], dtype=object),
'value_top': array(['500.00"', '000.00"', '500.00"'], dtype=object),
'value_freq': array([1, 1, 1], dtype=object),
'vendorname_name': array([(' company x', '000'), (' company x', '344'),
(' company y', '300')], dtype=object)}
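As a rough illustration of where those column names come from: as far as I understand, Bokeh builds the source from the group's describe() summary and joins each (column, statistic) pair with an underscore. The pandas-only sketch below mimics that naming on a small, already-clean copy of the data. The inline DataFrame and the flat_columns name are assumptions made for illustration, not part of the original code.

```python
import pandas as pd

# Small, already-clean stand-in for contract.csv (assumed for illustration).
df = pd.DataFrame({
    "id": [1, 2, 3],
    "name": ["contract abc", "contract bcd", "contract cde"],
    "value": [2000500.0, 1300500.0, 1344000.0],
    "vendorname": ["company x", "company y", "company x"],
})

group = df.groupby(["vendorname", "name"])

# describe() returns one column per (original column, statistic) pair.
stats = group.describe()

# Flatten the two-level column labels into names such as "value_mean".
flat_columns = ["_".join(col) for col in stats.columns]

print(flat_columns)
```

With numeric columns the list contains names such as value_mean and value_max but no bare value, which is consistent with the BAD_COLUMN_NAME error in the question.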
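Note, however, that the values in the Out[4] listing are garbage, because your read_csv call is not working either. In your data, the value column is an array of badly formatted numeric strings. The commas inside the numbers confuse Pandas when you use the basic read_csv call from your code. You need to follow the advice in https://stackoverflow.com/a/22137890/3406693 to read that column as actual numbers. In addition, your CSV is malformed: there are extra spaces after the commas, which also confuses Pandas. I could not get Pandas to read the dataframe correctly until I fixed the CSV by removing the spaces: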
id,name,value,vendorname
1,contract abc,"2,000,500.00",company x
2,contract bcd,"1,300,500.00",company y
3,contract cde,"1,344,000.00",company x
4,contract def,"400,000.00",company z
5,contract efg,"566,000.00",company s
Then this read_csv command
df = pd.read_csv('contract.csv', thousands=",", quotechar='"', quoting=1)
produces a sensible dataframe:
In [3]: df
Out[3]:
id name value vendorname
0 1 contract abc 2000500.0 company x
1 2 contract bcd 1300500.0 company y
2 3 contract cde 1344000.0 company x
3 4 contract def 400000.0 company z
4 5 contract efg 566000.0 company s
You may also be able to work around the CSV problem by explicitly telling Pandas what the column types are.
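A different option from the dtype approach, offered only as a sketch, is to leave the file untouched and ask read_csv to skip the space after each delimiter with skipinitialspace=True. The example reads the original, unedited rows from an in-memory string (the raw_csv name and the use of io.StringIO are assumptions for illustration; with a real file you would pass the path instead).

```python
import io

import pandas as pd

# The original, unedited CSV text, with a space after each comma.
raw_csv = '''id,name,value,vendorname
1, contract abc, "2,000,500.00", company x
2, contract bcd, "1,300,500.00", company y
3, contract cde, "1,344,000.00", company x
4, contract def, "400,000.00", company z
5, contract efg, "566,000.00", company s
'''

df = pd.read_csv(
    io.StringIO(raw_csv),
    thousands=",",          # treat commas inside numbers as thousands separators
    quotechar='"',          # the quotes protect those commas from splitting fields
    skipinitialspace=True,  # drop the space that follows each delimiter
)

print(df.dtypes)
print(df)
```

Skipping the leading space lets the parser see the opening quote at the start of each field, so the quoted numbers stay in one column and can be converted.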
In any case, you can then plot using value_mean from the group:
p.vbar(x='vendorname_name', top='value_mean', ...)  # top must name an aggregate column such as value_mean
This yields the grouped bar chart.