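I am trying to follow the example here: https://anaconda.org/jbednar/nyc_taxi/notebook
However, I cannot use the following block, because specific lines (commented out below) always raise a MemoryError: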
def merged_images(x_range, y_range, w=plot_width, h=plot_height, how='log'):
    cvs = ds.Canvas(plot_width=w, plot_height=h, x_range=x_range, y_range=y_range)
    picks = cvs.points(df, 'pickup_x', 'pickup_y', ds.count('passenger_count'))
    drops = cvs.points(df, 'dropoff_x', 'dropoff_y', ds.count('passenger_count'))
    #more_drops = tf.shade(drops.where(drops > picks), cmap=["darkblue", 'cornflowerblue'], how=how)
    #more_picks = tf.shade(picks.where(picks > drops), cmap=["darkred", 'orangered'], how=how)
    img = tf.stack(more_picks, more_drops)
    return tf.dynspread(img, threshold=0.3, max_px=4)
p = base_plot(background_fill_color=background)
export(merged_images(*NYC),"NYCT_pickups_vs_dropoffs")
InteractiveImage(p, merged_images)
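Does this need a huge amount of RAM (> 64 GB), or is there some memory-related configuration that I have missed? I tried current versions of Python 3.6 and the respective libraries (bokeh, datashader, jupyter) on both Windows 10 and Linux 16.04 (both 64-bit), to no avail.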
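Update: I also noticed that even though my df.tail() appears to show the right statistics (11842093 records), the histogram result (from histogram(agg.values) onwards) is very different from the one in the original notebook.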
Answer 0 (score: 1)
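Per @JamesA.Bednar's comment and the related commit, https://github.com/bokeh/datashader/commit/9fbace5c7b00410bdac7b7662ee24e466bc66330, the problem occurs with xarray >= 0.8.
The fix is to "rename columns to match before comparing/merging/concatenating".
Result: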
def merged_images(x_range, y_range, w=plot_width, h=plot_height, how='log'):
    cvs = ds.Canvas(plot_width=w, plot_height=h, x_range=x_range, y_range=y_range)
    picks = cvs.points(df, 'pickup_x', 'pickup_y', ds.count('passenger_count'))
    drops = cvs.points(df, 'dropoff_x', 'dropoff_y', ds.count('passenger_count'))
    drops = drops.rename({'dropoff_x': 'x', 'dropoff_y': 'y'})  # added line
    picks = picks.rename({'pickup_x': 'x', 'pickup_y': 'y'})  # added line
    more_drops = tf.shade(drops.where(drops > picks), cmap=["darkblue", 'cornflowerblue'], how=how)
    more_picks = tf.shade(picks.where(picks > drops), cmap=["darkred", 'orangered'], how=how)
    img = tf.stack(more_picks, more_drops)
    return tf.dynspread(img, threshold=0.3, max_px=4)
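A likely explanation for the MemoryError, which the thread itself does not spell out, is that xarray broadcasts by dimension name: comparing two 2-D arrays whose dimensions are named differently produces a 4-D result rather than an element-wise 2-D one. The sketch below illustrates this with small stand-in arrays; the sizes `h` and `w` and the `_xy` variable names are assumptions for illustration, not values from the notebook.

```python
import numpy as np
import xarray as xr

# Tiny stand-ins for plot_height and plot_width (assumed values).
h, w = 3, 4

# Aggregates shaped like the ones in the question: same grid,
# but the dimension names differ between the two arrays.
picks = xr.DataArray(np.arange(h * w).reshape(h, w),
                     dims=("pickup_y", "pickup_x"))
drops = xr.DataArray(np.arange(h * w)[::-1].reshape(h, w),
                     dims=("dropoff_y", "dropoff_x"))

# No dimension names are shared, so xarray broadcasts every
# dimension against every other one: the result is 4-D.
mismatched = drops > picks
print(mismatched.ndim, mismatched.size)

# After renaming to common names the comparison is element-wise.
picks_xy = picks.rename({"pickup_x": "x", "pickup_y": "y"})
drops_xy = drops.rename({"dropoff_x": "x", "dropoff_y": "y"})
matched = drops_xy > picks_xy
print(matched.ndim, matched.size)
```

The mismatched result holds (h * w) squared elements, so on a canvas of a few hundred pixels per side it would need far more memory than any of the inputs, which is consistent with the error appearing only on the two comparison lines.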