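I have a grouped pandas DataFrame, grouped on three levels (Date, City, Neighborhood), followed by "gap".
The gap column holds the values I am trying to unstack.
The bins in the gap column include 0, 0.5 to 3, 3.5 to 5, 5.5 to 7, and so on.
I want to unstack the data so that I can see the count of each gap bin for every neighborhood.
Is it possible to unstack the gap values while keeping the Neighborhood, City and Date groups?
The end goal is a bar chart for each city at each point in time, where each bar shows the stacked gap counts for a neighborhood.
When I call unstack('gap'), I get a KeyError that says "Level gap not found".
Here is the code that got me this far: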
import pandas as pd

minG = tFrame.groupby(['Date','City','Neighborhood','ID']) # there are multiple gap values for each ID
grouped_gap = minG['GAP'] # the series of gaps for each ID
groupedMin = grouped_gap.agg([('Minimum', 'min')]) # I need the minimum gap value for each ID
groupedMin = groupedMin.replace(-1, 0) # the datasource had -1 gap values
label = ['0', '0.5 to 3', '3 to 5', '5 to 7', '7 to 9', '9 to 12', '12+'] # sets the label for each desired bin
groupedMin['gaps'] = pd.cut(groupedMin['Minimum'], bins = [-1, 0.5, 3, 5, 7, 9, 12, 48], labels = label) # places each ID in a bucket, based on the labels
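
For context on the KeyError mentioned above, here is a minimal sketch of how it can be reproduced. The `toy` frame and its two rows are hypothetical stand-ins, built only to have the same index levels as `groupedMin`; they are not the real data. As far as I can tell, `unstack` looks a name up among the index levels, and here the bins live in a column called `gaps`, so no level named `gap` exists.

```python
import pandas as pd

# Hypothetical toy frame with the same index levels as groupedMin
idx = pd.MultiIndex.from_tuples(
    [("6/1/2013 13:00", "Boston", "Brookline", "a"),
     ("6/1/2013 13:00", "Boston", "Brookline", "b")],
    names=["Date", "City", "Neighborhood", "ID"],
)
toy = pd.DataFrame({"Minimum": [22, 5], "gaps": ["12+", "3 to 5"]}, index=idx)

# These are the only names unstack() can resolve as levels
level_names = list(toy.index.names)
print(level_names)

# 'gap' is neither an index level nor the column name ('gaps')
try:
    toy.unstack("gap")
    error_message = None
except KeyError as exc:
    error_message = str(exc)
print(error_message)
```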
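Any help getting from here to those bar charts is appreciated.
EDIT:
Here is what I am looking at:
This first image shows the Minimum column, which is the minimum gap value for each vehicle: http://i58.tinypic.com/2co2zja.jpg
The second image shows the new gaps column, where each minimum value has been dropped into its bin: http://i60.tinypic.com/1j47bm.jpg
Edit #2, with sample code and a sample DataFrame: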
import pandas as pd  # needed for pd.cut below
from pandas import DataFrame
bFrame = DataFrame([["672,059,124","Central Business District","Baltimore","6/1/2013 13:00",4],
["672,059,144","Central Business District","Baltimore","6/1/2013 13:00",1],
["673,928,993","Goucher/Towson (Baltimore County)","Baltimore","6/1/2013 13:00",-1],
["647,380,667","Goucher/Towson (Baltimore County)","Baltimore","6/1/2013 13:00",4],
["801,833,082","Brookline","Boston","6/1/2013 13:00",22],
["801,833,082","Brookline","Boston","6/1/2013 13:00",24],
["821,833,082","Brookline","Boston","6/1/2013 13:00",5],
["956,264,933","Financial District","Boston","6/1/2013 13:00",-1],
["956,264,933","Financial District","Boston","6/1/2013 13:00",2]],
columns=["ID","Neighborhood","City","Date","GAP"])
minGap = bFrame.groupby(['Date','City','Neighborhood','ID']) # there are multiple gap values for each ID
grouped_g = minGap['GAP'] # the series of gaps for each ID
groupedMini = grouped_g.agg([('Minimum', 'min')]) # I need the minimum gap value for each ID
groupedMini = groupedMini.replace(-1, 0) # the datasource had -1 gap values
lab = ['0', '0.5 to 3', '3 to 5', '5 to 7', '7 to 9', '9 to 12', '12+'] # sets the label for each desired bin
groupedMini['gaps'] = pd.cut(groupedMini['Minimum'], bins = [-1, 0.5, 3, 5, 7, 9, 12, 48], labels = lab) # places each ID in a bucket, based on the labels
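
To make the target shape concrete, here is a rough sketch of the table I am after, built from the sample data above. It is one possible approach under stated assumptions, not necessarily the best one: the names `flat` and `counts` are made up for illustration, `min().to_frame("Minimum")` is swapped in for the `agg` call, and the bins are cast to plain strings before counting so that grouping does not depend on how categorical keys are handled.

```python
import pandas as pd

bFrame = pd.DataFrame(
    [["672,059,124", "Central Business District", "Baltimore", "6/1/2013 13:00", 4],
     ["672,059,144", "Central Business District", "Baltimore", "6/1/2013 13:00", 1],
     ["673,928,993", "Goucher/Towson (Baltimore County)", "Baltimore", "6/1/2013 13:00", -1],
     ["647,380,667", "Goucher/Towson (Baltimore County)", "Baltimore", "6/1/2013 13:00", 4],
     ["801,833,082", "Brookline", "Boston", "6/1/2013 13:00", 22],
     ["801,833,082", "Brookline", "Boston", "6/1/2013 13:00", 24],
     ["821,833,082", "Brookline", "Boston", "6/1/2013 13:00", 5],
     ["956,264,933", "Financial District", "Boston", "6/1/2013 13:00", -1],
     ["956,264,933", "Financial District", "Boston", "6/1/2013 13:00", 2]],
    columns=["ID", "Neighborhood", "City", "Date", "GAP"],
)

# Minimum gap per ID, with the -1 placeholders mapped to 0 as in the code above
keys = ["Date", "City", "Neighborhood", "ID"]
groupedMini = bFrame.groupby(keys)["GAP"].min().to_frame("Minimum").replace(-1, 0)

lab = ["0", "0.5 to 3", "3 to 5", "5 to 7", "7 to 9", "9 to 12", "12+"]
groupedMini["gaps"] = pd.cut(groupedMini["Minimum"],
                             bins=[-1, 0.5, 3, 5, 7, 9, 12, 48], labels=lab)

# Move the index levels back to columns and use plain-string bin labels
flat = groupedMini.reset_index()
flat["gaps"] = flat["gaps"].astype(str)

# Count IDs per (Date, City, Neighborhood, bin), then pivot the bins into columns
counts = (
    flat.groupby(["Date", "City", "Neighborhood", "gaps"])
    .size()
    .unstack("gaps", fill_value=0)
    .reindex(columns=lab, fill_value=0)  # keep every bin, in label order
)
print(counts)
```

With a table shaped like `counts`, something along the lines of `counts.xs("Boston", level="City").plot(kind="bar", stacked=True)` might then give one stacked bar per neighborhood for a city, assuming matplotlib is installed.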