Unstacking a grouped DataFrame

Time: 2014-01-28 18:48:02

Tags: python pandas

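I have a grouped pandas DataFrame, grouped on three levels (date, city, neighborhood) and then by "gap".

The gap column holds the values I am trying to unstack.

The bins in the gap column include: 0, 0.5 to 3, 3.5 to 5, 5.5 to 7, and so on.

I want to unstack the data so that I can see the count for each gap bin in each neighborhood.

Is it possible to unstack the gap values while keeping the neighborhood, city, and date groups?

The end goal is a bar chart for each city at each point in time, where each bar shows the stacked gap counts for one neighborhood.

When I call unstack('gap'), I get a KeyError that says "Level gap not found".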
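
One likely reason for that error, sketched on made-up data: `unstack` pivots index levels, and in the code further down the bins are stored as an ordinary column named `gaps` rather than as an index level named `gap`. The toy frame, its values, and the names `unstack_failed` and `counts` are assumptions for illustration only.

```python
import pandas as pd

# Hypothetical toy data: column and level names mirror the question,
# the values are made up.
df = pd.DataFrame(
    {"gaps": ["0", "3 to 5", "0"]},
    index=pd.MultiIndex.from_tuples(
        [
            ("Boston", "Brookline", "a"),
            ("Boston", "Brookline", "b"),
            ("Boston", "Financial District", "c"),
        ],
        names=["City", "Neighborhood", "ID"],
    ),
)

# 'gaps' is a column, not an index level, so unstack cannot find it.
try:
    df.unstack("gaps")
    unstack_failed = False
except (KeyError, ValueError):
    unstack_failed = True

# Grouping on the column turns it into an index level of the result,
# and that level can then be unstacked into one column per bin.
counts = (
    df.reset_index()
      .groupby(["City", "Neighborhood", "gaps"])
      .size()
      .unstack("gaps", fill_value=0)
)
```

Here `counts` keeps City and Neighborhood in the index and holds one column per bin, with 0 where a neighborhood has no IDs in that bin.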

Here is the code that got me to this point:

    import pandas as pd

    # tFrame is the source DataFrame; there are multiple gap values for each ID
    minG = tFrame.groupby(['Date', 'City', 'Neighborhood', 'ID'])
    grouped_gap = minG['GAP']  # the series of gaps for each ID
    groupedMin = grouped_gap.agg([('Minimum', 'min')])  # the minimum gap value for each ID
    groupedMin = groupedMin.replace(-1, 0)  # the data source had -1 gap values
    label = ['0', '0.5 to 3', '3 to 5', '5 to 7', '7 to 9', '9 to 12', '12+']  # label for each desired bin
    # place each ID in a bucket, based on the labels
    groupedMin['gaps'] = pd.cut(groupedMin['Minimum'], bins=[-1, 0.5, 3, 5, 7, 9, 12, 48], labels=label)

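Any help getting from here to those bar charts is appreciated.

Edit:

This is what I am seeing:

  1. The first image shows the Minimum column, which is the minimum gap value for each vehicle: http://i58.tinypic.com/2co2zja.jpg

  2. The second image shows the new gaps column, in which each minimum value has been placed in a bucket: http://i60.tinypic.com/1j47bm.jpg

  3. Edit #2, with sample code and a DataFrame: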

    import pandas as pd  # needed for pd.cut below
    from pandas import DataFrame
    
    bFrame = DataFrame([["672,059,124","Central Business District","Baltimore","6/1/2013 13:00",4],
                       ["672,059,144","Central Business District","Baltimore","6/1/2013 13:00",1], 
                       ["673,928,993","Goucher/Towson (Baltimore County)","Baltimore","6/1/2013 13:00",-1],
                       ["647,380,667","Goucher/Towson (Baltimore County)","Baltimore","6/1/2013 13:00",4], 
                       ["801,833,082","Brookline","Boston","6/1/2013 13:00",22], 
                       ["801,833,082","Brookline","Boston","6/1/2013 13:00",24],
                       ["821,833,082","Brookline","Boston","6/1/2013 13:00",5],
                       ["956,264,933","Financial District","Boston","6/1/2013 13:00",-1],
                       ["956,264,933","Financial District","Boston","6/1/2013 13:00",2]],
                       columns=["ID","Neighborhood","City","Date","GAP"])
    minGap = bFrame.groupby(['Date','City','Neighborhood','ID']) # there are multiple gap values for each ID
    
    grouped_g = minGap['GAP'] # the series of gaps for each ID
    
    groupedMini = grouped_g.agg([('Minimum', 'min')]) # I need the minimum gap value for each ID
    
    groupedMini = groupedMini.replace(-1, 0) # the datasource had -1 gap values
    
    lab = ['0', '0.5 to 3', '3 to 5', '5 to 7', '7 to 9', '9 to 12', '12+'] # sets the label for each desired bin
    
    groupedMini['gaps'] = pd.cut(groupedMini['Minimum'], bins = [-1, 0.5, 3, 5, 7, 9, 12, 48], labels = lab) # places each ID in a bucket, based on the labels
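
Continuing from the sample above, one possible way to get a count per bin for each neighborhood is to group on the binned column and then unstack it. This is a minimal sketch, not a confirmed solution: the rows are copied from the sample, while the names `minimum`, `gaps`, and `counts` are introduced here for illustration.

```python
import pandas as pd

# Sample rows copied from the question.
rows = [
    ["672,059,124", "Central Business District", "Baltimore", "6/1/2013 13:00", 4],
    ["672,059,144", "Central Business District", "Baltimore", "6/1/2013 13:00", 1],
    ["673,928,993", "Goucher/Towson (Baltimore County)", "Baltimore", "6/1/2013 13:00", -1],
    ["647,380,667", "Goucher/Towson (Baltimore County)", "Baltimore", "6/1/2013 13:00", 4],
    ["801,833,082", "Brookline", "Boston", "6/1/2013 13:00", 22],
    ["801,833,082", "Brookline", "Boston", "6/1/2013 13:00", 24],
    ["821,833,082", "Brookline", "Boston", "6/1/2013 13:00", 5],
    ["956,264,933", "Financial District", "Boston", "6/1/2013 13:00", -1],
    ["956,264,933", "Financial District", "Boston", "6/1/2013 13:00", 2],
]
bFrame = pd.DataFrame(rows, columns=["ID", "Neighborhood", "City", "Date", "GAP"])

# Minimum gap per ID, with the -1 placeholders treated as 0 (as in the question).
minimum = (
    bFrame.groupby(["Date", "City", "Neighborhood", "ID"])["GAP"]
          .min()
          .replace(-1, 0)
)

# Same bins and labels as in the question.
lab = ["0", "0.5 to 3", "3 to 5", "5 to 7", "7 to 9", "9 to 12", "12+"]
gaps = pd.cut(minimum, bins=[-1, 0.5, 3, 5, 7, 9, 12, 48], labels=lab).rename("gaps")

# Count IDs per (Date, City, Neighborhood, gaps), then pivot the bins into columns.
# observed=True keeps only the bins that actually occur in the data.
counts = (
    gaps.reset_index()
        .groupby(["Date", "City", "Neighborhood", "gaps"], observed=True)
        .size()
        .unstack("gaps", fill_value=0)
)
```

Each city's slice, for example `counts.xs("Boston", level="City")`, could then be drawn with `.plot(kind="bar", stacked=True)` if matplotlib is installed, which would give one stacked bar per neighborhood.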
    

0 answers:

No answers