Question

I have a dataset in which the "Type" column is essentially a shape, and the corresponding "Volume" column holds the volume of that shape.

Now I need to do the following:

Group by shape
For each shape, group by volume
Then define a range for each shape and volume and form bins

Input:

 Type             Volume

 Cylinder          100
 Square            300
 Cylinder          200
 Oval              100
 Square            320
 Cylinder          150
 Oval              600
 Round             1000
 Square            900
 Round             1500

Output:

 Type              Volume       Bin

 Cylinder          100            1
 Cylinder          150            1
 Cylinder          200            2
 Oval              100            1
 Oval              600            3
 Round             1000           1
 Round             1500           2
 Square            300            1
 Square            320            1
 Square            900            3

The bins are as follows:

1. Cylinder -> Bin1 (100-200), Bin2 (201-300) ....

2. Oval -> Bin1 (100-200), ..... Bin3 (500-600) ....
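As a rough illustration of how ranges like these map onto bin numbers, the sketch below uses pd.cut with explicit edges for a single shape. The variable names and the sample volumes are made up for illustration, and the edges 100, 200, 300 assume the Cylinder ranges listed above.

```python
import pandas as pd

# Hypothetical volumes for one shape (not taken from the real dataset).
volumes = pd.Series([100, 150, 200, 250, 300])

# Edges 100, 200, 300 give two right-closed intervals: [100, 200] and (200, 300].
# For integer volumes the second interval is the same as 201-300.
edges = [100, 200, 300]

# labels=False returns 0-based interval positions; add 1 to get Bin1, Bin2, ...
# include_lowest=True keeps the value 100 inside the first bin.
bin_numbers = pd.cut(volumes, bins=edges, labels=False, include_lowest=True) + 1

print(bin_numbers.tolist())
```

With these edges a volume of exactly 200 lands in Bin1, because the intervals are closed on the right.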

Code:

  import numpy as np
  import pandas as pd

  grouped = df_dim.groupby('Type', as_index=False)

  def test(group):
      return group.reset_index()

  def group_vol(group):
      # Cut the volumes into 200-wide intervals, then group by those intervals
      groupedVol = group.groupby(
          pd.cut(group["Target_BrimVol"], np.arange(0, 5000, 200)),
          as_index=False)
      return groupedVol.apply(test)

  gr = grouped.apply(group_vol)
  print(gr)

Answer 1

I think you can try the following code.

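# The stop is max + 200 so that the last edge covers the group's maximum,
# and include_lowest=True keeps the group's minimum inside the first interval.
testdf = df.groupby('Type', as_index=False).apply(
    lambda x: x.groupby(
        pd.cut(x["Volume"],
               np.arange(x["Volume"].min(), x["Volume"].max() + 200, 200),
               include_lowest=True),
        as_index=False).apply(test))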

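What happens here is that the first groupby splits the DataFrame into the "Type" categories, and then you want to group each of those by range. To do that, use a lambda that groups again, with pd.cut slicing the values into intervals according to your range. In this case I simply take the minimum and maximum values and cut at intervals of 200. After that, if you want to combine the output back into a single DataFrame, use another apply to merge the pieces back together. Like this: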

def test(group):
   #Write your function here. Whatever you want to perform.
   return group.merge(group)

I am using as_index=False here to reset the index, so that the DataFrame is rearranged according to the new index.
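If the goal is just a Bin column next to each row, one alternative to the nested groupby/apply is a groupby followed by transform. This is only a sketch under assumptions: the helper name bin_within_type is hypothetical, the width of 200 is borrowed from the code above, and each shape's first bin is assumed to start at that shape's minimum volume. The bin numbers it produces depend on those choices and need not match the sample output in the question.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question's input table.
df = pd.DataFrame({
    "Type": ["Cylinder", "Square", "Cylinder", "Oval", "Square",
             "Cylinder", "Oval", "Round", "Square", "Round"],
    "Volume": [100, 300, 200, 100, 320, 150, 600, 1000, 900, 1500],
})

def bin_within_type(volumes, width=200):
    """Hypothetical helper: 1-based bin number of each volume within its group."""
    lo, hi = volumes.min(), volumes.max()
    # Always build at least one bin, even when the group holds a single value.
    n_bins = max(1, int(np.ceil((hi - lo) / width)))
    edges = lo + width * np.arange(n_bins + 1)
    # labels=False yields 0-based interval positions; shift them to start at 1.
    return pd.cut(volumes, bins=edges, labels=False, include_lowest=True) + 1

# transform keeps the original row order and index, so the result can be
# assigned straight back as a new column.
df["Bin"] = df.groupby("Type")["Volume"].transform(bin_within_type).astype(int)

out = df.sort_values(["Type", "Volume"]).reset_index(drop=True)
print(out)
```

Because transform returns one value per input row, no merge step is needed to get back to a single DataFrame.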

Hope this helps.

Edit: You don't need to worry about the bins, because each groupby creates a new index that you can use for that purpose. For example:

Index1  Index2  Type  Volume
0 0 Cylinder  100
0 0 Cylinder  140
0 1 Cylinder  250
1 0 Oval  154
1 4 Oval 999
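The two index columns in a table like the one above can also be built directly, without relying on the index that nested groupby calls produce. The following is a minimal sketch, assuming the global edges np.arange(0, 5000, 200) from the question's code; the column names Index1 and Index2 simply mirror the table and are not produced by pandas itself.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question's input table.
df = pd.DataFrame({
    "Type": ["Cylinder", "Square", "Cylinder", "Oval", "Square",
             "Cylinder", "Oval", "Round", "Square", "Round"],
    "Volume": [100, 300, 200, 100, 320, 150, 600, 1000, 900, 1500],
})

# Index1: a 0-based number for each Type group (groups are sorted by key).
df["Index1"] = df.groupby("Type").ngroup()

# Index2: the 0-based position of the 200-wide interval each volume falls in.
# Intervals are (0, 200], (200, 400], ... because pd.cut is right-closed by default.
edges = np.arange(0, 5000, 200)
df["Index2"] = pd.cut(df["Volume"], bins=edges, labels=False)

print(df.sort_values(["Index1", "Index2"]))
```

Here Index2 is shared across shapes, so the same number always means the same volume range regardless of Type.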
2 1 Circle  328
