Groupby on a pandas DataFrame with a custom aggregation function

Time: 2019-02-23 17:46:29

Tags: python pandas function group-by

Suppose we have the DataFrame:

import pandas as pd

df = pd.DataFrame({'Animal' : ['Falcon', 'Falcon','Parrot', 'Parrot'],
                   'Max Speed' : [380.1, 370.3, 24.77, -12.55]})

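I need to write a function similar to an absolute minimum: for each group it must return the element closest to zero, keeping its sign. In this example, grouping by 'Animal', it should return: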

   Animal  Max Speed
0  Falcon     370.30
1  Parrot     -12.55

I tried a function like this:

def nearzero():
   absolute = [abs(number) for number in data]
   i = absolute.index(min(absolute))
   return data[i]

It should return the element found at the index of the smallest absolute value. But this doesn't work:

df.groupby(['Animal']).agg({'Max Speed': [nearzero]})

Is the function or the groupby defined incorrectly?

3 Answers:

Answer 0: (score: 1)

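I think you need DataFrameGroupBy.idxmin to get the index of the minimum in each group, after converting the Max Speed column with abs, and finally a call to loc to select the rows: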

df = df.loc[df['Max Speed'].abs().groupby(df['Animal']).idxmin()]
print (df)
   Animal  Max Speed
1  Falcon     370.30
3  Parrot     -12.55
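
To make the one-liner above easier to follow, here is a minimal sketch that splits it into steps. The intermediate names abs_speed, labels and result are illustrative only, and the final reset_index(drop=True) is an optional extra (not part of the answer) that renumbers the rows from 0 as in the question's expected output.

```python
import pandas as pd

df = pd.DataFrame({'Animal': ['Falcon', 'Falcon', 'Parrot', 'Parrot'],
                   'Max Speed': [380.1, 370.3, 24.77, -12.55]})

# Step 1: absolute values, still aligned with the original row labels
abs_speed = df['Max Speed'].abs()

# Step 2: for each animal, the row label of the smallest absolute value
labels = abs_speed.groupby(df['Animal']).idxmin()

# Step 3: select those rows from the original frame, so the sign is preserved;
# reset_index(drop=True) renumbers the selected rows starting at 0
result = df.loc[labels].reset_index(drop=True)
print(result)
```

Because the rows are selected from the original df, any other columns of the matching row come along unchanged.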

Another solution, using a new column:

df['Max Speed Abs'] = df['Max Speed'].abs()
df = df.loc[df.groupby('Animal')['Max Speed Abs'].idxmin()]

EDIT: To group by multiple Series, use:

df = pd.DataFrame({'Animal' : ['Falcon', 'Falcon','Parrot', 'Parrot'],
                   'Max Speed' : [380.1, 370.3, 24.77, -12.55],
                   'Dates':['2010-10-09'] * 4})  

df = df.loc[df['Max Speed'].abs().groupby([df['Animal'], df['Dates']]).idxmin()]
print (df)
   Animal  Max Speed       Dates
1  Falcon     370.30  2010-10-09
3  Parrot     -12.55  2010-10-09

Answer 1: (score: 1)

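You can define a function in Python that loops over the group and returns the first element whose absolute value equals the group's smallest absolute value: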

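def abs_min(x):
    # Smallest absolute value in the group, computed once instead of on every iteration
    smallest = x.abs().min()
    for elem in x:
        if abs(elem) == smallest:
            return elem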

df.groupby('Animal')['Max Speed'].apply(abs_min)

Animal
Falcon    370.30
Parrot    -12.55
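
The same idea can probably be written more compactly with Python's built-in min and its key argument. This is a sketch rather than part of the answer: the name closest is illustrative, and it assumes the groups contain no NaN values, which are not handled here.

```python
import pandas as pd

df = pd.DataFrame({'Animal': ['Falcon', 'Falcon', 'Parrot', 'Parrot'],
                   'Max Speed': [380.1, 370.3, 24.77, -12.55]})

# min(..., key=abs) compares elements by absolute value
# but returns the original, signed element
closest = df.groupby('Animal')['Max Speed'].agg(lambda s: min(s, key=abs))
print(closest)
```

If two values in a group are equally close to zero, min returns the one that appears first.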

Answer 2: (score: 1)

Define your function as:

def nearzero(data):
    dat = data.tolist()
    absolute = [abs(number) for number in dat]
    return dat[absolute.index(min(absolute))]

Note that this function takes a df column (a Series) as its argument, but must select from the underlying list: the position found in the list is not a row label, so it cannot be used to index the Series directly.

Then call:

df.groupby(['Animal'])['Max Speed'].apply(nearzero)

A second option, without explicit conversion to the underlying list:

Define the function as:

def nearzero2(data):
    return data[data.abs().idxmin()]

Then call:

df.groupby(['Animal'])['Max Speed'].apply(nearzero2)

Both calls return a Series indexed by Animal; to get the result laid out as in your question, convert that Series back to a DataFrame.
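
One way to get the two-column layout shown in the question is to call reset_index() on the grouped result. The sketch below is an assumption about what was intended here, not the answer's own code; it also uses .loc for the lookup to make the label-based selection explicit.

```python
import pandas as pd

df = pd.DataFrame({'Animal': ['Falcon', 'Falcon', 'Parrot', 'Parrot'],
                   'Max Speed': [380.1, 370.3, 24.77, -12.55]})

def nearzero2(data):
    # idxmin returns the row label of the smallest absolute value;
    # .loc then looks the original (signed) value up by that label
    return data.loc[data.abs().idxmin()]

# The grouped result is a Series indexed by Animal;
# reset_index() turns that index into a regular column
result = df.groupby(['Animal'])['Max Speed'].apply(nearzero2).reset_index()
print(result)
```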