Filtering in pandas

Time: 2019-10-09 18:45:26

Tags: python pandas dataframe

What I'm trying to do is probably best explained with an example. Suppose we have the following DataFrame:

ID            Category          Label          Price
----------------------------------------------------
00001         Low               Alpha          1.00
00001         Low               Beta           1.50
00001         Med               Chi            2.00
00001         Med               Delta          2.50
00001         High              Epsilon        3.00
00001         High              Phi            3.50
00002         Low               Alpha          1.00
00002         Low               Beta           1.50
00002         Med               Chi            2.50
00002         Med               Delta          2.50
00002         High              Epsilon        3.00
00002         High              Phi            3.50

For each ID, and for each Category within that ID, I want to return the row whose Label has the highest Price. For example:

ID            Category          Label          Price
----------------------------------------------------
00001         Low               Beta           1.50
00001         Med               Delta          2.50
00001         High              Phi            3.50
00002         Low               Beta           1.50
00002         Med               Delta          2.50
00002         High              Phi            3.50

Initially I thought of doing this with nested for loops, much like iterating over a multidimensional array, but I know that is not the pandas way.

2 answers:

Answer 0: (score: 3)

IIUC, you can try the following:

df.loc[df.groupby(['ID','Category'], group_keys=False)['Price'].idxmax()]

Output:

       ID Category  Label  Price
5   00001     High    Phi    3.5
1   00001      Low   Beta    1.5
3   00001      Med  Delta    2.5
11  00002     High    Phi    3.5
7   00002      Low   Beta    1.5
8   00002      Med    Chi    2.5
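
To try this end to end, the question's table can be rebuilt as a DataFrame. The following is a minimal sketch, assuming ID is stored as a string and the frame has the default integer index; the names `best_rows` and `result` are illustrative only.

```python
import pandas as pd

# Sample data copied from the question (ID kept as a string to preserve leading zeros).
df = pd.DataFrame({
    "ID": ["00001"] * 6 + ["00002"] * 6,
    "Category": ["Low", "Low", "Med", "Med", "High", "High"] * 2,
    "Label": ["Alpha", "Beta", "Chi", "Delta", "Epsilon", "Phi"] * 2,
    "Price": [1.00, 1.50, 2.00, 2.50, 3.00, 3.50,
              1.00, 1.50, 2.50, 2.50, 3.00, 3.50],
})

# Index label of the highest-priced row within each (ID, Category) group.
best_rows = df.groupby(["ID", "Category"])["Price"].idxmax()

# Select those rows; sort_index() puts them back in their original order.
result = df.loc[best_rows].sort_index()
print(result)
```

Note that for ID 00002 and Category Med, Chi and Delta are tied at 2.50 in the sample data. `idxmax` returns the first occurrence, so Chi is selected, which is why the output above differs from the question's expected result on that row.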

Answer 1: (score: 1)

Similarly, you can group by ID, Category, and Label, and then aggregate on Price.
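
One possible reading of this suggestion is sketched below. It is an assumption, not necessarily the code the answer intended: it groups on ID and Category only, because in the sample data every (ID, Category, Label) combination is unique, so also grouping on Label would leave each row in its own group. The names `summary`, `group_max`, and `kept` are illustrative.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "ID": ["00001"] * 6 + ["00002"] * 6,
    "Category": ["Low", "Low", "Med", "Med", "High", "High"] * 2,
    "Label": ["Alpha", "Beta", "Chi", "Delta", "Epsilon", "Phi"] * 2,
    "Price": [1.00, 1.50, 2.00, 2.50, 3.00, 3.50,
              1.00, 1.50, 2.50, 2.50, 3.00, 3.50],
})

# Plain aggregation: one row per (ID, Category) with its maximum price.
# The Label column is not carried along by this step.
summary = df.groupby(["ID", "Category"], as_index=False, sort=False)["Price"].max()

# To keep the Label, broadcast each group's maximum back onto the rows
# and keep the rows whose price equals it. Tied rows are all retained.
group_max = df.groupby(["ID", "Category"])["Price"].transform("max")
kept = df[df["Price"] == group_max]
print(summary)
print(kept)
```

Unlike the `idxmax` approach, the `transform` filter keeps every row that ties for the maximum, so both Chi and Delta are returned for ID 00002, Category Med.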
