Removing outliers from an aggregated DataFrame (Python)

Time: 2018-06-20 13:58:00

Tags: python pandas dataframe aggregate outliers

My original DataFrame looks like this (only the first rows are shown):

  categories  id products
0          A   1        a
1          B   1        a
2          C   1        a
3          A   1        b
4          B   1        b
5          A   2        c
6          B   2        c

I aggregated it with the following code:

df2 = df.groupby('id').products.nunique().reset_index().merge(
    pd.crosstab(df.id, df.categories).reset_index())

This is the resulting DataFrame, to which I also added n outliers:

   id  products  A  B   C
0   1         2  2  2   1
1   2         1  1  1   0
2   3        50  1  1  30

Now I am trying to remove the outliers from the new DataFrame:

# remove outliers
del df2['id']
df2 = df2.loc[df2['products'] <= 20, [str(i) for i in df2.columns]]

This removes the outliers, but why do I now get only NaN in the category columns?

1 answer:

Answer 0: (score: 0)

df2 = df2.loc[df2['products'] <= 20]
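
The line above can be tried end to end with a minimal sketch built from the sample rows in the question. The outlier row with id 3 is appended by hand here, which is an assumption about how the outliers were added. Passing only the boolean mask to .loc filters rows and leaves every column untouched, so no column labels need to be rebuilt.

```python
import pandas as pd

# Sample rows from the question.
df = pd.DataFrame({
    "categories": ["A", "B", "C", "A", "B", "A", "B"],
    "id": [1, 1, 1, 1, 1, 2, 2],
    "products": ["a", "a", "a", "b", "b", "c", "c"],
})

# Aggregate: distinct products per id, plus category counts per id.
df2 = (
    df.groupby("id").products.nunique().reset_index()
    .merge(pd.crosstab(df.id, df.categories).reset_index(), on="id")
)

# Assumption: the outlier is appended manually as an extra row.
outlier = pd.DataFrame([{"id": 3, "products": 50, "A": 1, "B": 1, "C": 30}])
df2 = pd.concat([df2, outlier], ignore_index=True)

# Row filter only: the boolean mask selects rows, all columns are kept.
filtered = df2.loc[df2["products"] <= 20]
print(filtered)
```

One possible explanation for the NaN values, offered as an assumption because the question does not show the real column labels: if the category columns are not labeled with strings, str(i) produces labels that do not exist in df2. Older pandas versions reindexed on such labels and filled the new columns with NaN, while recent versions raise a KeyError instead.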