Question

I've been trying for a while without getting anywhere. Please consider the DataFrame below.

    Id  YearBuilt  SalePrice Neighborhood
    1       2003     208500      CollgCr
    2       1976     181500      Veenker
    3       2001     223500      CollgCr
    4       1915     140000      Crawfor
    5       2000     250000      NoRidge
    6       1993     143000      Mitchel
    7       2004     307000      Somerst
    8       1973     200000       NWAmes
    9       1931     129900      OldTown
    10       1939     118000      BrkSide
    11       1965     129500       Sawyer
    12       2005     345000      NridgHt
    13       1962     144000       Sawyer
    14       2006     279500      CollgCr
    15       1960     157000        NAmes
    16       1929     132000      BrkSide
    17       1970     149000        NAmes

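I want to group the data by neighborhood, and if a neighborhood has fewer than 10 rows, it should be placed in the group other. I've seen other answers but couldn't make sense of them. I tried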

house_df['newColumn'] = house_df['Neighborhood'].mask(house_df['Neighborhood'].count < 50, 'other')

I also tried

house_df.groupby['Neighborhood'].filter(lambda x: x.count < 10)

But this doesn't work. I also tried a groupby on the neighborhood followed by a filter, but had no luck. Please help.

Here is an example of what I hope to achieve

    Id  YearBuilt  SalePrice Neighborhood newColumn
    1       2003     208500      CollgCr   CollgCr
    2       1976     181500      Veenker     other
    3       2001     223500      CollgCr   CollgCr
    4       1915     140000      Crawfor     other
    5       2000     250000      NoRidge   NoRidge
    6       1993     143000      Mitchel   Mitchel
    7       2004     307000      Somerst     other
    8       1973     200000       NWAmes    NWAmes

Answer 1

Use value_counts to count the rows per neighborhood, then map with a lambda to produce the appropriate grouping.

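# Count how many rows each neighborhood has
vc = df.Neighborhood.value_counts()

# Keep neighborhoods that appear more than once; label the rest 'other'.
# (> 1 suits the small sample above; use >= 10 for the threshold in the question.)
df = df.assign(
    newColumn=df.Neighborhood.map(
        lambda x: x if vc.at[x] > 1 else 'other'
    )
)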
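
As a side note, the two attempts in the question fail for syntactic reasons: count is a method and has to be called, and groupby takes parentheses rather than square brackets. Below is a minimal, self-contained sketch of a vectorized variant of the same idea, using where instead of a lambda. It rebuilds only the Id and Neighborhood columns from the sample, and MIN_COUNT is a hypothetical name for the threshold, set to 2 here so the small sample produces a visible result (the question asks for 10 on the full data).

```python
import pandas as pd

# Sample rows from the question (only Id and Neighborhood are needed here)
house_df = pd.DataFrame({
    "Id": range(1, 18),
    "Neighborhood": [
        "CollgCr", "Veenker", "CollgCr", "Crawfor", "NoRidge", "Mitchel",
        "Somerst", "NWAmes", "OldTown", "BrkSide", "Sawyer", "NridgHt",
        "Sawyer", "CollgCr", "NAmes", "BrkSide", "NAmes",
    ],
})

# Hypothetical threshold name; use 10 for the full data set
MIN_COUNT = 2

# Number of rows per neighborhood, then broadcast back onto every row
vc = house_df["Neighborhood"].value_counts()
rows_in_group = house_df["Neighborhood"].map(vc)

# Keep the name where the group is large enough, otherwise use 'other'
house_df["newColumn"] = house_df["Neighborhood"].where(
    rows_in_group >= MIN_COUNT, "other"
)

print(house_df)
```

With this sample, only CollgCr, BrkSide, Sawyer and NAmes occur at least twice, so every other neighborhood ends up as other.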
