Question

I have this real estate data:

neighborhood  type_property  type_negotiation  price
Smallville       house           rent        2000
Oakville       apartment       for sale      100000
King Bay         house         for sale      250000
...

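I have a groupby that identifies which rows in the dataset are houses for sale and then returns, in a new dataframe called df_breakdown, the 10th and 90th percentiles and the number of such houses for each neighborhood. The result looks like this: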

neighborhood tenthpercentile  ninetiethpercentile  Quantity
King Bay         250000.0             250000.0         1
Smallville        99000.0             120000.0         8
Oakville          45000.0             160000.0         6
...
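
For context, a breakdown like the one above could be produced along these lines. This is only a sketch: the sample rows are invented, and the exact groupby used in the question is not shown, so the filtering conditions and the use of named aggregation are assumptions.

```python
import pandas as pd

# Made-up sample rows with the same columns as the question's data.
df = pd.DataFrame({
    "neighborhood": ["Smallville", "Oakville", "Oakville", "Oakville", "King Bay"],
    "type_property": ["house", "house", "house", "apartment", "house"],
    "type_negotiation": ["rent", "for sale", "for sale", "for sale", "for sale"],
    "price": [2000, 100000, 350000, 90000, 250000],
})

# Keep only houses that are for sale.
houses = df[(df["type_property"] == "house") & (df["type_negotiation"] == "for sale")]

# One row per neighborhood: 10th/90th percentile of price and the listing count.
df_breakdown = houses.groupby("neighborhood")["price"].agg(
    tenthpercentile=lambda s: s.quantile(0.10),
    ninetiethpercentile=lambda s: s.quantile(0.90),
    Quantity="count",
).reset_index()

print(df_breakdown)
```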

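I now want to bring this information back to my original real estate dataset and filter out every listing that is a house for sale priced above the 90th percentile or below the 10th percentile computed for its neighborhood. For example, I would want to filter out a house in Oakville priced at 350000.

I have used this expression before: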

df1 = df[df.price < df.price.quantile(.90)]

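But I don't know how to use it with a different value for each neighborhood, or whether it is even useful here. Thanks in advance for your help.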

Answer 1

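Probably not the most elegant way, but you can join the per-neighborhood percentile aggregation onto each row of the real estate data.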

df.join(df.groupby('neighborhood')['price'].quantile([0.1, 0.9]).unstack(), on='neighborhood')

I'm on mobile, so forgive me if the syntax isn't perfect.
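
A slightly fuller sketch of the join idea, in case it helps. It assumes df has already been restricted to houses for sale; the sample prices and the column names p10 and p90 are made up for illustration.

```python
import pandas as pd

# Invented listings, assumed to be houses for sale only.
df = pd.DataFrame({
    "neighborhood": ["Oakville"] * 5 + ["King Bay"],
    "type_property": ["house"] * 6,
    "type_negotiation": ["for sale"] * 6,
    "price": [45000, 100000, 120000, 160000, 350000, 250000],
})

# Per-neighborhood 10th and 90th percentiles, one column each.
# p10 / p90 are hypothetical column names chosen for this example.
bounds = (
    df.groupby("neighborhood")["price"]
      .quantile([0.1, 0.9])
      .unstack()
      .rename(columns={0.1: "p10", 0.9: "p90"})
)

# Attach each neighborhood's bounds to every listing in that neighborhood.
joined = df.join(bounds, on="neighborhood")

# Keep listings whose price lies within the bounds (inclusive on both ends).
filtered = joined[joined["price"].between(joined["p10"], joined["p90"])]
print(filtered)
```

With these numbers the 350000 Oakville house falls above that neighborhood's 90th percentile and is dropped, as is the 45000 one below the 10th.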

Answer 2

You can give both dataframes the same index, broadcast the percentiles, and then simply use .between

So first,

df2 = df2.set_index('neighborhood')
df = df.set_index('neighborhood')

Then, broadcast using loc

df.loc[:, 't'], df.loc[:, 'n'] = df2.tenthpercentile, df2.ninetiethpercentile

Finally,

df.price.between(df.t, df.n)

yields

neighborhood
Smallville    False
Oakville       True
King Bay       True
King Bay      False
dtype: bool

To filter, just slice

df[df.price.between(df.t, df.n)]
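
Putting the steps above together, an end-to-end version might look like the sketch below. The sample rows are invented, df2 stands in for df_breakdown, and the extra mask that leaves rentals and other non-house listings untouched is an assumption about what the question wants, not part of the steps above.

```python
import pandas as pd

# Made-up listings; df2 plays the role of df_breakdown from the question.
df = pd.DataFrame({
    "neighborhood": ["Smallville", "Oakville", "Oakville", "King Bay"],
    "type_property": ["house", "house", "house", "house"],
    "type_negotiation": ["rent", "for sale", "for sale", "for sale"],
    "price": [2000, 100000, 350000, 250000],
}).set_index("neighborhood")

df2 = pd.DataFrame({
    "neighborhood": ["King Bay", "Smallville", "Oakville"],
    "tenthpercentile": [250000.0, 99000.0, 45000.0],
    "ninetiethpercentile": [250000.0, 120000.0, 160000.0],
}).set_index("neighborhood")

# Index alignment broadcasts each neighborhood's bounds to all of its rows.
df["t"] = df2["tenthpercentile"]
df["n"] = df2["ninetiethpercentile"]

# True where the price lies within its neighborhood's bounds (inclusive).
in_range = df["price"].between(df["t"], df["n"])

# Only houses for sale are subject to the percentile filter;
# every other listing (e.g. rentals) is kept unchanged.
is_house_for_sale = (df["type_property"] == "house") & (df["type_negotiation"] == "for sale")
result = df[in_range | ~is_house_for_sale].drop(columns=["t", "n"]).reset_index()
print(result)
```

Here the 350000 Oakville house is removed, while the Smallville rental stays because the percentile bounds are not applied to it.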
