Question

Given a small dataset like the following:

    id        area
0    1  139000.00㎡
1    2   52770.00㎡
2    3   undefined
3    4   86540.00㎡
4    5   undefined
5    6  465527.00㎡
6    7    2373.00㎡
7    8   24563.00㎡
8    9  180717.73㎡
9   10  286300.00㎡
10  11   39806.00㎡
11  12   undefined
12  13  285610.00㎡

How can I use pandas to keep only the rows where area is undefined or >= 10000㎡? Thanks.

I extracted the numbers from the area column like this:

df['new_area'] = df['area'].str.extract('(\d+)').astype(float)
df = df[~df['new_area'] < 30000]

But it raises an error:

TypeError: ufunc 'invert' not supported for the input types, and the inputs could not be safely coerced to any supported types according to the casting rule ''safe''

The expected result is as follows:

    id        area
0    1  139000.00㎡
1    2   52770.00㎡
2    3   undefined
3    4   86540.00㎡
4    5   undefined
5    6  465527.00㎡
6    9  180717.73㎡
7   10  286300.00㎡
8   11   39806.00㎡
9   12   undefined
10  13  285610.00㎡

Answer 1

Create a new column that strips the ㎡ unit and replaces the string undefined with zero, then convert it to float.

 df['newarea'] = (df.area.str.strip(r'㎡').str.replace('undefined', "0")).astype(float)

Use boolean selection to keep the rows where the new area equals 0 or is at least 10000:

df[(df.newarea == 0)|(df.newarea >= 10000)]
# Equivalent: df[(df.newarea.eq(0))|(df.newarea.ge(10000))]
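
On the TypeError in the question: `~` binds more tightly than `<`, so `~df['new_area'] < 30000` tries to invert a float column before comparing it. Wrapping the comparison in parentheses, as in `df[~(df['new_area'] < 30000)]`, avoids that. The sketch below is one possible variant of the approach above, not the only way to do it: it parses the column with `pd.to_numeric(..., errors="coerce")` so that undefined becomes NaN instead of 0, which keeps a genuine area of 0 distinct from a missing one. The function name `filter_area` and its `threshold` parameter are made up for this illustration. Note also that the expected output shown in the question drops id 8 (24563.00㎡), which corresponds to a threshold of 30000 (the value used in the asker's code) rather than the 10000 mentioned in the text.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "id": range(1, 14),
    "area": [
        "139000.00㎡", "52770.00㎡", "undefined", "86540.00㎡", "undefined",
        "465527.00㎡", "2373.00㎡", "24563.00㎡", "180717.73㎡", "286300.00㎡",
        "39806.00㎡", "undefined", "285610.00㎡",
    ],
})


def filter_area(frame, threshold):
    """Keep rows whose area is undefined or at least `threshold` (hypothetical helper)."""
    # Strip the trailing unit; values that cannot be parsed ("undefined") become NaN
    numeric_area = pd.to_numeric(frame["area"].str.rstrip("㎡"), errors="coerce")
    # Parenthesize the comparison before combining it with | (or negating it with ~)
    mask = numeric_area.isna() | (numeric_area >= threshold)
    return frame[mask].reset_index(drop=True)


result_10000 = filter_area(df, 10000)  # threshold stated in the question text
result_30000 = filter_area(df, 30000)  # threshold used in the asker's code
print(result_30000)
```

Because the original area column is left untouched, the filtered frame still shows the ㎡ strings and the undefined entries exactly as they appeared in the input.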

Filter rows based on extracted values and strings in Pandas
