I want to remove some rows from a pandas DataFrame. My DataFrame looks like this:
sex age race c_charge_desc
Male 0.204082 Hispanic Felony Battery (Dom Strang)
Male 0.122449 African-American Felony Driving While Lic Suspend
Female 0.163265 African-American Neglect Child / No Bodily Harm
Male 0.081633 African-American arrest case no charge
Male 0.530612 African-American Felony Driving While Lic Suspend
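There is a column named c_charge_desc that contains many different charge descriptions. I want to remove the charge descriptions whose total count is below a threshold.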
Battery 924
arrest case no charge 904
Possession of Cocaine 378
Grand Theft in the 3rd Degree 352
Driving While License Revoked 158
...
Compulsory Attendance Violation 1
Possession Of Clonazepam 1
Possession Of Anabolic Steroid 1
Attempt Burglary (Struct) 1
Fail To Redeliver Hire Prop 1
Name: c_charge_desc, Length: 387, dtype: int64
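This is a summary of that column. As you can see, many descriptions have a count of 1. I want to remove the descriptions that occur fewer than 10 times.
I tried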
df[df['c_charge_desc'].value_counts() < 10]
but this does not work, and I get this error:
IndexingError: Unalignable boolean Series provided as indexer (index of the boolean Series and of the indexed object do not match).
This would be my expected output:
Battery 924
arrest case no charge 904
Possession of Cocaine 378
Grand Theft in the 3rd Degree 352
Driving While License Revoked 158
...
Some charges 10
Some charges 10
Some charges 10
Some charges 10
Name: c_charge_desc, Length: 200, dtype: int64
Answer 0 (score: 1)
groupby-filter is probably the most concise option. For example, to keep only the charges that occur more than once:
df.groupby('c_charge_desc').filter(lambda group: len(group) > 1)
#     sex       age              race                     c_charge_desc
# 1  Male  0.122449  African-American  Felony Driving While Lic Suspend
# 4  Male  0.530612  African-American  Felony Driving While Lic Suspend
Alternatively, create a temporary counter column to use as a filter:
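The code for the counter-column variant is not included here, so the following is only a minimal sketch of one way it might look. The toy data, the `threshold` variable, and the names `with_counter`, `filtered`, `counts`, and `alt` are assumptions made for illustration. The threshold is set to 2 so the small sample produces a result; the question would use 10.

```python
import pandas as pd

# Toy frame standing in for the question's data (only two columns kept).
df = pd.DataFrame({
    "sex": ["Male", "Male", "Female", "Male", "Male"],
    "c_charge_desc": [
        "Felony Battery (Dom Strang)",
        "Felony Driving While Lic Suspend",
        "Neglect Child / No Bodily Harm",
        "arrest case no charge",
        "Felony Driving While Lic Suspend",
    ],
})

threshold = 2  # assumption for the toy data; the question asks for 10

# Temporary column holding, for every row, how often its charge occurs.
# transform() returns a Series aligned with df's row index.
with_counter = df.assign(
    counter=df.groupby("c_charge_desc")["c_charge_desc"].transform("size")
)

# Keep rows whose charge is frequent enough, then drop the helper column.
filtered = with_counter[with_counter["counter"] >= threshold].drop(columns="counter")

# Same idea without a helper column: map each row's charge to its count.
counts = df["c_charge_desc"].value_counts()
alt = df[df["c_charge_desc"].map(counts) >= threshold]

print(filtered)
```

This also suggests why the attempt in the question raises IndexingError: the result of value_counts() is indexed by charge description, whereas a boolean mask for df has to be indexed by df's row labels. Both transform and map produce a Series with one value per row, so the mask aligns.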