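I am trying to filter/drop rows from the DataFrame df, within each id group, if they satisfy the following rule: date is 2019-08-01 and the values in the city and commerce columns are both not null.
id city commerce date price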
0 1 bj ft 2019/7/1 7
1 1 bj ft 2019/8/1 5
2 1 NaN NaN 2019/8/1 6
3 2 bj ft 2019/7/1 3
4 2 bj ft 2019/8/1 4
5 2 NaN NaN 2019/8/1 7
6 3 bj ft 2019/7/1 7
7 3 bj ft 2019/8/1 5
Code:
df[(df["date"].isin(['2019-08-01'])) & (df[df[['city', 'commerce']].notnull()])]
But I get an error:
TypeError: cannot compare a dtyped [float64] array with a scalar of type [bool]
My expected result would look like this:
id city commerce date price
0 1 bj ft 2019/7/1 7
1 1 NaN NaN 2019/8/1 6
2 2 bj ft 2019/7/1 3
3 2 NaN NaN 2019/8/1 7
4 3 bj ft 2019/7/1 7
5 3 bj ft 2019/8/1 5
Answer 0 (score: 2)
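If you compare multiple columns, the result is a boolean DataFrame rather than a boolean Series, so you need DataFrame.any to test for at least one True per row, or DataFrame.all to test that all values per row are True: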
df = df[~(df["date"].isin(['2019/8/1'])) | df[['city', 'commerce']].isnull().any(axis=1)]
Or:
mask = (df["date"].isin(['2019/8/1'])) & df[['city', 'commerce']].notnull().all(axis=1)
df = df[~mask]
print (df)
id city commerce date price
0 1 bj ft 2019/7/1 7
2 1 NaN NaN 2019/8/1 6
3 2 bj ft 2019/7/1 3
5 2 NaN NaN 2019/8/1 7
6 3 bj ft 2019/7/1 7
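For anyone who wants to try this end to end, here is a minimal self-contained sketch of the approach above. It assumes the date column holds plain strings exactly as displayed in the question (for example '2019/8/1'); the names on_date, both_filled, result and alt are only illustrative and do not come from the original post.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question; dates are assumed to be strings.
df = pd.DataFrame({
    "id": [1, 1, 1, 2, 2, 2, 3, 3],
    "city": ["bj", "bj", np.nan, "bj", "bj", np.nan, "bj", "bj"],
    "commerce": ["ft", "ft", np.nan, "ft", "ft", np.nan, "ft", "ft"],
    "date": ["2019/7/1", "2019/8/1", "2019/8/1", "2019/7/1",
             "2019/8/1", "2019/8/1", "2019/7/1", "2019/8/1"],
    "price": [7, 5, 6, 3, 4, 7, 7, 5],
})

# One boolean per row: is the row dated 2019/8/1?
on_date = df["date"].eq("2019/8/1")

# notnull() on two columns returns a two-column boolean DataFrame;
# all(axis=1) reduces it to one boolean per row.
both_filled = df[["city", "commerce"]].notnull().all(axis=1)

# Drop the rows where both conditions hold.
result = df[~(on_date & both_filled)]

# Equivalent form by De Morgan's law: keep rows that are off-date
# or that have at least one missing value.
alt = df[~on_date | df[["city", "commerce"]].isnull().any(axis=1)]

print(result)
```

Both forms select the same rows, so which one to use is mostly a matter of readability. If the date column were a real datetime dtype, the comparison value would need to be a timestamp instead of the string used here.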