Question

import pandas as pd

df = pd.read_csv('file.csv')
df.dropna(inplace=True)

filter1 = df['col1'] == 'some_value'
filter2 = df['col2'] == 'some_other_value'

df.where(filter1 & filter2, inplace=True)

df.head()

     localCountry localState remoteCountry remoteState  ...  col1  col2  col3  num_samples
1250          NaN        NaN           NaN         NaN  ...   NaN   NaN   NaN          NaN
1251          NaN        NaN           NaN         NaN  ...   NaN   NaN   NaN          NaN

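I thought dropna() would remove every row that contains at least one NaN. Why are there NaN values in the result here? I must be misunderstanding something, but I can't figure out why this happens when where is called after dropna.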

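Edit for others:

The where() method replaces the values for which the given condition is False. If no replacement value is supplied, it replaces them with NaN. It is not simply a query that returns the rows where both conditions are met.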

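DataFrame.where

Where cond is True, keep the original value. Where False, replace with the corresponding value from other. If cond is callable, it is computed on the Series/DataFrame and should return a boolean Series/DataFrame or array. The callable must not change the input Series/DataFrame (though pandas doesn't check it).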
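The behavior quoted above can be sketched on a small made-up frame. The column names and values here are illustrative only and do not come from the question's file.csv. The sketch shows that where keeps the frame's shape and index, fills with NaN when no other is given, and also accepts a callable cond.

```python
import pandas as pd

# Toy data, purely illustrative (not the question's file.csv)
df = pd.DataFrame({"a": [1, 2, 3, 4], "b": [10, 20, 30, 40]})

# Row mask (boolean Series): applied to every column of each row
mask = df["a"] > 2

masked = df.where(mask)            # no `other` given: False rows become NaN
filled = df.where(mask, other=0)   # False rows become 0 instead

# A callable cond receives the frame and must return a boolean mask;
# here it returns an elementwise DataFrame mask
via_callable = df.where(lambda d: d > 2)

print(masked.shape == df.shape)    # where never removes rows
print(masked)
print(filled)
print(via_callable)
```

Because no rows are removed, any NaN values seen after where come from the masking itself, regardless of an earlier dropna call.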

Answer 1

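I think the problem is that you are overlooking the default replacement value (other) that DataFrame.where assigns to rows that do not match the condition: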

import numpy as np
import pandas as pd

df = pd.DataFrame({'col1': ['some_value', 'some_value', 'aaa', 'dd'],
                   'col2': ['some_other_value', 'dd', 'some_other_value', 'bb'],
                   'col3': list('abcd')})
print(df)
         col1              col2 col3
0  some_value  some_other_value    a
1  some_value                dd    b
2         aaa  some_other_value    c
3          dd                bb    d

filter1 = df['col1'] == 'some_value'
filter2 = df['col2'] == 'some_other_value'
# other=np.nan has the same effect as omitting other: rows where the mask is False become NaN
df.where(filter1 & filter2, other=np.nan, inplace=True)
print(df)
         col1              col2 col3
0  some_value  some_other_value    a
1         NaN               NaN  NaN
2         NaN               NaN  NaN
3         NaN               NaN  NaN

If you change the replacement value:

df.where(filter1 & filter2, other='val', inplace=True)
print(df)
         col1              col2 col3
0  some_value  some_other_value    a
1         val               val  val
2         val               val  val
3         val               val  val

If you want to filter rows, use boolean indexing:

df1 = df[filter1 & filter2]
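To make the difference concrete, here is a small sketch comparing boolean indexing with where on the example frame from this answer. Boolean indexing drops the non-matching rows, while where keeps every row and blanks out the non-matching ones. Chaining dropna after where yields the same rows for this data, assuming the frame has no NaN values of its own beforehand.

```python
import pandas as pd

df = pd.DataFrame({'col1': ['some_value', 'some_value', 'aaa', 'dd'],
                   'col2': ['some_other_value', 'dd', 'some_other_value', 'bb'],
                   'col3': list('abcd')})

mask = (df['col1'] == 'some_value') & (df['col2'] == 'some_other_value')

filtered = df[mask]                   # only the matching rows remain
masked = df.where(mask)               # same shape; non-matching rows are NaN
roundtrip = df.where(mask).dropna()   # drop the all-NaN rows afterwards

print(filtered)
print(len(filtered), len(masked), len(roundtrip))
```

With this data the last line should print 1 4 1: one matching row from boolean indexing, all four rows from where, and one row again once the NaN rows are dropped.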
