Given a DataFrame such as
import pandas as pd
import numpy as np
times = [21, 34, 37, 40, 55, 65, 67, 84, 88, 90, 91, 97, 104, 105, 108]
# np.nan is used instead of np.NaN, since the np.NaN alias was removed in NumPy 2.0
names = ['bob', 'alice', 'bob', np.nan, 'ali', 'alice', np.nan, 'ali', 'moji', 'ali', 'moji', np.nan, 'bob', 'bob', 'bob']
actions = ['enter', 'enter', 'search', 'search', 'enter', 'search', 'purchase', 'exit', 'enter', 'enter', 'search', 'purchase', 'exit', 'enter', 'purchase']
df = pd.DataFrame({'name': names, 'action': actions, 'time': times})
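I want only the rows where name is NaN, together with the row immediately before and the row immediately after each of them. I can do this with for and if statements, but is there a better way?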
Answer 0 (score: 1)
Use Series.isna to flag the NaN rows, then Series.shift to also flag the row before and the row after each one:
s1 = df['name'].isna()
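# fill_value=False keeps the shifted masks boolean instead of introducing NaN at the edges
s2 = s1.shift(fill_value=False)      # True on the row after each NaN
s3 = s1.shift(-1, fill_value=False)  # True on the row before each NaN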
df[s1 | s2 | s3]
     name    action  time
2     bob    search    37
3     NaN    search    40
4     ali     enter    55
5   alice    search    65
6     NaN  purchase    67
7     ali      exit    84
10   moji    search    91
11    NaN  purchase    97
12    bob      exit   104
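The shift-based approach above can be extended to more than one row of context on each side. The following is only a sketch: the helper name rows_around_nan and its parameter k are hypothetical and not part of the original answer.

```python
import numpy as np
import pandas as pd


def rows_around_nan(df, column, k=1):
    """Hypothetical helper: rows where `column` is NaN, plus up to k rows
    before and after each of them."""
    is_na = df[column].isna()
    mask = is_na.copy()
    for offset in range(1, k + 1):
        # Shifting forward marks rows that come after a NaN,
        # shifting backward marks rows that come before one.
        mask |= is_na.shift(offset, fill_value=False)
        mask |= is_na.shift(-offset, fill_value=False)
    return df[mask]


times = [21, 34, 37, 40, 55, 65, 67, 84, 88, 90, 91, 97, 104, 105, 108]
names = ['bob', 'alice', 'bob', np.nan, 'ali', 'alice', np.nan, 'ali',
         'moji', 'ali', 'moji', np.nan, 'bob', 'bob', 'bob']
actions = ['enter', 'enter', 'search', 'search', 'enter', 'search',
           'purchase', 'exit', 'enter', 'enter', 'search', 'purchase',
           'exit', 'enter', 'purchase']
df = pd.DataFrame({'name': names, 'action': actions, 'time': times})

result = rows_around_nan(df, 'name', k=1)
print(result)
```

With k=1 this should select the same rows as s1 | s2 | s3; a larger k widens the context around each NaN row.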
Answer 1 (score: 1)
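Build a boolean mask of where 'name' is NaN and convolve it with a window of size 3, so that each NaN also marks its two neighbours: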
ix_na = df['name'].isna().to_numpy()
m = np.convolve(ix_na, np.ones(3), mode='same').astype(bool)
print(df[m])
     name    action  time
2     bob    search    37
3     NaN    search    40
4     ali     enter    55
5   alice    search    65
6     NaN  purchase    67
7     ali      exit    84
10   moji    search    91
11    NaN  purchase    97
12    bob      exit   104
Alternatively, we can use Series.rolling with a centered window:
df[df['name'].isna().rolling(3, min_periods=0, center=True).sum().gt(0)]
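As a quick sanity check, the three masks from this thread (shift, convolution, rolling) can be compared on the example data. This is a sketch rather than part of either answer: the variable names mask_shift, mask_conv and mask_roll are made up here, and the explicit astype(int) casts are an added precaution.

```python
import numpy as np
import pandas as pd

times = [21, 34, 37, 40, 55, 65, 67, 84, 88, 90, 91, 97, 104, 105, 108]
names = ['bob', 'alice', 'bob', np.nan, 'ali', 'alice', np.nan, 'ali',
         'moji', 'ali', 'moji', np.nan, 'bob', 'bob', 'bob']
actions = ['enter', 'enter', 'search', 'search', 'enter', 'search',
           'purchase', 'exit', 'enter', 'enter', 'search', 'purchase',
           'exit', 'enter', 'purchase']
df = pd.DataFrame({'name': names, 'action': actions, 'time': times})

is_na = df['name'].isna()

# Shift-based mask: the NaN row, the row after it, and the row before it.
mask_shift = (is_na
              | is_na.shift(fill_value=False)
              | is_na.shift(-1, fill_value=False))

# Convolution-based mask: a length-3 window of ones spreads each NaN
# flag to its immediate neighbours.
mask_conv = np.convolve(is_na.to_numpy().astype(int), np.ones(3),
                        mode='same').astype(bool)

# Rolling-based mask: a centered window of 3 that contains at least one NaN.
mask_roll = (is_na.astype(int)
             .rolling(3, min_periods=0, center=True)
             .sum()
             .gt(0))

print(np.array_equal(mask_shift.to_numpy(), mask_conv))
print(np.array_equal(mask_conv, mask_roll.to_numpy()))
```

All three select rows by position, so they agree as long as the DataFrame is in the intended row order.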