Filtering a Pandas DataFrame by group using sentinel values

Time: 2014-08-16 13:15:35

Tags: python-2.7 pandas

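I want to filter a DataFrame by group, where the NaNs that follow a should count as a (a acts like a sentinel), and the NaNs that follow b should count as b. Here is a short example: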

In [1]: import pandas as pd
   ...: from numpy import nan
In [2]: df = pd.DataFrame({'group1': ['a',nan,nan,nan,nan,'b',nan,nan,nan,nan],
                   'value1': [0.4,1.1,2,3,4,5,6,7,8,8.8],
                   'value2': [6.4, 6.9,7.1,8,9,10,11,12,13,14]
                   })

The output I want is:

In [3]: df[df.group1 == 'a']
Out[3]: 
  group1  value1  value2
0      a     0.4     6.4
1    NaN     1.1     6.9
2    NaN     2.0     7.1
3    NaN     3.0     8.0
4    NaN     4.0     9.0

I would appreciate any hints!

1 answer:

Answer 0 (score: 1)

You can use ffill to forward-fill the column, so that each NaN takes the most recent non-null label above it:

>>> df[df['group1'].ffill() == 'a']
  group1  value1  value2
0      a     0.4     6.4
1    NaN     1.1     6.9
2    NaN     2.0     7.1
3    NaN     3.0     8.0
4    NaN     4.0     9.0
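
For reference, here is the same idea as a self-contained script. This is a minimal sketch: it assumes NumPy is available for `np.nan`, and the names `mask` and `result` are illustrative, not from the question. The forward fill is applied only to a temporary Series, so the NaN markers in `df` are left as they were.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'group1': ['a', np.nan, np.nan, np.nan, np.nan,
               'b', np.nan, np.nan, np.nan, np.nan],
    'value1': [0.4, 1.1, 2, 3, 4, 5, 6, 7, 8, 8.8],
    'value2': [6.4, 6.9, 7.1, 8, 9, 10, 11, 12, 13, 14],
})

# Forward-fill a temporary copy of the marker column and compare it to 'a'.
# df['group1'] itself is not modified.
mask = df['group1'].ffill() == 'a'

# Boolean indexing keeps the rows whose filled label is 'a'.
result = df[mask]
print(result)
```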

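However, a better solution may be to forward-fill the column in the original DataFrame itself (note that this overwrites the NaN markers in group1 with their group labels):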

>>> df['group1'] = df['group1'].ffill()
>>> df[df['group1'] == 'a']
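
If every group is needed rather than only `a`, the forward-filled labels can also serve as a `groupby` key. The following is a sketch under assumptions: `labels` and `groups` are hypothetical names introduced here, and this goes slightly beyond what the answer itself proposes.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'group1': ['a', np.nan, np.nan, np.nan, np.nan,
               'b', np.nan, np.nan, np.nan, np.nan],
    'value1': [0.4, 1.1, 2, 3, 4, 5, 6, 7, 8, 8.8],
    'value2': [6.4, 6.9, 7.1, 8, 9, 10, 11, 12, 13, 14],
})

# Forward-filled labels, kept separate so df retains its NaN markers.
labels = df['group1'].ffill()

# Split df into one sub-DataFrame per label.
groups = {name: part for name, part in df.groupby(labels)}

print(groups['a'])
print(groups['b'])
```

Grouping by a separate Series keeps the original column intact, which matters if the NaN markers carry meaning elsewhere in the pipeline.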