Get the contents of pandas rows based on some condition on other rows

Time: 2016-04-28 16:29:23

Tags: python pandas

I have a pandas DataFrame df1 that looks like this:

Serial N         year         current
   B              10            14
   B              10            16
   B              11            10
   B              11            
   B              11            15
   C              12            11
   C                            9
   C              12            13
   C              12           
   D               3             4

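I want to count the number of occurrences of each unique Serial N. If a serial occurs fewer than 2 times, I want to replace the year and current values of that row with nan. I would like to end up with something like this: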

Serial N         year         current
   B              10            14
   B              10            16
   B              11            10
   B              11            
   B              11            15
   C              12            11
   C                             9
   C              12            13
   C              12 
   D              nan           nan      

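2 Answers:

Answer 0: (score: 1)

You can combine value_counts, lt and reindex to get a Boolean array marking where values should be changed to nan, then use loc to make the change.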

import numpy as np

serial_filter = df1['Serial N'].value_counts().lt(2).reindex(df1['Serial N'])
df1.loc[serial_filter.values, ['year', 'current']] = np.nan

Resulting output:

  Serial N  year  current
0        B  10.0     14.0
1        B  10.0     16.0
2        B  11.0     10.0
3        B  11.0      NaN
4        B  11.0     15.0
5        C  12.0     11.0
6        C   NaN      9.0
7        C  12.0     13.0
8        C  12.0      NaN
9        D   NaN      NaN
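
The chained expression above can be easier to follow when each step gets its own name. This is only a sketch on a small stand-in frame (fewer rows than the question, values stored as floats); the names counts and is_rare are illustrative and not part of the original answer.

```python
import numpy as np
import pandas as pd

# Small stand-in for df1 (same columns, fewer rows than the question).
df1 = pd.DataFrame({
    'Serial N': ['B', 'B', 'C', 'C', 'D'],
    'year': [10.0, 11.0, 12.0, 12.0, 3.0],
    'current': [14.0, 10.0, 11.0, 13.0, 4.0],
})

# Step 1: number of rows per serial, indexed by the unique serials.
counts = df1['Serial N'].value_counts()

# Step 2: True for serials that occur fewer than 2 times.
is_rare = counts.lt(2)

# Step 3: look the flag up once per row, so the result has one entry per row.
serial_filter = is_rare.reindex(df1['Serial N'])

# Step 4: .values drops the serial labels so loc receives a plain Boolean array.
df1.loc[serial_filter.values, ['year', 'current']] = np.nan

print(df1)
```

The .values call matters because serial_filter is indexed by serial labels rather than by the row labels of df1, so passing the Series itself to loc would not line up with the rows.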

Answer 1: (score: 0)

Setup

import pandas as pd
from io import StringIO

text = """Serial_N         year         current
   B              10            14
   B              10            16
   B              11            10
   B              11            nan
   B              11            15
   C              12            11
   C              nan              9
   C              12            13
   C              12           nan
   D               3             4"""

df1 = pd.read_csv(StringIO(text), sep=r'\s+')
df1.columns = ['Serial N', 'year', 'current']

df1 is now the DataFrame shown above.

Solution

serial_filter = df1.groupby('Serial N').apply(lambda x: len(x))
serial_filter = serial_filter[serial_filter > 1]
mask = df1.apply(lambda x: x['Serial N'] in serial_filter.index, axis=1)
df1 = df1[mask]

Demonstration and explanation

serial_filter = df1.groupby('Serial N').apply(lambda x: len(x))

print(serial_filter)

Serial N
B    5
C    4
D    1
dtype: int64

This produces the count of each unique Serial N.

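As an aside, the same per-serial counts can be obtained without a lambda. The sketch below uses a stand-in frame with made-up rows and compares groupby(...).size() with value_counts(); the names via_size and via_counts are illustrative.

```python
import pandas as pd

# Stand-in for the 'Serial N' column of df1 (values are illustrative).
df1 = pd.DataFrame({'Serial N': ['B', 'B', 'B', 'C', 'C', 'D']})

# Built-in group size: one count per unique serial, sorted by serial.
via_size = df1.groupby('Serial N').size()

# value_counts gives the same numbers, ordered by count, so sort by label
# before comparing.
via_counts = df1['Serial N'].value_counts().sort_index()

print(via_size)
print(via_counts)
```
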
serial_filter = serial_filter[serial_filter > 1]

print(serial_filter)

Serial N
B    5
C    4
dtype: int64

This redefines it so that it contains only the Serial N values whose count is greater than 1.

mask = df1.apply(lambda x: x['Serial N'] in serial_filter.index, axis=1)

print(mask)

0     True
1     True
2     True
3     True
4     True
5     True
6     True
7     True
8     True
9    False
dtype: bool

This creates the filter mask to use on df1.

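The row-wise apply used for the mask calls a Python function once per row. A vectorized alternative, sketched here on a stand-in frame with illustrative values, is Series.isin against the index of the filtered counts; it should yield the same Boolean mask.

```python
import pandas as pd

# Stand-in for df1; only the 'Serial N' column matters for the mask.
df1 = pd.DataFrame({'Serial N': ['B', 'B', 'C', 'C', 'D']})

# Counts per serial, then keep only serials that occur more than once.
serial_filter = df1.groupby('Serial N').size()
serial_filter = serial_filter[serial_filter > 1]

# True for rows whose serial is among the kept serials.
mask = df1['Serial N'].isin(serial_filter.index)

print(mask)
```
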
df1 = df1[mask]

print(df1)

  Serial N  year  current
0        B  10.0     14.0
1        B  10.0     16.0
2        B  11.0     10.0
3        B  11.0      NaN
4        B  11.0     15.0
5        C  12.0     11.0
6        C   NaN      9.0
7        C  12.0     13.0
8        C  12.0      NaN

This updates df1.
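
Note that filtering with df1[mask] drops the D row entirely, whereas the question asks to keep that row and set its year and current values to nan. A minimal sketch of that variant, assuming a small stand-in frame with float columns and a mask built as in this answer, inverts the mask and assigns through loc; the name keep is illustrative.

```python
import numpy as np
import pandas as pd

# Small stand-in for df1 (same columns, fewer rows than the question).
df1 = pd.DataFrame({
    'Serial N': ['B', 'B', 'C', 'C', 'D'],
    'year': [10.0, 11.0, 12.0, 12.0, 3.0],
    'current': [14.0, 10.0, 11.0, 13.0, 4.0],
})

# Serials that occur more than once.
counts = df1.groupby('Serial N').size()
keep = counts[counts > 1].index

# True for rows to leave untouched.
mask = df1['Serial N'].isin(keep)

# Blank out year and current on the remaining rows instead of dropping them.
df1.loc[~mask, ['year', 'current']] = np.nan

print(df1)
```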