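Delete rows from a pandas df

Time: 2018-08-03 04:33:22

Tags: python pandas sorting select

I'm trying to delete rows from a pandas df. Specifically, the rows where Col A is empty directly beneath a row containing X. So if the row beneath X is empty in Col A, I want to delete all of those rows until there is a string beneath the value X.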

import pandas as pd

d = ({
    'A' : ['X','','','X','Foo','','X','Fou','','X','Bar'],           
    'B' : ['Val',1,3,'Val',1,3,'Val',1,3,'Val',1],
    'C' : ['Val',2,4,'Val',2,4,'Val',2,4,'Val',2],
    })

df = pd.DataFrame(data=d)

Output:

      A    B    C
0     X  Val  Val
1          1    2
2          3    4
3     X  Val  Val
4   Foo    1    2
5          3    4
6     X  Val  Val
7   Fou    1    2
8          3    4
9     X  Val  Val
10  Bar    1    2

I tried:

df = df[~(df['A'] == 'X').shift().fillna(False)]

But this deletes every row that follows an X. I only want to delete the row when the row directly beneath X is empty.
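To see why the attempt removes too much, it can help to list which rows the shifted mask flags. The sketch below rebuilds the sample frame and uses `shift(fill_value=False)` instead of `.shift().fillna(False)`; that substitution is my assumption, made so the mask stays boolean on recent pandas versions. The names `is_x`, `after_x` and `kept` are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({
    'A': ['X', '', '', 'X', 'Foo', '', 'X', 'Fou', '', 'X', 'Bar'],
    'B': ['Val', 1, 3, 'Val', 1, 3, 'Val', 1, 3, 'Val', 1],
    'C': ['Val', 2, 4, 'Val', 2, 4, 'Val', 2, 4, 'Val', 2],
})

is_x = df['A'] == 'X'
# True for every row that sits directly under an X, whether it is blank or not
after_x = is_x.shift(fill_value=False)

kept = df[~after_x]
print(kept.index.tolist())  # [0, 2, 3, 5, 6, 8, 9]
```

The mask never looks at whether the row under X is blank, so the Foo, Fou and Bar rows (labels 4, 7, 10) are dropped along with the first blank row, while the second blank row (label 2) survives.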

Expected:

     A    B    C
0    X  Val  Val
1  Foo    1    2
2         3    4
3    X  Val  Val
4  Fou    1    2
5         3    4
6    X  Val  Val
7  Bar    1    2

3 answers:

Answer 0 (score: 1)

Use:

m1 = df['A'] == 'X'
g =  m1.cumsum()
m = (df['A'] == '') | m1

df = df[~m.groupby(g).transform('all')]
print (df)
      A    B    C
3     X  Val  Val
4   Foo    1    2
5          3    4
6     X  Val  Val
7   Fou    1    2
8          3    4
9     X  Val  Val
10  Bar    1    2

Details

m1 = df['A'] == 'X'
g =  m1.cumsum()
m = (df['A'] == '') | m1

print (pd.concat([df,
                  df['A'] == 'X',
                  m1.cumsum(),
                  (df['A'] == ''), 
                  m,
                  m.groupby(g).transform('all'),
                  ~m.groupby(g).transform('all')], axis=1,
       keys=['orig','==X','g','==space','m', 'all', 'inverted all']))

   orig              ==X  g ==space      m    all inverted all
      A    B    C      A  A       A      A      A            A
0     X  Val  Val   True  1   False   True   True        False
1          1    2  False  1    True   True   True        False
2          3    4  False  1    True   True   True        False
3     X  Val  Val   True  2   False   True  False         True
4   Foo    1    2  False  2   False  False  False         True
5          3    4  False  2    True   True  False         True
6     X  Val  Val   True  3   False   True  False         True
7   Fou    1    2  False  3   False  False  False         True
8          3    4  False  3    True   True  False         True
9     X  Val  Val   True  4   False   True  False         True
10  Bar    1    2  False  4   False  False  False         True

Explanation

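  1. Compare with X and take the cumulative sum to build the groups g, each starting at an X.
  2. Chain two boolean masks (equal to X, or equal to the empty string) into m.
  3. Use groupby with transform and DataFrameGroupBy.all to return True for groups that contain only True values.
  4. Finally, invert the result and filter by boolean indexing.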
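The four steps above can be wrapped in a small helper. This is only a sketch: the function name `drop_all_blank_x_groups` and its parameters are hypothetical, and the final `reset_index(drop=True)` is added here so the index matches the expected output in the question.

```python
import pandas as pd

def drop_all_blank_x_groups(frame, col='A', marker='X', blank=''):
    """Drop each marker-led group whose rows are all marker or blank in `col`."""
    is_marker = frame[col] == marker
    group_id = is_marker.cumsum()  # a new group starts at every marker row
    blank_or_marker = (frame[col] == blank) | is_marker
    # True on every row of a group that holds nothing but markers and blanks
    drop = blank_or_marker.groupby(group_id).transform('all')
    return frame[~drop].reset_index(drop=True)

df = pd.DataFrame({
    'A': ['X', '', '', 'X', 'Foo', '', 'X', 'Fou', '', 'X', 'Bar'],
    'B': ['Val', 1, 3, 'Val', 1, 3, 'Val', 1, 3, 'Val', 1],
    'C': ['Val', 2, 4, 'Val', 2, 4, 'Val', 2, 4, 'Val', 2],
})

result = drop_all_blank_x_groups(df)
print(result)
```

One consequence of grouping this way is that an X row with nothing after it (for example a trailing X) also counts as an all-True group and is removed.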

Answer 1 (score: 0)

Here is your solution:

df[~((df['A'] == '') & (df['A'] == 'X').shift())]

The intermediate results are:

(df['A'] == 'X').shift()
0       NaN
1      True
2     False
3     False
4      True
5     False
6     False
7      True
8     False
9     False
10     True
Name: A, dtype: object

In [15]: (df['A'] == '')
Out[15]:
0     False
1      True
2      True
3     False
4     False
5      True
6     False
7     False
8      True
9     False
10    False
Name: A, dtype: bool

In [14]: ((df['A'] == '') & (df['A'] == 'X').shift())
Out[14]:
0     False
1      True
2     False
3     False
4     False
5     False
6     False
7     False
8     False
9     False
10    False
Name: A, dtype: bool

Edit: if needed, this can be done in a while loop.

old_size_df = df.size
new_size_df = 0

df[~((df['A'] == '') & (df['A'] == 'X').shift())]
Out[16]:
      A    B    C
0     X  Val  Val
2          3    4
3     X  Val  Val
4   Foo    1    2
5          3    4
6     X  Val  Val
7   Fou    1    2
8          3    4
9     X  Val  Val
10  Bar    1    2
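The single pass above removes only the first blank row under an X (label 1), which is presumably why the edit mentions a while loop. The loop body is not shown in the answer, so the following is just one guess at how it could look: the helper `drop_blank_after_x_once` is hypothetical, row counts are compared with `len` rather than `df.size`, and `shift(fill_value=False)` is my substitution to keep the mask boolean.

```python
import pandas as pd

df = pd.DataFrame({
    'A': ['X', '', '', 'X', 'Foo', '', 'X', 'Fou', '', 'X', 'Bar'],
    'B': ['Val', 1, 3, 'Val', 1, 3, 'Val', 1, 3, 'Val', 1],
    'C': ['Val', 2, 4, 'Val', 2, 4, 'Val', 2, 4, 'Val', 2],
})

def drop_blank_after_x_once(frame):
    """One pass: drop blank rows whose immediate predecessor holds 'X'."""
    prev_is_x = (frame['A'] == 'X').shift(fill_value=False)
    return frame[~((frame['A'] == '') & prev_is_x)]

result = df
while True:
    trimmed = drop_blank_after_x_once(result)
    if len(trimmed) == len(result):  # nothing was removed, so stop
        break
    result = trimmed

print(result.index.tolist())  # [0, 3, 4, 5, 6, 7, 8, 9, 10]
```

Repeating the pass removes both blank rows under the first X, but the X row itself (label 0) stays, so the result has nine rows rather than the eight shown in the expected output.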

Answer 2 (score: 0)

Here is a solution with a custom apply function:

d = ({
    'A' : ['X','','','X','Foo','','X','Fou','','X','Bar'],           
    'B' : ['Val',1,3,'Val',1,3,'Val',1,3,'Val',1],
    'C' : ['Val',2,4,'Val',2,4,'Val',2,4,'Val',2],
})
df = pd.DataFrame(data=d)

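is_x = False  # True while the most recently kept row holds 'X' in column A

def fill_empty_a(row):
    global is_x
    if row['A'] == '' and is_x:
        # blank row under an X: mark it so that dropna() removes it
        row['A'] = None
    else:
        # row is kept; remember whether it is an X for the next row
        is_x = row['A'] == 'X'
    return row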

(df.apply(fill_empty_a, axis=1)
   .dropna()
   .reset_index(drop=True))
#      A    B    C
# 0    X  Val  Val
# 1    X  Val  Val
# 2  Foo    1    2
# 3         3    4
# 4    X  Val  Val
# 5  Fou    1    2
# 6         3    4
# 7    X  Val  Val
# 8  Bar    1    2
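The same row-by-row idea can be written without a global variable by walking over column A once and collecting a keep/drop flag per row. This is a sketch under that assumption, not the answerer's code; the function name `drop_blanks_after_x` and its parameters are hypothetical.

```python
import pandas as pd

def drop_blanks_after_x(frame, col='A', marker='X', blank=''):
    """Drop runs of blank rows that directly follow a marker row."""
    keep = []
    after_marker = False
    for value in frame[col]:
        if value == blank and after_marker:
            keep.append(False)  # still inside the blank run under a marker
        else:
            keep.append(True)
            after_marker = value == marker
    return frame[keep].reset_index(drop=True)

df = pd.DataFrame({
    'A': ['X', '', '', 'X', 'Foo', '', 'X', 'Fou', '', 'X', 'Bar'],
    'B': ['Val', 1, 3, 'Val', 1, 3, 'Val', 1, 3, 'Val', 1],
    'C': ['Val', 2, 4, 'Val', 2, 4, 'Val', 2, 4, 'Val', 2],
})

result = drop_blanks_after_x(df)
print(result)
```

Like the apply version, this keeps the first X row, so it returns nine rows; it avoids mutating rows and does not rely on `dropna()`, which would also remove rows with missing values in B or C.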