How do I delete rows from a DataFrame based on a condition?

Time: 2019-10-30 16:38:51

Tags: python pandas dataframe

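I want to delete all rows of a DataFrame that meet a condition. In this example, if col2 and col3 are both empty, I want to delete that row.

I am working with a DataFrame of 311,307 rows; below is a small DataFrame that reproduces the problem.

Thanks, everyone!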

Sample data (the rows where both col2 and col3 are 'Nan', here rows 3 and 5, should be deleted):

import pandas as pd

obj = {'col1': [1, 2,7,47,12,67,58], 'col2': [741, 332,7,'Nan',127,'Nan',548],  'col3': ['Nan', 2,74,'Nan',127,'Nan',548] }

df = pd.DataFrame(data=obj)



print(df)
    col1 col2   col3
0   1    741    Nan
1   2    332    2
2   7    7      74
3   47   Nan    Nan
4   12   127    127
5   67   Nan    Nan
6   58   548    548

3 Answers:

Answer 0: (score: 1)

Use boolean indexing with DataFrame.isna or DataFrame.isnull to check for NaN or null values. Use DataFrame.sum with Series.le to set the maximum number of NaNs allowed per row:

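import numpy as np

# Convert the placeholder string 'Nan' into real missing values
df=df.replace('Nan',np.nan)
# Keep rows that have at most one missing value
new_df=df[df.isnull().sum(axis=1).le(1)]
print(new_df)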

   col1   col2   col3
0     1  741.0    NaN
1     2  332.0    2.0
2     7    7.0   74.0
4    12  127.0  127.0
6    58  548.0  548.0
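
To make the threshold step easier to follow, here is a minimal self-contained sketch that prints the per-row NaN count before filtering. The names `nan_counts` and `kept` are illustrative and not part of the answer; the data is the sample from the question.

```python
import numpy as np
import pandas as pd

# Sample data from the question; the string 'Nan' stands in for missing values.
obj = {'col1': [1, 2, 7, 47, 12, 67, 58],
       'col2': [741, 332, 7, 'Nan', 127, 'Nan', 548],
       'col3': ['Nan', 2, 74, 'Nan', 127, 'Nan', 548]}
df = pd.DataFrame(data=obj).replace('Nan', np.nan)

# Number of missing values in each row (illustrative name).
nan_counts = df.isnull().sum(axis=1)

# Keep only rows with at most one missing value.
kept = df[nan_counts.le(1)]

print(nan_counts.tolist())
print(kept.index.tolist())
```

Note that this counts missing values across all columns, so a NaN in col1 would also count toward the limit.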

  

To restrict the check to specific columns, use DataFrame.all:

df=df.replace('Nan',np.nan)
df_filtered=df[~df[['col2','col3']].isnull().all(axis=1)]
print(df_filtered)

   col1   col2   col3
0     1  741.0    NaN
1     2  332.0    2.0
2     7    7.0   74.0
4    12  127.0  127.0
6    58  548.0  548.0
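
The same filter can also be written without the negation, since "not all missing" is the same as "at least one present". The sketch below wraps that in a small helper; `drop_rows_all_missing` is a hypothetical name introduced here for illustration, not a pandas function.

```python
import numpy as np
import pandas as pd

def drop_rows_all_missing(frame, cols):
    """Return the rows of `frame` where at least one of `cols` is not missing."""
    # notna().any(axis=1) is the logical complement of isnull().all(axis=1).
    return frame[frame[cols].notna().any(axis=1)]

# Sample data from the question, with 'Nan' strings converted to real NaN.
obj = {'col1': [1, 2, 7, 47, 12, 67, 58],
       'col2': [741, 332, 7, 'Nan', 127, 'Nan', 548],
       'col3': ['Nan', 2, 74, 'Nan', 127, 'Nan', 548]}
df = pd.DataFrame(data=obj).replace('Nan', np.nan)

result = drop_rows_all_missing(df, ['col2', 'col3'])
print(result.index.tolist())
```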

Answer 1: (score: 1)

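Use dropna.

axis=0 drops rows, and thresh=1 is the minimum number of non-null values a row needs in order to be kept.

To limit the check to certain columns, pass subset=['col2', 'col3'].

The 'Nan' strings must first be replaced with real NaN values (for example with df.replace('Nan', np.nan)), because dropna does not treat the string 'Nan' as missing. Also, how and thresh should not be passed together; recent pandas versions raise an error if both are given. You can try the following:

df = df.dropna(axis=0, subset=['col2', 'col3'], thresh=1)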
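
As a rough check of the dropna approach, the sketch below runs it on the sample data from the question, assuming the 'Nan' strings are converted to real NaN first. It passes `how` and `thresh` in separate calls, since to my knowledge recent pandas versions do not accept both at once. The names `by_how` and `by_thresh` are illustrative.

```python
import numpy as np
import pandas as pd

# Sample data from the question, with 'Nan' strings converted to real NaN.
obj = {'col1': [1, 2, 7, 47, 12, 67, 58],
       'col2': [741, 332, 7, 'Nan', 127, 'Nan', 548],
       'col3': ['Nan', 2, 74, 'Nan', 127, 'Nan', 548]}
df = pd.DataFrame(data=obj).replace('Nan', np.nan)

# Drop rows where every column in the subset is missing.
by_how = df.dropna(axis=0, subset=['col2', 'col3'], how='all')

# Equivalent form: keep rows with at least one non-missing value in the subset.
by_thresh = df.dropna(axis=0, subset=['col2', 'col3'], thresh=1)

print(by_how.index.tolist())
print(by_how.equals(by_thresh))
```

With a two-column subset, `how='all'` and `thresh=1` select the same rows; `thresh` becomes more useful when a row should be kept only if several of the listed columns are filled.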

Answer 2: (score: 0)

After applying the solution proposed by @ansev, everything works:

import numpy as np
import pandas as pd

obj = {'col1': [1, 2,7,47,12,67,58], 'col2': [741, 332,7,'Nan',127,'Nan',548],  'col3': ['Nan', 2,74,'Nan',127,'Nan',548] }

df = pd.DataFrame(data=obj)

df=df.replace('Nan',np.nan)
df_filtered=df[~df[['col2','col3']].isnull().all(axis=1)]

print(df_filtered)

   col1   col2   col3
0     1  741.0    NaN
1     2  332.0    2.0
2     7    7.0   74.0
4    12  127.0  127.0
6    58  548.0  548.0