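I want to delete all rows of a data frame that meet a condition: in this example, if col2 and col3 are both empty, I want to drop every row where that is the case.
I'm working with a DataFrame of 311,307 rows; below is a small data frame that reproduces the problem.
Thanks, everyone!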
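After deleting every row that meets this condition, only rows 0, 1, 2, 4 and 6 of the sample data frame below should remain: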
import pandas as pd
# Sample data; missing values are stored as the string 'Nan', not as real NaN
obj = {'col1': [1, 2, 7, 47, 12, 67, 58],
       'col2': [741, 332, 7, 'Nan', 127, 'Nan', 548],
       'col3': ['Nan', 2, 74, 'Nan', 127, 'Nan', 548]}
df = pd.DataFrame(data=obj)
print(df)  # df.head() would show only the first 5 rows
   col1  col2  col3
0     1   741   Nan
1     2   332     2
2     7     7    74
3    47   Nan   Nan
4    12   127   127
5    67   Nan   Nan
6    58   548   548
Answer 0 (score: 1)
Use Boolean indexing with DataFrame.isna or DataFrame.isnull to check for NaN or null values.
Use DataFrame.sum and Series.le to set the maximum number of NaN values allowed per row:
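import numpy as np
# Convert the 'Nan' placeholder strings to real missing values
df = df.replace('Nan', np.nan)
# Keep rows with at most one NaN across all columns
new_df = df[df.isnull().sum(axis=1).le(1)]
print(new_df)
   col1   col2   col3
0     1  741.0    NaN
1     2  332.0    2.0
2     7    7.0   74.0
4    12  127.0  127.0
6    58  548.0  548.0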
To restrict the check to specific columns:
df = df.replace('Nan', np.nan)
# Drop a row only when both col2 and col3 are NaN
df_filtered = df[~df[['col2', 'col3']].isnull().all(axis=1)]
print(df_filtered)
   col1   col2   col3
0     1  741.0    NaN
1     2  332.0    2.0
2     7    7.0   74.0
4    12  127.0  127.0
6    58  548.0  548.0
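The two filters above return the same rows for the sample data, but they are not equivalent in general. Here is a minimal sketch, assuming a hypothetical frame (not from the question) in which one row is missing col1 and col2 but still has col3. The row-wide count drops that row, while the column-specific check keeps it:

```python
import numpy as np
import pandas as pd

# Hypothetical data: row 1 is missing col1 and col2 but has a value in col3
df = pd.DataFrame({
    'col1': [1.0, np.nan, 47.0],
    'col2': [741.0, np.nan, np.nan],
    'col3': [np.nan, 2.0, np.nan],
})

# Keep rows with at most one NaN anywhere in the row
by_count = df[df.isnull().sum(axis=1).le(1)]

# Drop a row only when col2 AND col3 are both NaN
by_columns = df[~df[['col2', 'col3']].isnull().all(axis=1)]

print(by_count.index.tolist())
print(by_columns.index.tolist())
```

If the rule really is "col2 and col3 are both empty", the column-specific version is probably the safer choice, since it ignores missing values in other columns.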
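Answer 1 (score: 1)
Use dropna: axis=0 drops rows, and thresh=1 is the number of non-null values a row must have in order to be kept.
To choose the columns that the drop is based on, use subset=['col2', 'col3'].
Pass either thresh or how, not both; recent pandas versions raise an error when both are set. This also assumes the 'Nan' strings have already been replaced with np.nan.
You can try the following:
# Keep rows that have at least one non-NaN value among col2 and col3
df = df.dropna(axis=0, subset=['col2', 'col3'], thresh=1)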
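A runnable sketch of the dropna approach on the question's sample data, assuming the 'Nan' strings are first converted to real NaN. It uses how='all' rather than thresh=1; for this two-column subset both keep a row as long as at least one of col2 and col3 is present. The variable names `kept` and `mask_version` are illustrative only:

```python
import numpy as np
import pandas as pd

obj = {'col1': [1, 2, 7, 47, 12, 67, 58],
       'col2': [741, 332, 7, 'Nan', 127, 'Nan', 548],
       'col3': ['Nan', 2, 74, 'Nan', 127, 'Nan', 548]}

# Turn the 'Nan' placeholder strings into real missing values
df = pd.DataFrame(data=obj).replace('Nan', np.nan)

# how='all' drops a row only when every column listed in subset is NaN
kept = df.dropna(axis=0, subset=['col2', 'col3'], how='all')

# Boolean-mask version from the other answer, for comparison
mask_version = df[~df[['col2', 'col3']].isnull().all(axis=1)]

print(kept.index.tolist())
print(kept.equals(mask_version))
```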
Answer 2 (score: 0)
After applying the solution proposed by @ansev, everything works:
import numpy as np
import pandas as pd
obj = {'col1': [1, 2, 7, 47, 12, 67, 58],
       'col2': [741, 332, 7, 'Nan', 127, 'Nan', 548],
       'col3': ['Nan', 2, 74, 'Nan', 127, 'Nan', 548]}
df = pd.DataFrame(data=obj)
# Convert the 'Nan' strings to real NaN, then drop rows where col2 and col3 are both NaN
df = df.replace('Nan', np.nan)
df_filtered = df[~df[['col2', 'col3']].isnull().all(axis=1)]
print(df_filtered)
   col1   col2   col3
0     1  741.0    NaN
1     2  332.0    2.0
2     7    7.0   74.0
4    12  127.0  127.0
6    58  548.0  548.0
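The values print as 741.0 rather than 741 because the columns hold NaN after the replacement. If the 'Nan' placeholders should be left as they are, one alternative that none of the answers propose is to filter on the string itself. A minimal sketch, assuming every missing value is spelled exactly 'Nan':

```python
import pandas as pd

obj = {'col1': [1, 2, 7, 47, 12, 67, 58],
       'col2': [741, 332, 7, 'Nan', 127, 'Nan', 548],
       'col3': ['Nan', 2, 74, 'Nan', 127, 'Nan', 548]}
df = pd.DataFrame(data=obj)

# True where both col2 and col3 hold the 'Nan' placeholder string
both_missing = df[['col2', 'col3']].eq('Nan').all(axis=1)

# Keep every other row; the original values are left untouched
df_filtered = df[~both_missing]

print(df_filtered.index.tolist())
```

This leaves the remaining 'Nan' strings in place (for example in row 0, col3), so converting to real NaN is still the better route if the columns are needed for numeric work later.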