dropna removes all rows with valid values and leaves only NA rows in pandas

Asked: 2020-02-15 00:58:37

Tags: python-3.x pandas

I am trying to clean the NA values out of an open-source dataset.

I am using Python 3, Jupyter, and pandas.

 import shutil
 import urllib.request
 import zipfile

 import pandas as pd

 url = 'https://resources.lendingclub.com/LoanStats3c.csv.zip'
 file_name = 'LoanStats3c.csv.zip'

 # download the archive to disk
 with urllib.request.urlopen(url) as response, open(file_name, 'wb') as out_file:
     shutil.copyfileobj(response, out_file)

 # extract only after the downloaded file has been closed and flushed
 with zipfile.ZipFile(file_name) as zf:
     zf.extractall()

 loan = pd.read_csv('LoanStats3c.csv', skiprows=1, parse_dates=True, index_col='id')
 loan.describe()

 # remove all columns with all NAs
 loan = loan.dropna(axis=1, how='all')
 loan.describe()

 # remove all rows with any NAs
 loan = loan.dropna(axis=0)

 loan.describe()

However, the result is columns that contain nothing but NA:

  loan_amnt  funded_amnt  funded_amnt_inv  installment  annual_inc  dti  \
  count        0.0          0.0              0.0          0.0         0.0  0.0   
  mean         NaN          NaN              NaN          NaN         NaN  NaN    
  std          NaN          NaN              NaN          NaN         NaN  NaN   
  min          NaN          NaN              NaN          NaN         NaN  NaN   
  25%          NaN          NaN              NaN          NaN         NaN  NaN   
  50%          NaN          NaN              NaN          NaN         NaN  NaN   
  75%          NaN          NaN              NaN          NaN         NaN  NaN   
  max          NaN          NaN              NaN          NaN         NaN  NaN   

Why did all the rows with valid values disappear, leaving only NA columns?

Thanks

1 Answer:

Answer 0: (score: 0)

By default, .dropna() removes every row that contains a NaN value from the DataFrame.

loan.dropna(axis=1, how = 'all')

drops the columns in which every value is NaN.

loan.dropna(axis = 0)

drops the rows that have at least one NaN value.
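The difference between the two calls can be seen on a small toy DataFrame. The data and column names below are made up for illustration and are not taken from the LendingClub file; this is only a minimal sketch of the behaviour described above.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data, not from the LendingClub file
df = pd.DataFrame({
    "a": [1.0, 2.0, 3.0],
    "b": [np.nan, 5.0, 6.0],
    "c": [np.nan, np.nan, np.nan],  # every value is NaN
    "d": [7.0, np.nan, 9.0],
})

# how='all' drops a column only if every value in it is NaN
cols_dropped = df.dropna(axis=1, how="all")

# how defaults to 'any', so a single NaN is enough to drop a row
rows_dropped = cols_dropped.dropna(axis=0)

print(list(cols_dropped.columns))
print(list(rows_dropped.index))
```

Only column "c" is removed by the first call, while the second call removes every row that still has a NaN in any remaining column.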

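I looked at the file, and I am fairly sure that every row has a NaN in at least one column, so dropping every row with any NaN leaves nothing.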
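One way to confirm this kind of suspicion is to count the missing values before dropping anything. The sketch below uses a made-up three-row frame (the column names are borrowed from the question only as labels, and the values are invented). It also shows two possible ways to keep more rows, using the `subset` and `thresh` parameters of `dropna`; which columns matter for the real data is an assumption the reader has to decide.

```python
import numpy as np
import pandas as pd

# Hypothetical data: every row has a NaN somewhere
df = pd.DataFrame({
    "loan_amnt": [1000.0, 2000.0, 3000.0],
    "annual_inc": [50000.0, np.nan, 70000.0],
    "desc": [np.nan, "some text", np.nan],  # mostly empty column
})

# How many NaN values each column has, and how many rows have at least one
na_per_column = df.isna().sum()
rows_with_na = int(df.isna().any(axis=1).sum())
print(na_per_column)
print(rows_with_na, "of", len(df), "rows contain at least one NaN")

# Option 1: only require the columns that matter to be present
by_subset = df.dropna(subset=["loan_amnt", "annual_inc"])

# Option 2: first drop sparse columns (fewer than 2 non-NA values), then drop rows
dense = df.dropna(axis=1, thresh=2)
by_thresh = dense.dropna(axis=0)
```

Here a plain `df.dropna()` would return an empty frame, while both alternatives keep the rows whose important columns are filled in.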

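Finally, by the time you call .describe() the DataFrame is empty, and what you see are the descriptive statistics of that empty data (a count of 0 and NaN everywhere else), not leftover NA rows. If you want to see the actual DataFrame, use print(df), or in Jupyter simply put the variable on the last line of the cell: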

# ... some code ...
import pandas as pd
variable = pd.DataFrame([])

# print(variable)
variable

This will show you the value of the variable.
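A quick way to tell an empty DataFrame from one that really contains NaN rows is to check its shape before reading the describe() output. The following is a minimal sketch on invented data, not on the actual loan file.

```python
import numpy as np
import pandas as pd

# Hypothetical data in which every row has at least one NaN
df = pd.DataFrame({
    "loan_amnt": [1000.0, np.nan],
    "dti": [np.nan, 12.5],
})

cleaned = df.dropna(axis=0)

# No rows survive, but the columns are still there
print(cleaned.shape)
print(cleaned.empty)

# describe() on the empty frame reports a count of 0 for each column
summary = cleaned.describe()
print(summary.loc["count"])
```

A count of 0 in describe() therefore means that no rows are left, which matches the output shown in the question.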