根据NaN将列值替换为0或1

时间:2017-09-30 13:20:28

标签: python pandas dataframe replace nan

以下是CSV数据的快照, file

我想用0替换null或者' nan',并在列中删除所有其他条目'死亡年份':

import pandas as pd
import numpy as np
mydata_csv = pd.read_csv('D:\Python\character-deaths.csv',sep = ',',encoding = 'utf-8')
mydata_csv
del mydata_csv['Book of Death']
del mydata_csv['Death Chapter']

if mydata_csv['Death Year'] == np.nan:
 mydata_csv['Death Year'] = 0
else:
 mydata_csv['Death Year'] = 1

以上代码产生以下错误:
ValueError:Series的真值是不明确的。使用a.empty,a.bool(),a.item(),a.any()或a.all()。

4 个答案:

答案 0 :(得分:2)

你有两个问题:

  1. 对系列/数据帧的逻辑运算不会产生标量结果。它产生一个if无法理解的向量。

  2. NaN != NaN;即使列为if,您的NaN条件也永远不会成立。

    In [9]: np.nan == np.nan
    Out[9]: False
    
  3. 只需使用 np.where

    mydata_csv['Death Year'] = np.where(mydata_csv['Death Year'].isnull(), 0, 1)
    

    我建议的另一项改进是在删除列时使用 df.drop 。而不是del,尝试更多的熊猫版本:

    mydata_csv = mydata_csv.drop(['Book of Death', 'Death Chapter'], 1)
    

答案 1 :(得分:0)

您没有指定哪一行,但我怀疑您的问题在

if mydata_csv['Death Year'] == np.nan:

如果是这样,请尝试检查列是否首先包含数据,这是

if mydata_csv['Death Year'] is not None and mydata_csv['Death Year'] == np.nan:

希望有所帮助

答案 2 :(得分:0)

我认为更好的是使用notnull表示布尔掩码,然后将其强制转换为int - > True1False0

要使用missing data,必须使用isnullnotnull等特殊功能,请查看docs以获取更多信息。

#omit `sep=','` because default parameter
mydata_csv = pd.read_csv('D:\Python\character-deaths.csv', encoding = 'utf-8')
#simplify double del
mydata_csv = mydata_csv.drop(['Book of Death', 'Death Chapter'], axis=1)
mydata_csv['Death Year'] = mydata_csv['Death Year'].notnull().astype(int)

样品:

mydata_csv = pd.DataFrame({'Book of Death':[4,5,4,5,5,4],
                           'Death Chapter':[7,8,9,4,2,3],
                           'Death Year':[np.nan,3,5,np.nan,1,0],
                           'col':[7,8,9,4,2,3]})

print (mydata_csv)   
   Book of Death  Death Chapter  Death Year  col
0              4              7         NaN    7
1              5              8         3.0    8
2              4              9         5.0    9
3              5              4         NaN    4
4              5              2         1.0    2
5              4              3         0.0    3

mydata_csv = mydata_csv.drop(['Book of Death', 'Death Chapter'], axis=1)
mydata_csv['Death Year'] = mydata_csv['Death Year'].notnull().astype(int)
print (mydata_csv)   
   Death Year  col
0           0    7
1           1    8
2           1    9
3           0    4
4           1    2
5           1    3

答案 3 :(得分:0)

见df.fillna()& df.replace()

getTable()

相关问题