Replace NaN with a random value for each row

Time: 2018-11-08 13:56:48

Tags: python pandas data-cleaning

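I have a dataset with a column "Self_Employed". The values in this column are "Yes", "No" and "NaN". I want to replace the NaN values with a value computed in calc(). I have tried a few approaches that I found here, but could not find one that works for me. Here is my code; what I have tried so far is in the comments.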

    from random import randint

    # Handling missing data - Self_employed
    SEyes = (df['Self_Employed'] == 'Yes').sum()
    SEno = (df['Self_Employed'] == 'No').sum()

    def calc():
        rand_SE = randint(0, (SEno + SEyes))
        if rand_SE > 81:
            return 'No'
        else:
            return 'Yes'

    # Earlier attempts:
    # df['Self_Employed'] = df['Self_Employed'].fillna(randint(0,100))
    # df['Self_Employed'].isnull().apply(lambda v: calc())

    # df[df['Self_Employed'].isnull()] = df[df['Self_Employed'].isnull()].apply(lambda v: calc())
    # df[df['Self_Employed']]

    # df_nan['Self_Employed'] = df_nan['Self_Employed'].isnull().apply(lambda v: calc())
    # df_nan

    # for i in range(df['Self_Employed'].isnull().sum()):
    #     print(df.Self_Employed[i])

    # Current attempt (this is the line that raises the error):
    df[df['Self_Employed'].isnull()] = df[df['Self_Employed'].isnull()].apply(lambda v: calc())
    df

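The line I tried on df_nan seems to work, but it leaves me with a separate set that contains only the previously missing values, whereas I want to fill the missing values in the whole dataset. The last line gives me an error; I have linked a screenshot of it. Do you understand my problem? If so, can you help?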

This is the set with only the rows where Self_Employed is NaN

This is the original dataset

This is the error

3 answers:

Answer 0: (score: 1)

Make sure SEno + SEyes != null. Use the .loc method to set the value where Self_Employed is null:

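    import numpy as np

    # The + 1 keeps SEno + SEyes above zero, since
    # np.random.randint(0, 0) raises a ValueError.
    SEyes = (df['Self_Employed'] == 'Yes').sum() + 1
    SEno = (df['Self_Employed'] == 'No').sum()

    def calc():
        # np.random.randint excludes the upper bound.
        rand_SE = np.random.randint(0, (SEno + SEyes))
        if rand_SE >= 81:
            return 'No'
        else:
            return 'Yes'

    # Select only the NaN rows of the column and call calc() once per row.
    df.loc[df['Self_Employed'].isna(), 'Self_Employed'] = df.loc[df['Self_Employed'].isna(), 'Self_Employed'].apply(lambda x: calc())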
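
The same idea can also be sketched without a per-row `apply`, by drawing all replacement values in one call. The block below is only a minimal sketch, not the answerer's code: the sample data, the `fill_random` helper name and the fixed `seed` are assumptions made for illustration, and it draws "Yes"/"No" in proportion to the observed counts instead of using the hard-coded threshold of 81.

```python
import numpy as np
import pandas as pd


def fill_random(series, seed=None):
    """Return a copy of `series` with NaN replaced by random draws.

    Each missing entry gets its own draw from the observed values,
    weighted by how often each value occurs in the column.
    (Hypothetical helper, not part of pandas.)
    """
    rng = np.random.default_rng(seed)
    mask = series.isna()
    # Relative frequencies of the non-missing values, e.g. Yes/No.
    freqs = series.dropna().value_counts(normalize=True)
    draws = rng.choice(
        freqs.index.to_numpy(), size=int(mask.sum()), p=freqs.to_numpy()
    )
    filled = series.copy()
    filled.loc[mask] = draws
    return filled


# Made-up sample data for illustration only.
df = pd.DataFrame({"Self_Employed": ["Yes", "No", np.nan, "No", np.nan, "No"]})
df["Self_Employed"] = fill_random(df["Self_Employed"], seed=0)
print(df)
```

Passing a `seed` only makes the draws repeatable; leaving it as `None` gives different fills on each run. The sketch assumes the column has at least one non-missing value to sample from.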

Answer 1: (score: 0)

What about df['Self_Employed'] = df['Self_Employed'].fillna(calc())?
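
One thing to watch with this suggestion: `calc()` is evaluated once, before `fillna` runs, so every NaN receives the same value. The sketch below illustrates that with made-up data and a simplified stand-in for `calc()` (both are assumptions, not the asker's data or function), and contrasts it with a per-row fill.

```python
import random

import numpy as np
import pandas as pd

random.seed(0)  # fixed seed only so the sketch is repeatable


def calc():
    # Simplified stand-in for the question's calc(): a 50/50 draw.
    return random.choice(["Yes", "No"])


# Made-up sample data for illustration only.
df = pd.DataFrame({"Self_Employed": [np.nan] * 50 + ["Yes", "No"]})
mask = df["Self_Employed"].isna()

# calc() runs once here, so all 50 NaN get one identical value.
single = df["Self_Employed"].fillna(calc())
print(single[mask].nunique())  # always 1

# Calling calc() once per missing row gives independent draws.
per_row = df["Self_Employed"].copy()
per_row.loc[mask] = [calc() for _ in range(int(mask.sum()))]
print(per_row[mask].value_counts())
```

If a single constant fill is acceptable, `fillna(calc())` is fine; for a different random value per row, the values have to be generated once per missing entry.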

Answer 2: (score: 0)

You can first determine where the NaN values are, e.g.

    na_loc = df.index[df['Self_Employed'].isnull()]

Count the number of NaN values in your column, e.g.

    num_nas = len(na_loc)

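Then generate the corresponding number of random values, indexed by the NaN positions so that they can easily be aligned and assigned

    import random
    import pandas as pd

    fill_values = pd.DataFrame({'Self_Employed': [random.randint(0, 100) for i in range(num_nas)]}, index=na_loc)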

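Finally, replace these values in the DataFrame. Use a single .loc call with both the row labels and the column name; chained indexing such as df.loc[na_loc]['Self_Employed'] = ... assigns into a temporary copy and leaves df unchanged.

    df.loc[na_loc, 'Self_Employed'] = fill_values['Self_Employed']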
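
Putting the steps of this answer together gives something like the following sketch. The sample data is made up, and drawing "Yes"/"No" rather than integers from 0 to 100 is an assumption made here so that the filled values match the column's existing categories.

```python
import random

import numpy as np
import pandas as pd

# Made-up sample data for illustration only.
df = pd.DataFrame({"Self_Employed": ["Yes", np.nan, "No", np.nan, "No"]})

# 1. Index labels of the rows where the column is missing.
na_loc = df.index[df["Self_Employed"].isnull()]

# 2. How many values need to be generated.
num_nas = len(na_loc)

# 3. One random value per missing row, aligned on the same index.
#    "Yes"/"No" is an assumption; the answer itself draws integers.
fill_values = pd.Series(
    [random.choice(["Yes", "No"]) for _ in range(num_nas)], index=na_loc
)

# 4. Assign with a single .loc call so the original frame is modified.
df.loc[na_loc, "Self_Employed"] = fill_values
print(df)
```

Because `fill_values` carries the same index labels as the missing rows, pandas aligns each generated value with its row during the assignment.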