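Python: Fill 'NA' in a pandas column with random elements from a list

Time: 2017-11-26 15:00:32

Tags: python pandas

I am trying to fill the 'NA' values in a pandas column with elements chosen at random from a list.

For example: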

import pandas as pd
df = pd.DataFrame()
df['A'] = [1, 2, None, 5, 53, None]
fill_list = [22, 56, 84]

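Is it possible to write a function that takes a column of a pandas DataFrame as input and replaces every NA with an element chosen at random from the list 'fill_list'?

fun(df['column_name'], fill_list)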

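2 Answers:

Answer 0: (score: 5)

Use numpy.random.choice to create a new Series as long as the DataFrame, then replace the NaNs with fillna or combine_first. Only the positions that are NaN in the original column take a value from the new Series: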

import numpy as np

df['A'] = df['A'].fillna(pd.Series(np.random.choice(fill_list, size=len(df.index))))
#alternative
#df['A'] = df['A'].combine_first(pd.Series(np.random.choice(fill_list, size=len(df.index))))
print (df)
      A
0   1.0
1   2.0
2  84.0
3   5.0
4  53.0
5  56.0
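Note that fillna and combine_first align the filler Series with the column by index label. The snippet above works because df has the default 0..n-1 index. A minimal sketch of the same idea for a frame with a different index follows; the frame df2 and its string labels are hypothetical and only serve as illustration:

```python
import numpy as np
import pandas as pd

fill_list = [22, 56, 84]

# Hypothetical frame with a non-default (string) index
df2 = pd.DataFrame({'A': [1, None, 5, None]}, index=['a', 'b', 'c', 'd'])

# Give the filler the same index as df2 so that fillna can align the labels.
# Without index=df2.index the filler would be labelled 0..3 and no NaN
# would be replaced.
filler = pd.Series(np.random.choice(fill_list, size=len(df2.index)),
                   index=df2.index)
df2['A'] = df2['A'].fillna(filler)
print(df2)
```

The filled values differ from run to run because the choice is random.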

Or:

#get mask of NaNs
m = df['A'].isnull()
#count rows with NaNs
l = m.sum()
#create array with size l
s = np.random.choice(fill_list, size=l)
#set NaNs values
df.loc[m, 'A'] = s
print (df)
      A
0   1.0
1   2.0
2  56.0
3   5.0
4  53.0
5  56.0
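The mask approach can be wrapped in a function close to the fun(df['column_name'], fill_list) call from the question. This is only a sketch: the name fill_na_random, the choice to take the frame plus a column name, and the optional seed parameter are assumptions, not part of the original answer. It uses NumPy's Generator API (np.random.default_rng) so that a seed can make the result reproducible.

```python
import numpy as np
import pandas as pd


def fill_na_random(df, column, fill_list, seed=None):
    """Return a copy of df where NaNs in `column` are replaced by
    elements drawn at random (with replacement) from fill_list."""
    rng = np.random.default_rng(seed)  # seed=None gives a fresh random state
    out = df.copy()                    # leave the caller's frame untouched
    mask = out[column].isna()          # True where a value is missing
    # Draw exactly one replacement per missing value
    out.loc[mask, column] = rng.choice(fill_list, size=int(mask.sum()))
    return out


df = pd.DataFrame({'A': [1, 2, None, 5, 53, None]})
fill_list = [22, 56, 84]
filled = fill_na_random(df, 'A', fill_list, seed=0)
print(filled)
```

Because it draws only as many values as there are NaNs and assigns them through the boolean mask, this version does not depend on the index being the default 0..n-1.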

Answer 1: (score: 0)

data_rnr['CO BORROWER NAME'].fillna("NO",inplace=True)
data_rnr['ET REASON'].fillna("ET REASON NOT AVAILABLE",inplace=True)
data_rnr['INSURANCE COMPANY NM'].fillna("INSURANCE COMPANY-NOT AVAILABLE",inplace=True)
data_rnr['GENDER'].fillna("GENDER DATA- NOT AVAILABLE",inplace=True)