I want to fill the missing categorical cells with a new value for each column. For example:
c1 c2 c3
a nan a
b q nan
c d nan
a p z
should become
c1 c2 c3
a n1 a
b q n2
c d n2
a p z
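My current problem is that I use DictVectorizer for the categorical columns, but it leaves the NaN values as they are.
Answer 0 (score: 0)
Calling fillna with a unique string does what you need: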
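import pandas as pd
from sklearn.feature_extraction import DictVectorizer as DV

categorial_data = pd.DataFrame({'sex': ['male', 'female', 'male', 'female'],
                                'nationality': ['American', 'European', float('nan'), 'European']})
print(categorial_data)
# Replace every NaN with a placeholder string so it is encoded as its own category
categorial_data = categorial_data.fillna('some_unique_string')
print('after replacement')
print(categorial_data)
encoder = DV(sparse=False)
# One dict per row, which is the input format DictVectorizer expects
encoded_data = encoder.fit_transform(categorial_data.to_dict(orient='records'))
print(encoded_data)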
This gives you
nationality sex
0 American male
1 European female
2 NaN male
3 European female
after replacement
nationality sex
0 American male
1 European female
2 some_unique_string male
3 European female
[[ 1. 0. 0. 0. 1.]
[ 0. 1. 0. 1. 0.]
[ 0. 0. 1. 0. 1.]
[ 0. 1. 0. 1. 0.]]
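
The question asks for a different new value in each column rather than one shared string. A minimal sketch of that variation, assuming pandas only: fillna also accepts a dict that maps column names to fill values. The 'missing_<column>' naming scheme is a hypothetical choice made for illustration, not something from the question.

```python
import numpy as np
import pandas as pd

# The example table from the question; NaN marks the missing cells.
df = pd.DataFrame({
    'c1': ['a', 'b', 'c', 'a'],
    'c2': [np.nan, 'q', 'd', 'p'],
    'c3': ['a', np.nan, np.nan, 'z'],
})

# Build one placeholder per column that actually has missing values.
# The 'missing_<column>' names are hypothetical; any string that does not
# already occur in the column would work.
placeholders = {col: 'missing_' + col
                for col in df.columns
                if df[col].isna().any()}

# fillna with a dict fills each listed column with its own value.
filled = df.fillna(placeholders)
print(filled)
```

The filled frame can then be passed to DictVectorizer in the same way as above, and each column's placeholder is encoded as a separate category.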