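Unable to fill a column in a DataFrame with values from a Series

Time: 2018-01-05 01:24:54

Tags: python pandas dataframe

I am trying to fill the missing values in a particular column of a data frame with the mean of the non-null values of the same type (where the type is given by another column of the data frame). Here is code that reproduces my problem: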

import numpy as np
import pandas as pd

df = pd.DataFrame()
#Create the DataFrame with a column of floats
#and a column of labels (str)
np.random.seed(seed=6)
df['col0']=np.random.randn(100)    
lett=['a','b','c','d']
df['col1']=np.random.choice(lett,100)

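#Set some of the floats to NaN for the test.
#(np.nan is used because the np.NaN alias was removed in NumPy 2.0.)
toz = np.random.randint(0,100,25)
df.loc[toz,'col0']=np.nan
#Count the rows that still have a value in col0.
df[df['col0'].notnull()].count()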

#Create a DF with mean for each label.
w_series = df.loc[(~df['col0'].isnull())].groupby('col1').mean()



        col0
col1    
a   0.057199
b   0.363899
c   -0.068074
d   0.251979

#This dataframe has our label (a,b,c,d) as the index. Doesn't seem
#to work when I try to df.fillna(w_series). So I try to reindex such
#that the labels (a,b,c,d) become a column again.
#
#For some reason I cannot just do a set_index and expect the
#old index to become a column. So I append the new index and
#then reset the old one.
w_series['col2'] = list(range(w_series.size))
w_frame = w_series.set_index('col2',append=True)
w_frame.reset_index('col1',inplace=True)

#I try fillna() with the new dataframe.
df.fillna(w_frame)

Still no luck:

        col0    col1
0   0.057199    b
1   0.729004    a
2   0.217821    d
3   0.251979    c
4   -2.486781   a
5   0.913252    b
6   NaN         a
7   NaN         b

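What am I doing wrong?

How can I fill the missing values in the data frame with the mean of the rows that share the same label as the missing entry?

Do the data frame (df) and the filling data frame (w_frame) have to be the same size?

Thanks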

1 answer:

Answer 0 (score: 0)

fillna aligns on the index, so the target data frame and the data frame holding the fill values need to share the same index.
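To make the index alignment concrete, here is a minimal sketch on made-up data. The names target, by_label and by_position are hypothetical and do not come from the question. It suggests that the two objects do not need to be the same size: only the positions whose index label also appears in the fill object get filled.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: a Series with two missing values.
target = pd.Series([np.nan, 1.0, np.nan], index=[0, 1, 2])

# Fill values indexed by labels that never occur in target's index.
by_label = pd.Series([10.0, 20.0], index=["a", "b"])

# Fill values indexed by 0 and 1, which do occur in target's index.
by_position = pd.Series([10.0, 20.0], index=[0, 1])

no_match = target.fillna(by_label)
partial_match = target.fillna(by_position)

print(no_match.tolist())       # [nan, 1.0, nan]
print(partial_match.tolist())  # [10.0, 1.0, nan]
```

This is consistent with the output in the question: w_frame is indexed by 0 to 3, so only rows 0 to 3 of df could be filled, and they received the value at the same index rather than the mean for their own label.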


df.set_index('col1')['col0'].fillna(w_frame.set_index('col1').col0).reset_index()

# Only the first 11 rows are shown
Out[74]:
   col1      col0
0     b  0.363899
1     a  0.729004
2     d  0.217821
3     c -0.068074
4     a -2.486781
5     b  0.913252
6     a  0.057199
7     b  0.363899
8     c -0.068074
9     b -0.429894
10    a  2.631281
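As an alternative that avoids changing the index at all, the per-label mean can be broadcast back onto the original rows. This is a sketch on a small hand-made frame, not the answerer's method. The column names col0 and col1 follow the question, while group_means, means, filled and filled_map are names assumed here for illustration.

```python
import numpy as np
import pandas as pd

# Small hand-made frame in the same shape as the question's df.
df = pd.DataFrame({
    "col0": [1.0, np.nan, 3.0, np.nan, 5.0, 7.0],
    "col1": ["a", "a", "a", "b", "b", "b"],
})

# transform("mean") returns one value per original row, aligned with
# df's own index; NaN values are skipped when the mean is computed.
group_means = df.groupby("col1")["col0"].transform("mean")
filled = df.assign(col0=df["col0"].fillna(group_means))

# Equivalent variant: look up each row's label in a Series of means.
means = df.groupby("col1")["col0"].mean()
filled_map = df["col0"].fillna(df["col1"].map(means))

print(filled["col0"].tolist())  # [1.0, 2.0, 3.0, 6.0, 5.0, 7.0]
```

Both variants build a fill Series that carries the same index as df, which is the condition fillna needs in order to line the values up row by row.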
