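For the imputer to work, I dropped the "City" column and created a new data frame named "data_numberOnly". After fitting and transforming, I need to add the "City" column back. How can I add this column?
Code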
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer
columns = ['Population', 'PerCapita_Income', 'City']
# dtype=object keeps the numbers numeric; without it NumPy casts every value to a string
p = np.array([[1, 2.0, 'Atlanta'], [4, np.nan, 'Phoenix'], [1, 3.0, 'Raleigh']], dtype=object)
# Create data frame from array
df3 = pd.DataFrame(p, columns=columns)
#drop non-numeric columns for imputer to work
data_numberOnly = df3.drop('City', axis=1)
imp = SimpleImputer(missing_values=np.nan, strategy='mean')
imp.fit(data_numberOnly)
X = imp.transform(data_numberOnly)
X
Source data frame
After transformation
Answer 0 (score: 1)
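You can use np.hstack:
X = np.hstack([X, df3['City'].to_numpy()[:, None]])
The [:, None] part is needed to turn the 1-D City values into a 2-D array with a single column (similar tricks exist). It is applied to the result of .to_numpy() because recent pandas versions no longer accept [:, None] directly on a pd.Series.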
X
array([[1.0, 2.0, 'Atlanta'],
       [4.0, 2.5, 'Phoenix'],
       [1.0, 3.0, 'Raleigh']], dtype=object)
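A minimal sketch of the reshaping step described above, kept separate from the imputer so it runs on its own. The array `X` is typed in by hand to match the imputed values shown in the question, and the names `city`, `col_a`, `col_b` and `combined` are illustrative assumptions. It shows two equivalent ways to turn the 1-D values into a single column before stacking.

```python
import numpy as np
import pandas as pd

# Imputed numeric values, entered by hand here (assumed to match the imputer output).
X = np.array([[1.0, 2.0],
              [4.0, 2.5],
              [1.0, 3.0]])
city = pd.Series(['Atlanta', 'Phoenix', 'Raleigh'], name='City')

# Convert the Series to a plain 1-D object array first.
city_values = city.to_numpy(dtype=object)

# Two equivalent ways to get a (3, 1) column from a (3,) array.
col_a = city_values[:, None]
col_b = city_values.reshape(-1, 1)

# np.hstack joins along columns, so both inputs must be 2-D with the same row count.
combined = np.hstack([X, col_a])
print(combined.shape, combined.dtype)
```

Because the result mixes floats and strings, NumPy falls back to `dtype=object`, which is why the answer's output carries that dtype. If numeric operations are needed afterwards, a DataFrame is usually the more convenient container.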
Answer 1 (score: 1)
You can write the imputed values back into df3 by selecting the target columns:
df3[['Population','PerCapita_Income']] = X
df3
   Population  PerCapita_Income     City
0           1                 2  Atlanta
1           4               2.5  Phoenix
2           1                 3  Raleigh
Or create a new data frame:
df = pd.DataFrame(X)
df['City'] = df3['City'].copy()
df.columns = columns
df
   Population  PerCapita_Income     City
0         1.0               2.0  Atlanta
1         4.0               2.5  Phoenix
2         1.0               3.0  Raleigh
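Both variants above could be folded into one small function. The sketch below is only one possible arrangement: `impute_numeric` is a hypothetical helper name, not part of pandas or scikit-learn, and it assumes every listed column holds at least one non-missing value (SimpleImputer drops columns that are entirely NaN, which would change the shape). The example data frame is rebuilt from a dict so the block is self-contained.

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer


def impute_numeric(df, numeric_cols, strategy="mean"):
    """Hypothetical helper: impute numeric_cols and keep all other columns as they are."""
    # Cast to float so the imputer receives real NaN values rather than objects.
    numeric = df[numeric_cols].astype(float)
    imputer = SimpleImputer(missing_values=np.nan, strategy=strategy)
    # Reuse the original index and column names so rows stay aligned with the rest.
    imputed = pd.DataFrame(
        imputer.fit_transform(numeric),
        columns=numeric_cols,
        index=df.index,
    )
    out = df.copy()  # leave the caller's data frame untouched
    for col in numeric_cols:
        out[col] = imputed[col]
    return out


df3 = pd.DataFrame({
    "Population": [1, 4, 1],
    "PerCapita_Income": [2.0, np.nan, 3.0],
    "City": ["Atlanta", "Phoenix", "Raleigh"],
})
result = impute_numeric(df3, ["Population", "PerCapita_Income"])
print(result)
```

Passing the original index to the new data frame matters when the rows are not numbered 0..n-1: pandas aligns on index labels during assignment, so a mismatched index would introduce NaN values instead of attaching each city to its row.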