{'Country': 'USA', 'Age': '52', 'Sal': '12345', 'OnWork': 'No'}
{'Country': 'UK', 'Age': '23', 'Sal': '1142', 'OnWork': 'Yes'}
{'Country': 'MAL', 'Age': '25', 'Sal': '4456', 'OnWork': 'No'}
{'Country': 'MAL', 'Age': '25', 'Sal': '4456', 'OnWork': 'No'}
{'Country': 'MAL', 'Age': '?', 'Sal': '2345', 'OnWork': 'Yes'}
{'Country': 'MAL', 'Age': '25', 'Sal': '3342', 'OnWork': 'Yes'}
{'Country': 'MAL', 'Age': '25', 'Sal': '3452', 'OnWork': 'No'}
{'Country': 'MAL', 'Age': '?', 'Sal': '3562', 'OnWork': 'No'}
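Here I need to replace the missing values with the mean, based on the "OnWork" value. The mean of the "Yes" group should go into the Age of row 5, and the mean of the "No" group should go into the Age of the last row.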
df = pd.read_csv("Mycal.csv", na_values = missing_values, nrows=50)
df["F8"].fillna(df['F8'].mean(), inplace=True)
df[df["Class"]=="Yes"]["F8"].mean()
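I want the missing values in the "Yes" group to be filled with the mean of the "Yes" group, and those in the "No" group with the mean of the "No" group. Please help me.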
Answer 0: (score: 1)
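# Treat '?' as missing, then convert the column to float
df['Age'] = df['Age'].mask(df['Age'].eq('?'), np.nan).astype(float)
# Fill each gap with the mean of its OnWork group ('mean' skips NaN), then truncate to int
df['Age'] = (df['Age'].fillna(df.groupby('OnWork')['Age'].transform('mean'))
                      .astype(int))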
print(df)
Country Age Sal OnWork
0 USA 52 12345 No
1 UK 23 1142 Yes
2 MAL 25 4456 No
3 MAL 25 4456 No
4 MAL 24 2345 Yes
5 MAL 25 3342 Yes
6 MAL 25 3452 No
7 MAL 31 3562 No
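For reference, here is a self-contained sketch of the same idea on the sample data from the question. It assumes `pd.to_numeric(..., errors='coerce')` as a swapped-in way to turn '?' into NaN (instead of `mask`), and it leaves the result as float rather than truncating to int.

```python
import pandas as pd

# Sample data from the question; '?' marks a missing age.
df = pd.DataFrame({
    'Country': ['USA', 'UK', 'MAL', 'MAL', 'MAL', 'MAL', 'MAL', 'MAL'],
    'Age': ['52', '23', '25', '25', '?', '25', '25', '?'],
    'Sal': ['12345', '1142', '4456', '4456', '2345', '3342', '3452', '3562'],
    'OnWork': ['No', 'Yes', 'No', 'No', 'Yes', 'Yes', 'No', 'No'],
})

# Anything that cannot be parsed as a number (here '?') becomes NaN.
df['Age'] = pd.to_numeric(df['Age'], errors='coerce')

# transform('mean') returns one value per row: the mean of that row's group.
group_mean = df.groupby('OnWork')['Age'].transform('mean')
df['Age'] = df['Age'].fillna(group_mean)

print(df[['OnWork', 'Age']])
```

Without the final cast to int, the last row gets 31.75 (the exact mean of the "No" group) rather than 31.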
If you want to fill the missing values of several columns at once, use:
df = df.fillna(df.groupby('OnWork').transform('mean'))
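Note that the one-liner above computes a mean for every non-key column, so in recent pandas versions a non-numeric column such as Country may raise an error. A minimal sketch that restricts the fill to an explicit list of columns follows; the frame and the `num_cols` name are made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical frame with gaps in two numeric columns.
df = pd.DataFrame({
    'Country': ['USA', 'UK', 'MAL', 'MAL'],
    'Age': [52.0, 23.0, np.nan, np.nan],
    'Sal': [12345.0, np.nan, 4456.0, 2345.0],
    'OnWork': ['No', 'Yes', 'No', 'Yes'],
})

num_cols = ['Age', 'Sal']  # columns to fill; chosen by hand here

# Per-row group means for the selected columns only.
group_means = df.groupby('OnWork')[num_cols].transform('mean')
df[num_cols] = df[num_cols].fillna(group_means)

print(df)
```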
Answer 1: (score: 0)
If you want to replace the missing values in each group with that group's mean, this is one way to do it:
df_mean = df.groupby('Class')['F8'].mean().reset_index()
df_mean.columns = ['Class','F8_mean']
df = pd.merge(df, df_mean, on='Class', how='left')
df.loc[df['F8'].isnull(), 'F8'] = df['F8_mean']
df.drop('F8_mean', axis=1, inplace=True)
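The same merge-based steps can be sketched on the sample data from the question. Here `OnWork` and `Age` stand in for `Class` and `F8`, and `Age_mean` is a temporary helper column introduced only for this illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'OnWork': ['No', 'Yes', 'No', 'No', 'Yes', 'Yes', 'No', 'No'],
    'Age': [52, 23, 25, 25, np.nan, 25, 25, np.nan],
})

# One row per group with its mean age (NaN is skipped by mean()).
df_mean = df.groupby('OnWork')['Age'].mean().reset_index()
df_mean.columns = ['OnWork', 'Age_mean']

# A left merge keeps every original row and attaches its group mean.
df = pd.merge(df, df_mean, on='OnWork', how='left')

# Copy the group mean only into the rows where Age is missing.
df['Age'] = df['Age'].fillna(df['Age_mean'])
df = df.drop(columns='Age_mean')

print(df)
```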
Answer 2: (score: 0)
# Import libraries
import pandas as pd
import numpy as np
# Data dictionary
data_dict = {'Country': ['USA','UK','MAL','MAL','MAL','MAL','MAL','MAL'],
             'Age': ['52','23','25','25','?','25','25','?'],
             'Sal': ['12345','1142','4456','4456','2345','3342','3452','3562'],
             'OnWork': ['No','Yes','No','No','Yes','Yes','No','No']}
# Convert dictionary to dataframe
df = pd.DataFrame(data_dict)
# print input df
print(df)
Country Age Sal OnWork
0 USA 52 12345 No
1 UK 23 1142 Yes
2 MAL 25 4456 No
3 MAL 25 4456 No
4 MAL ? 2345 Yes
5 MAL 25 3342 Yes
6 MAL 25 3452 No
7 MAL ? 3562 No
# Replace '?' values with NaN
df.Age=df.Age.where(df.Age!='?')
# Convert string values to numeric
df["Age"] = pd.to_numeric(df["Age"])
# Compute the mean age of each OnWork group (truncated to int)
mean_list = df.groupby('OnWork')['Age'].mean().astype(int)
# print mean values
print(mean_list)
No 31
Yes 24
# Replace each missing age with the mean of the row's OnWork group
df['Age'] = df.apply(
    lambda row: mean_list[row['OnWork']] if np.isnan(row['Age']) else row['Age'],
    axis=1
)
# print final df
print(df)
Country Age Sal OnWork
0 USA 52.0 12345 No
1 UK 23.0 1142 Yes
2 MAL 25.0 4456 No
3 MAL 25.0 4456 No
4 MAL 24.0 2345 Yes
5 MAL 25.0 3342 Yes
6 MAL 25.0 3452 No
7 MAL 31.0 3562 No
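As an alternative to the row-wise `apply`, here is a sketch of a vectorized variant: `Series.map` looks up each row's group mean, and `fillna` uses it only where Age is missing. This is a swapped-in technique, not the code from the answer above, and the frame is reduced to the two relevant columns.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Age': [52, 23, 25, 25, np.nan, 25, 25, np.nan],
    'OnWork': ['No', 'Yes', 'No', 'No', 'Yes', 'Yes', 'No', 'No'],
})

# Mean age per group, truncated to int as in the answer above.
mean_list = df.groupby('OnWork')['Age'].mean().astype(int)

# Map each row's OnWork value to its group mean, then fill only the gaps.
df['Age'] = df['Age'].fillna(df['OnWork'].map(mean_list))

print(df)
```

On larger frames this usually avoids the per-row Python overhead of `apply(axis=1)`.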