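I am trying to impute missing values in a specific column of a data frame.
My goal is to replace each missing value with a per-group value, where the groups are defined by another column.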
I have used aggregate to store the grouped result:
# Replace LotFrontage missing values by Neighborhood mean
lot_frontage_by_neighborhood = aggregate(LotFrontage ~ Neighborhood, combined, mean)
Now I want to do something like this:
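# pandas: iterate over LotFrontage grouped by Neighborhood
for key, group in combined.groupby("Neighborhood")["LotFrontage"]:
    idx = (combined["Neighborhood"] == key) & (combined["LotFrontage"].isnull())
    # group.mean() skips NaN, so this is the mean of the known values in that group
    combined.loc[idx, "LotFrontage"] = group.mean()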
This is Python code, of course.
I'm not sure how to do the same thing in R. Can anyone help?
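A direct R translation of the loop above might look like the following minimal sketch. It assumes `combined` is a data frame with `Neighborhood` and `LotFrontage` columns; here it is built from the small example table given in this question, so the data is illustrative only. The loop walks over the rows of the `aggregate()` result instead of pandas groups. Note that the formula method of `aggregate()` drops rows with a missing `LotFrontage` by default (`na.action = na.omit`), so each group mean is computed from the non-missing values only.

```r
# Illustrative data (assumed): mirrors the Neighborhood/LotFrontage example in the question
combined <- data.frame(
  Neighborhood = c("A", "A", "B", "B", "A"),
  LotFrontage  = c(20, 30, 20, 50, NA),
  stringsAsFactors = FALSE
)

# One mean per Neighborhood; the NA row is dropped by the default na.action = na.omit
lot_frontage_by_neighborhood <- aggregate(LotFrontage ~ Neighborhood, combined, mean)

# R counterpart of the Python loop: one pass per row of the aggregate() result
for (i in seq_len(nrow(lot_frontage_by_neighborhood))) {
  key <- lot_frontage_by_neighborhood$Neighborhood[i]
  idx <- combined$Neighborhood == key & is.na(combined$LotFrontage)
  combined$LotFrontage[idx] <- lot_frontage_by_neighborhood$LotFrontage[i]
}

print(combined)
```

The missing value in neighborhood A ends up as 25, the mean of 20 and 30.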
For example:
Neighborhood LotFrontage
A 20
A 30
B 20
B 50
A <NA>
The NA record should be replaced with 25 (the mean LotFrontage of the non-missing records in Neighborhood A).
Thanks
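The same result can be obtained without an explicit loop. Below is a minimal sketch using base R's `ave()`, which applies a function within each group and returns a vector aligned with the original rows. The data frame is the example table above, with `NA` standing in for `<NA>`. The helper name `fill_with_group_mean` is hypothetical, introduced only for this illustration.

```r
# Example table from the question (NA marks the missing LotFrontage)
combined <- data.frame(
  Neighborhood = c("A", "A", "B", "B", "A"),
  LotFrontage  = c(20, 30, 20, 50, NA)
)

# Hypothetical helper: within one group, replace NAs by the mean of the known values
fill_with_group_mean <- function(v) {
  v[is.na(v)] <- mean(v, na.rm = TRUE)
  v
}

# ave() splits LotFrontage by Neighborhood, applies FUN per group, and reassembles in row order
combined$LotFrontage <- ave(combined$LotFrontage, combined$Neighborhood,
                            FUN = fill_with_group_mean)

print(combined$LotFrontage)  # 20 30 20 50 25
```

If a neighborhood has no non-missing values at all, `mean(v, na.rm = TRUE)` returns `NaN`, so such rows would need separate handling.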
Answer (score: 1)
Is this what you had in mind? You will probably need the which() function to find the rows that have NA values.
set.seed(1)
Neighborhood = sample(letters[1:4], 10, TRUE)
LotFrontage = rnorm(10,0,1)
LotFrontage[sample(10, 2)] = NA
# This data frame has 2 columns. The LotFrontage column has 2 missing values.
df = data.frame(Neighborhood = Neighborhood, LotFrontage = LotFrontage)
# Set each missing LotFrontage value to the mean LotFrontage of the rows with the same Neighborhood
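na_rows <- which(is.na(df$LotFrontage))
x <- as.character(df$Neighborhood[na_rows])
f <- function(x) mean(df[df$Neighborhood == x, ]$LotFrontage, na.rm = TRUE)
# vapply returns a plain numeric vector; lapply would return a list and turn LotFrontage into a list column
df$LotFrontage[na_rows] <- vapply(x, f, numeric(1))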
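Calling f() once per missing row recomputes a group mean every time. Below is a minimal vectorized alternative, assuming the same two-column layout: compute each neighborhood's mean once with `aggregate()` (as in the question), then look up the value for every missing row with `match()`. The small data frame is made up for illustration so that the expected values can be checked by hand.

```r
# Made-up data for illustration: three neighborhoods, three missing values
df <- data.frame(
  Neighborhood = c("a", "b", "a", "c", "b", "c", "a"),
  LotFrontage  = c(1.0, NA, 3.0, 4.0, 2.0, NA, NA)
)

# One mean per Neighborhood, computed once (NA rows are dropped by the default na.action)
means <- aggregate(LotFrontage ~ Neighborhood, df, mean)

# For each missing row, find its Neighborhood in the means table and copy that mean
na_rows <- which(is.na(df$LotFrontage))
df$LotFrontage[na_rows] <-
  means$LotFrontage[match(df$Neighborhood[na_rows], means$Neighborhood)]

print(df$LotFrontage)  # 1 2 3 4 2 4 2
```

A neighborhood whose values are all missing does not appear in `means`, so `match()` returns NA for it and those rows simply stay NA.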