Question

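I am trying to replace null values based on two columns. Basically, I have company symbols in one column and the corresponding values in a second column. I need to replace each missing value with the mean of the values for that company symbol, not the mean of the whole column. How can I do this in R? (See the image below.)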

Answer 1

You need something like this:

library(dplyr)  # provides the %>% pipe used below
df <- data.frame(Symbol = c("NXCDX", "ALX", "ALX", "BESOQ", "BESOQ", "BESOQ"), 
                 Values = c(2345, 8654, NA, 6394, 8549, NA))

# Mean of Values for each Symbol, ignoring NA entries
df %>% dplyr::group_by(Symbol) %>% dplyr::summarise(mean_values = mean(Values, na.rm = TRUE))
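
The summary above returns one mean per symbol. To write those means back into the missing cells, one option is a grouped mutate. This is a minimal sketch, assuming the same sample data as above; the name filled is only illustrative.

```r
library(dplyr)

# Same sample data as in the answer
df <- data.frame(
  Symbol = c("NXCDX", "ALX", "ALX", "BESOQ", "BESOQ", "BESOQ"),
  Values = c(2345, 8654, NA, 6394, 8549, NA)
)

# Within each Symbol, replace NA with that group's mean of the non-missing values
filled <- df %>%
  group_by(Symbol) %>%
  mutate(Values = ifelse(is.na(Values), mean(Values, na.rm = TRUE), Values)) %>%
  ungroup()

print(filled)
```

A symbol whose values are all NA has no group mean to fall back on, so its entries would become NaN rather than a number.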

Answer 2

Using data.table:

library(data.table)
# For each Symbol, return Values with NA replaced by that group's mean
setDT(df)[, replace(Values, is.na(Values), mean(Values, na.rm = TRUE)), by = Symbol]
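
The call above returns a new table with the filled values in a column named V1. If the goal is to update the original column instead, a sketch using data.table's `:=` assignment could look like this. It assumes Values is stored as a double, as in the sample data; dt is an illustrative name.

```r
library(data.table)

# Same sample data, built directly as a data.table
dt <- data.table(
  Symbol = c("NXCDX", "ALX", "ALX", "BESOQ", "BESOQ", "BESOQ"),
  Values = c(2345, 8654, NA, 6394, 8549, NA)
)

# Update Values by reference: within each Symbol, NA becomes the group mean
dt[, Values := replace(Values, is.na(Values), mean(Values, na.rm = TRUE)), by = Symbol]

print(dt)
```

Because `:=` modifies the table in place, the row order is preserved and no copy of the data is returned.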

Answer 3

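Suppose your data is in a data frame called 'myData'. You can use the ddply function from the plyr package to compute the mean for each company symbol. The ddply function applies a function to columns, grouped by another column.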

library(plyr)

#Find the entries where the values are missing (NA)
#If your data marks missing values differently (e.g. ""), convert them to NA first
nullMatches <- is.na(myData$Values)

#Generate the mean for each company from the non-missing rows
#This will return a 2 column data frame, first column will be "Symbol".
#Second column will be the mean of the values for each 'Symbol'.
meanPerCompany <- ddply(myData[!nullMatches,], "Symbol", numcolwise(mean))

#Match the company symbol of each missing row and store that company's mean
myData$Values[nullMatches] <- meanPerCompany[match(myData$Symbol[nullMatches], meanPerCompany[,1]),2]
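
For comparison, the same fill can be sketched without plyr, using base R's ave(). This is an alternative rather than the method in the answer above; the sample data and the name groupMeans are assumptions for illustration.

```r
# Sample data standing in for 'myData'
myData <- data.frame(
  Symbol = c("NXCDX", "ALX", "ALX", "BESOQ", "BESOQ", "BESOQ"),
  Values = c(2345, 8654, NA, 6394, 8549, NA)
)

# ave() returns a vector as long as Values, holding each row's group mean
groupMeans <- ave(myData$Values, myData$Symbol,
                  FUN = function(x) mean(x, na.rm = TRUE))

# Keep existing values, fill only the missing ones
myData$Values <- ifelse(is.na(myData$Values), groupMeans, myData$Values)

print(myData)
```

Since ave() already aligns the group means with the original rows, no separate match() step is needed.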

How to replace NA in R based on the values in two columns

3 answers: