Question

How can I fill the missing values in one data frame using values from another data frame?

Let's say I have two datasets:

dataset1 shows the amount of food produced per day by each country.

##   country day tonnes of food
## 1   china   1              6
## 2   china   1             NA
## 3   china   2              2
## 4   china   2             NA

dataset2 holds the average amount of food per day.

##   country day average tonnes of food
## 1   china   1                      6
## 3   china   2                      2

How can I fill the NAs in dataset1 with the averages from dataset2?

That is, wherever is.na(dataset1$tonnes) is TRUE, fill in the average from dataset2$averagetonnes.

Answer 1

We can use a join in data.table:

library(data.table)
setDT(df1)[df2, on =c("country", "day")][is.na(tonnes.of.food), 
  tonnes.of.food:= average.tonnes.of.food][, average.tonnes.of.food:=NULL][]
#   country day tonnes.of.food
#1:   china   1              6
#2:   china   1              6
#3:   china   2              2
#4:   china   2              2
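
For anyone who wants to run this end to end, here is a minimal sketch of a closely related variant: an update join that fills the NAs in df1 in place instead of building a new joined table. It is not the code from the answer above. The data below is my reconstruction of the question's tables, and the column names tonnes.of.food and average.tonnes.of.food are assumed to match the ones used in the answer.

```r
library(data.table)

# Assumed reconstruction of the question's data (column names are illustrative).
df1 <- data.table(country = "china",
                  day = c(1, 1, 2, 2),
                  tonnes.of.food = c(6, NA, 2, NA))
df2 <- data.table(country = "china",
                  day = c(1, 2),
                  average.tonnes.of.food = c(6, 2))

# Update join: for every df1 row that matches a df2 row on (country, day),
# keep the existing value unless it is NA, in which case take df2's average.
# The i. prefix refers to the column coming from df2.
df1[df2, on = c("country", "day"),
    tonnes.of.food := ifelse(is.na(tonnes.of.food),
                             i.average.tonnes.of.food,
                             tonnes.of.food)]

print(df1)
```

Because := assigns by reference, df1 itself is modified and keeps its original row order. Rows of df1 with no match in df2 are left untouched, so their NAs stay NA.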

Answer 2

If I understand you correctly, the match function will solve your problem. Data:

df1 <- data.frame(country = c(rep('china1', 2), rep('china2', 2)),
                  day = c(1, 1, 2, 2),
                  tof = c(6, NA, 2, NA),
                  stringsAsFactors = FALSE)
df2 <- data.frame(country = c('china1', 'china2'),
                  day = c(1, 2),
                  atof = c(6, 2),
                  stringsAsFactors = FALSE)
df1
#  country day tof
#1  china1   1   6
#2  china1   1  NA
#3  china2   2   2
#4  china2   2  NA

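The line below replaces the NAs with the average for the corresponding country from the second data.frame, df2. match() returns a vector of matching positions, and [which(is.na(df1$tof))] selects the positions where the "tof" column is NA. Note that it matches on country only, which works here because each country in this sample data appears with a single day.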

df1$tof[is.na(df1$tof)] <- df2$atof[match(df1$country, df2$country)][which(is.na(df1$tof))]
df1
#  country day tof
#1  china1   1   6
#2  china1   1   6
#3  china2   2   2
#4  china2   2   2
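
If the same country appears on several days, as in the question's own data, the lookup has to use both country and day. One way to do that with match(), as a rough sketch, is to paste the two columns into a single key. The key1/key2/missing names and the "_" separator are my own choices, and the approach assumes the separator never occurs inside a country name.

```r
# Question's data, using the tof/atof column names from this answer.
df1 <- data.frame(country = "china", day = c(1, 1, 2, 2),
                  tof = c(6, NA, 2, NA), stringsAsFactors = FALSE)
df2 <- data.frame(country = "china", day = c(1, 2),
                  atof = c(6, 2), stringsAsFactors = FALSE)

# One composite key per row, built from country and day.
key1 <- paste(df1$country, df1$day, sep = "_")
key2 <- paste(df2$country, df2$day, sep = "_")

# Look up averages only for the rows whose tof is missing.
missing <- is.na(df1$tof)
df1$tof[missing] <- df2$atof[match(key1[missing], key2)]

print(df1)
```

Rows whose key has no counterpart in df2 simply stay NA, because match() returns NA for them.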
