Question

I'm honestly not even sure how to phrase this question, so please bear with me.

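I noticed an error in the dataset I'm working with, the ANES cumulative file. For one of the years in the dataset (2004), the values of one variable (which I renamed "grewup") were accidentally left out, so that year just shows NA. The values exist for the other years, so the dataset basically looks like this: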

id   year   grewup
1    2002   127
2    2002   310
3    2004   NA
4    2004   NA
5    2008   332
6    2008   614

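I do have another dataset that contains only 2004 and has the missing values for "grewup". What I want to do is recode the 2004 NAs using the values from that second dataset. How do I do that? Again, the values are already in the cumulative dataset for the other years; I only want to recode 2004 and leave the rest of the values alone.

Thanks.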

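A few clarifications and additions:

I want to pull only this one variable from the second dataset, to avoid making the first dataset even larger and more memory-hungry than it already is (951 columns). The second dataset actually contains many other variables that we already have.
Also, although all of the 2004 values are NA, not every NA in the dataset is from 2004. Some of the other years have legitimately missing values.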

Answer 1

You should be able to merge these data frames by id and year:

 merge(dat1,dat2,by=c("id", "year"),all.x=TRUE)  # a left "outer join": keeps every row of dat1
  id year grewup.x grewup.y
1  1 2002      127       NA
2  2 2002      310       NA
3  3 2004       NA      438
4  4 2004       NA      834
5  5 2008      332       NA
6  6 2008      614       NA
 datm <- merge(dat1,dat2,by=c("id", "year"),all.x=TRUE)

 # Now "fill in the blanks"
 datm[is.na(datm$grewup.x), "grewup.x"] <- datm[is.na(datm$grewup.x), "grewup.y"]
 # Notice that the logical index is the same on both sides of the assignment

 datm[ ! names(datm) %in% 'grewup.y' ]  # drop the supplementary column

  id year grewup.x
1  1 2002      127
2  2 2002      310
3  3 2004      438
4  4 2004      834
5  5 2008      332
6  6 2008      614
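
As a possible alternative that avoids building a merged copy of a 951-column data frame, the 2004 values could be looked up with match() and written straight into the existing column. This is only a sketch under stated assumptions: dat1 and dat2 are toy stand-ins for the real files, the column "other" and the seventh row are made up for illustration, and id is assumed to identify a respondent uniquely within the 2004 dataset.

```r
# Toy stand-in for the cumulative file (hypothetical values).
# Row 7 is an invented, legitimately missing value from a non-2004 year.
dat1 <- data.frame(
  id     = 1:7,
  year   = c(2002, 2002, 2004, 2004, 2008, 2008, 2008),
  grewup = c(127, 310, NA, NA, 332, 614, NA)
)

# Toy stand-in for the 2004-only file; "other" represents the many
# columns that are already present in dat1 and should not be copied.
dat2 <- data.frame(
  id     = c(3, 4),
  year   = c(2004, 2004),
  grewup = c(438, 834),
  other  = c("a", "b")
)

# Rows of dat1 that are from 2004 and still missing
fill <- dat1$year == 2004 & is.na(dat1$grewup)

# For each of those rows, the position of the same id in dat2
# (NA where the id does not occur in dat2)
pos <- match(dat1$id[fill], dat2$id)

# Copy only the one variable; ids not found in dat2 simply stay NA
dat1$grewup[fill] <- dat2$grewup[pos]

dat1
```

Because the index is restricted to year 2004, missing values in other years are never touched, and the column keeps its original name instead of becoming grewup.x.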

R: Partially recoding a variable by merging in values from another dataset
