Here is my data, with the columns infoemployer, inforst and interrst. The data frame is called tyearb.
infoemployer inforst
1 Comcast Jeff Dunn
6 Cummins, Inc. Rebekah Smith
38 DaVita Andy Nielsen
42 Deloitte Chase Russell
66 Duff & Phelps LLC Tanner Anderson
76 Frito-Lay Inc. Tanner Anderson
88 Intel Corporation Jake Graff
96 J.P. Morgan- (J.P. Morgan is part of JPMorgan Chase & Co) Andy Nielsen
97 Lenovo Nelson Anievas
98 PepsiCo Tanner Anderson
100 Procter & Gamble Andee Flinders
102 Sears Holdings Corporation, formerly Sears, Roebuck & Company Tanner Anderson
103 The Walt Disney Company Kylie Rothlisberger
106 Union Pacific Railroad Jake Graff
116 USAA Rebekah Smith
117 Walmart Chase Russell
237 <NA>
238 Apple <NA>
239 Brandes Investment Partners L.P. <NA>
240 EY (formerly known as Ernst & Young) LLP <NA>
242 Grant Thornton LLP <NA>
243 KPMG LLP <NA>
245 Moss Adams <NA>
246 Pariveda Solutions <NA>
248 PwC (PricewaterhouseCoopers, LLC) <NA>
250 RCLCO <NA>
251 Strata Fund Services, LLC <NA>
interrst
1 <NA>
6 Rebekah Smith
38 Andy Nielsen
42 Chase Russell
66 Tanner Anderson
76 Tanner Anderson
88 Jake Graff
96 Andy Nielsen
97 Nelson Anievas
98 Tanner Anderson
100 Andee Flinders
102 Tanner Anderson
103 Kylie Rothlisberger
106 Jake Graff
116 Rebekah Smith
117 Chase Russell
237 Austin Pollard
238 Brady Tengberg
239 Jeff Dunn
240 Rebekah Smith
242 Jeff Dunn
243 Andee Flinders
245 Jake Graff
246 Nelson Anievas
248 Nelson Anievas
250 Jake Graff
251 Andy Nielsen
My code is as follows:
levels(tyearb[,2]) <- c(levels(tyearb[,2]), levels(tyearb[,3]))
for (i in 1:length(tyearb))
{
  if (is.na(tyearb[i,2]))
  {
    tyearb[i,2] = tyearb[i,3]
  }
}
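I want to keep every existing value in inforst unless it is <NA>, in which case I want to insert the value from interrst instead. I realize I could copy every interrst value except the first one into inforst, but I obviously can't do that with a larger dataset, where much more information would be lost.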
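I've looked at a lot of examples that combine if statements with for loops, but I just can't get it to work for me. Can someone explain?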
Answer (score: 2)
A data.table solution (very fast, even on very large datasets):
library(data.table)
DT[is.na(z), z := y]
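where z is the column you want to test for NA and y is the column whose values you want to insert (although you can replace y with any expression).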
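As for why the loop in the question changes nothing: length() of a data frame returns its number of columns, so 1:length(tyearb) is 1:3 and only the first three rows are ever checked, none of which has an NA in inforst. Below is a minimal base-R sketch of a corrected loop. It assumes a small hypothetical toy version of tyearb (five rows copied from the listing above, with factor columns, as the levels() call in the question implies) and converts the name columns to character so that no factor-level bookkeeping is needed.

```r
# Hypothetical toy copy of five rows from tyearb, with factor columns
tyearb <- data.frame(
  infoemployer = c("Comcast", "Cummins, Inc.", "DaVita", "", "Apple"),
  inforst      = c("Jeff Dunn", "Rebekah Smith", "Andy Nielsen", NA, NA),
  interrst     = c(NA, "Rebekah Smith", "Andy Nielsen",
                   "Austin Pollard", "Brady Tengberg"),
  stringsAsFactors = TRUE
)

# length() of a data frame is its number of columns (3 here), so the
# original 1:length(tyearb) never reaches the NA rows 4 and 5

# Work on character copies so values can be copied between columns freely
tyearb$inforst  <- as.character(tyearb$inforst)
tyearb$interrst <- as.character(tyearb$interrst)

# Loop over rows, not columns; keep inforst unless it is NA
for (i in seq_len(nrow(tyearb))) {
  if (is.na(tyearb$inforst[i])) {
    tyearb$inforst[i] <- tyearb$interrst[i]
  }
}

print(tyearb$inforst)
```

The loop can also be dropped entirely: with character columns, na_rows <- is.na(tyearb$inforst) followed by tyearb$inforst[na_rows] <- tyearb$interrst[na_rows] does the same in one vectorised step. With data.table, the answer's template applied to this data would be setDT(tyearb) and then tyearb[is.na(inforst), inforst := interrst].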