Question

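I have a data frame in which cases are repeated across rows. Some rows have more complete data than others. I want to group the cases and then assign the first non-missing value to all of the NA cells in that column for the group. This seems simple, but I am stuck. I have syntax that works, but when I try to use apply to run the code over all columns in the data frame, I get a list rather than a data frame. Using do.call(rbind), rbindlist, or unlist does not quite fix the problem either.

Here is the syntax.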

df$groupid<-group_indices (df,id1,id2) #creates group id on the basis of a combination of two variables

df%<>%group_by(id1,id2) #actually groups the dataframe according to these variables

df<-summarise(df, xvar1=xvar1[which(!is.na(xvar1))[1]]) #this code works great to assign the first non missing value to all missing values but it only works on 1 column at a time (X1).

I have a lot of columns, so I tried using apply to make this a manageable task.

df<-apply(df, MARGIN=2, FUN=function(x) {summarise(df, x=x[which(!is.na(x))[1]])
  }
)

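This gives me a list for each variable, and I want a data frame (which I will then de-duplicate). I tried rbindlist and do.call(rbind), and both resulted in a long data frame with only 3 columns: the two group_by variables and 'x'.

I know the problem is just how I am using apply, probably the indexing with 'which', but I am stumped.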

Answer 1

How about using lapply with do.call and cbind, like this:

df <- do.call(cbind, lapply(df, function(x) {summarise(df, x=x[which(!is.na(x))[1]])}))
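
As an alternative to looping over columns by hand, here is a minimal sketch that lets dplyr apply the same "first non-missing value" expression to every non-grouping column within each group. It assumes dplyr 1.0.0 or later (for across()); the example data and the helper name first_non_na are hypothetical and are not part of the original question. One reason to prefer this form is that a function passed to apply or lapply receives each whole column, so the grouping is not applied to x inside it.

```r
library(dplyr)

# Hypothetical example data: two id columns and two value columns with gaps
df <- data.frame(
  id1   = c(1, 1, 2, 2),
  id2   = c("a", "a", "b", "b"),
  xvar1 = c(NA, 5, 7, NA),
  xvar2 = c("u", NA, NA, NA),
  stringsAsFactors = FALSE
)

# Hypothetical helper: first non-missing value of a vector (NA if all are missing)
first_non_na <- function(x) x[which(!is.na(x))[1]]

# Variant 1: one row per group, each column collapsed to its first non-missing value
collapsed <- df %>%
  group_by(id1, id2) %>%
  summarise(across(everything(), first_non_na), .groups = "drop")

# Variant 2: keep every row, filling NA cells with the group's first non-missing value
filled <- df %>%
  group_by(id1, id2) %>%
  mutate(across(everything(), ~ coalesce(.x, first_non_na(.x)))) %>%
  ungroup()
```

Variant 1 returns a regular data frame (tibble) that is already de-duplicated by group, so no do.call or rbindlist step is needed. Variant 2 matches the wording of the question more literally; calling distinct() on its result would remove rows that became identical after filling.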
