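So I need to merge three datasets. They contain school data and reading/math scores for grades 4 and 5. One of them is a long-format dataset with a lot of missing values in some variables (yes, I really do need the data in long format), while the other two hold, in wide format, the values that are missing from it. All of these data frames contain a column with a unique ID number for each individual in the database.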
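The three data frames I need to use are `school_lf`, `school4` and `school5`. `school_lf` contains the long-format data with NAs; `school4` and `school5` are the data frames I need to use to fill in those NAs (matched by `id` and `grade`).

I need to merge the wide-format data into the long-format data so that the NAs are replaced with the actual values. I've tried merging them, but instead of filling in the read and math scores where they were NA, the merge introduces several new columns. I just need one column with the reading scores and one with the math scores, not six separate columns (`read.x`, `read.y`, `math.x`, `math.y`, `mathscore` and `readscore`).

Here is a fully reproducible example that generates a small version of the kind of data.frames I'm working with: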
library(reshape2)  # provides melt() with the variable.name / value.name arguments used below
set.seed(890)
school <- NULL
school$id <- sample(102938:999999, 100)
school$selected <- sample(0:1, 100, replace = TRUE)
school$math4 <- sample(400:500, 100)
school$math5 <- sample(400:500, 100)
school$read4 <- sample(400:500, 100)
school$read5 <- sample(400:500, 100)
school <- as.data.frame(school)
# Delete observations at random from the school df
indm4 <- which(school$math4 %in% sample(school$math4, 25))
school$math4[indm4] <- NA
indm5 <- which(school$math5 %in% sample(school$math5, 50))
school$math5[indm5] <- NA
indr4 <- which(school$read4 %in% sample(school$read4, 70))
school$read4[indr4] <- NA
indr5 <- which(school$read5 %in% sample(school$read5, 81))
school$read5[indr5] <- NA
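The `%in%` trick above blanks exactly the requested number of values only because `sample(400:500, 100)` draws without replacement, so every score in a column is unique. A sketch that samples row positions directly does not depend on that; `blank_at_random` is a hypothetical helper name used only for illustration:

```r
# Hypothetical helper: set `n` randomly chosen positions of `x` to NA.
# Sampling positions (not values) works even when `x` contains duplicates.
blank_at_random <- function(x, n) {
  x[sample(length(x), n)] <- NA
  x
}

set.seed(890)
scores <- sample(c(400, 410, 420), 100, replace = TRUE)  # many duplicated values
scores_missing <- blank_at_random(scores, 25)
sum(is.na(scores_missing))  # exactly 25
```

For the question's data this would be used as, e.g., `school$math4 <- blank_at_random(school$math4, 25)`; the random draws differ from the original code, so the exact NA positions will not match.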
# Separate Read and Math
read <- as.data.frame(subset(school, select = -c(math4, math5)))
math <- as.data.frame(subset(school, select = -c(read4, read5)))
# Now turn this into long form data...
clr <- melt(read, id.vars = c("id", "selected"), variable.name = "variable", value.name = "readscore")
clm <- melt(math, id.vars = c("id", "selected"), value.name = "mathscore")
# Clean up the grades for each of these...
clr$grade <- ifelse(clr$variable == "read4", 4,
ifelse(clr$variable == "read5", 5, NA))
clm$grade <- ifelse(clm$variable == "math4", 4,
ifelse(clm$variable == "math5", 5, NA))
# Put all these in one df
school_lf <- cbind(clm, readscore = clr$readscore)  # rows of clm and clr line up (same melt order)
school_lf$variable <- NULL  # drop the melt key column; grade now carries that information
###############
# Generate the 2 data frames with IDs that have the full data
set.seed(890)
school4 <- NULL
school4$id <-sample(102938:999999, 100)
school4$selected <- sample(0:1, 100, replace = TRUE)
school4$math4 <- sample(400:500, 100)
school4$read4 <- sample(400:500, 100)
school4$grade <- 4
school4 <- as.data.frame(school4)
set.seed(890)
school5 <- NULL
school5$id <-sample(102938:999999, 100)
school5$selected <- sample(0:1, 100, replace = TRUE)
school5$math5 <- sample(400:500, 100)
school5$read5 <- sample(400:500, 100)
school5$grade <- 5
school5 <- as.data.frame(school5)
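For context on where the `.x`/`.y` columns come from: when both inputs to base R's `merge()` share a non-key column name, it keeps both copies and appends the `suffixes` (default `c(".x", ".y")`). A toy sketch with made-up values (not the question's data) showing that, and one way to collapse the pair back into a single column:

```r
# Toy long-format data with gaps, plus a "complete" source (values made up)
long <- data.frame(id = c(1, 1, 2), grade = c(4, 5, 4), math = c(410, NA, NA))
full <- data.frame(id = c(1, 2), grade = c(5, 4), math = c(455, 470))

# Both inputs have a non-key column called "math", so merge() keeps both
m <- merge(long, full, by = c("id", "grade"), all.x = TRUE)
names(m)  # "id" "grade" "math.x" "math.y"

# Collapse the pair into one column: keep math.x unless it is NA
m$math <- ifelse(is.na(m$math.x), m$math.y, m$math.x)
m$math.x <- NULL
m$math.y <- NULL
m
```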
Any help is greatly appreciated! I've been trying to solve this for hours without making any progress (so I figured I'd ask here).
Answer 0 (score: 0)
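You can use the `coalesce` function from `dplyr`. If a value in the first vector is NA, it checks whether the value at the same position in the second vector is non-NA and, if so, takes it. If that one is NA too, it moves on to the third vector, and so on.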
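library(dplyr)
# `sch` is assumed to be school_lf merged with school4 and school5,
# so it has mathscore/readscore plus math4, math5, read4 and read5
sch %>% mutate(mathscore = coalesce(mathscore, math4, math5)) %>%
  mutate(readscore = coalesce(readscore, read4, read5)) %>%
  select(id:readscore)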
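The position-by-position rule described in this answer can be sketched in plain base R. `coalesce_base` is a hypothetical stand-in written for illustration, not dplyr's implementation, and it assumes all inputs have the same length:

```r
# Hypothetical base-R stand-in for dplyr::coalesce(), for illustration only:
# at each position, keep the first value that is not NA.
coalesce_base <- function(...) {
  vecs <- list(...)
  out <- vecs[[1]]
  for (v in vecs[-1]) {
    miss <- is.na(out)   # positions still missing after earlier vectors
    out[miss] <- v[miss] # fill them from the current vector
  }
  out
}

mathscore <- c(410, NA, NA, NA)
math4     <- c(NA, 455, NA, NA)
math5     <- c(NA, 470, 488, NA)
coalesce_base(mathscore, math4, math5)  # returns c(410, 455, 488, NA)
```

This is also why argument order matters in the answer's pipeline: `mathscore` is kept wherever it is already present, and `math4`/`math5` are consulted only for the gaps. If a grade-5 row also carried a `math4` value, `coalesce` would pick `math4` first, so the merged data should only have the matching grade's column filled on each row.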
Answer 1 (score: 0)
...the data frames have different numbers of rows... back to square one.
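I was able to solve this with the following code (although it isn't the most elegant or straightforward, and @Edwin's response using `coalesce` helped point me in the right direction). Any suggestions on how to make this code more elegant and efficient are very welcome!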
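Since the author asks for something more compact: one base-R sketch (an alternative technique, not the code the author posted) is to stack the grade-4 and grade-5 sources into a single long lookup table with the same column names as `school_lf`, then fill only the missing scores by matching on `id` and `grade`. The data below are made-up stand-ins shaped like the question's objects; `fill`, `key_lf`, `key_fill` and `pos` are illustrative names:

```r
# Toy stand-ins for school_lf, school4 and school5 (values made up)
school_lf <- data.frame(
  id = c(101, 101, 102, 102), selected = c(1, 1, 0, 0),
  mathscore = c(410, NA, NA, 430), grade = c(4, 5, 4, 5),
  readscore = c(NA, 440, 450, NA)
)
school4 <- data.frame(id = c(101, 102), selected = c(1, 0),
                      math4 = c(411, 421), read4 = c(481, 450), grade = 4)
school5 <- data.frame(id = c(101, 102), selected = c(1, 0),
                      math5 = c(460, 430), read5 = c(440, 499), grade = 5)

# Rename the wide columns to the long names and stack them into one lookup table
fill4 <- data.frame(id = school4$id, grade = school4$grade,
                    mathscore = school4$math4, readscore = school4$read4)
fill5 <- data.frame(id = school5$id, grade = school5$grade,
                    mathscore = school5$math5, readscore = school5$read5)
fill <- rbind(fill4, fill5)

# For each row of school_lf, the matching row of `fill` on (id, grade); NA if none
key_lf   <- paste(school_lf$id, school_lf$grade)
key_fill <- paste(fill$id, fill$grade)
pos <- match(key_lf, key_fill)

# Replace only the missing scores; values already present are left untouched
for (col in c("mathscore", "readscore")) {
  miss <- is.na(school_lf[[col]])
  school_lf[[col]][miss] <- fill[[col]][pos[miss]]
}
school_lf
```

Because `match()` returns the first hit, this assumes each `id`/`grade` pair appears at most once across `school4`/`school5`. It also keeps `school_lf`'s row count and order unchanged, which sidesteps row-count mismatches between the data frames.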