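I have 2 data frames in R: 'dfold' with 175 variables and 'dfnew' with 75 variables. The 2 data frames are matched by a primary key ('pid'). dfnew is a subset of dfold, so all variables in dfnew are also in dfold, but with updated, imputed values (no more NAs). At the same time, dfold has more variables, which I will need in the analysis phase. I would like to merge the 2 data frames into dfmerge so that the common variables are updated from dfnew -> dfold while the pre-existing variables in dfold are preserved. I have tried merge(), match(), dplyr and the sqldf package, but I either get a dfmerge with only the 75 updated variables (left join) or a dfmerge with 250 variables (the old variables with NAs and the new ones without them coexist). The only way I found (here) is an elegant but very long (10-line) loop that eliminates the *.x variables after merging by pid with the all.x = TRUE option. Could you suggest a more efficient way to obtain this result?
Thanks in advance.
P.S.: For convenience, I created minimal versions of dfold and dfnew: dfnew now has 3 variables and no NAs, while dfold has 5 variables, including NAs. This is the data frame structure of dfold: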
structure(list(Country = structure(c(1L, 3L, 2L, 3L, 2L), .Label = c("France",
"Germany", "Spain"), class = "factor"), Age = c(44L, 27L, 30L,
38L, 40L), Salary = c(72000L, 48000L, 54000L, 61000L, NA), Purchased = structure(c(1L,
2L, 1L, 1L, 2L), .Label = c("No", "Yes"), class = "factor"),
pid = 1:5), .Names = c("Country", "Age", "Salary", "Purchased",
"pid"), row.names = c(NA, 5L), class = "data.frame")
dfnew:
structure(list(Age = c(44, 27, 30), Salary = c(72000, 48000,
54000), pid = c(1, 2, 3)), .Names = c("Age", "Salary", "pid"), row.names = c(NA,
3L), class = "data.frame")
Although the problem here is limited to 2 variables, please keep in mind that the real scenario involves 75 variables.
Answer 0 (score: 1)
OK, this solution assumes that you don't really need a merge, but only want to update the NA values in dfold with the imputed values in dfnew.
> dfold
  Country Age Salary Purchased pid
1  France  NA  72000        No   1
2   Spain  27  48000       Yes   2
3 Germany  30  54000        No   3
4   Spain  38  61000        No   4
5 Germany  40     NA       Yes   5
> dfnew
  Age Salary pid
1  44  72000   1
2  27  48000   2
3  30  54000   3
4  38  61000   4
5  40  70000   5
To do this for a single column, try
dfold$Salary <- ifelse(is.na(dfold$Salary), dfnew$Salary[dfnew$pid == dfold$pid], dfold$Salary)
> dfold
  Country Age Salary Purchased pid
1  France  NA  72000        No   1
2   Spain  27  48000       Yes   2
3 Germany  30  54000        No   3
4   Spain  38  61000        No   4
5 Germany  40  70000       Yes   5
Using it on the whole dataset is a bit trickier.
First define all common column names except pid:
cols <- names(dfnew)[names(dfnew) != "pid"]
> cols
[1] "Age" "Salary"
Now use mapply to replace the NA values with ifelse:
dfold[,cols] <- mapply(function(x, y) ifelse(is.na(x), y[dfnew$pid == dfold$pid], x), dfold[,cols], dfnew[,cols])
> dfold
  Country Age Salary Purchased pid
1  France  44  72000        No   1
2   Spain  27  48000       Yes   2
3 Germany  30  54000        No   3
4   Spain  38  61000        No   4
5 Germany  40  70000       Yes   5
This assumes that dfnew only contains columns that are also present in dfold. If that is not the case, restrict cols to the column names that appear in both data frames.
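The element-wise comparison dfnew$pid == dfold$pid in the answer relies on both data frames having the same rows in the same order. As a minimal sketch for the case where dfnew covers only some pids or lists them in a different order, the rows can be aligned with match() and the shared columns found with intersect(). The data below is hypothetical (a shuffled 3-row dfnew, chosen for illustration), and the loop is one possible variant rather than the answerer's own code.

```r
# Hypothetical example data: dfnew holds only 3 of the 5 pids, in shuffled order
dfold <- data.frame(
  Country   = c("France", "Spain", "Germany", "Spain", "Germany"),
  Age       = c(NA, 27, 30, 38, 40),
  Salary    = c(72000, 48000, 54000, 61000, NA),
  Purchased = c("No", "Yes", "No", "No", "Yes"),
  pid       = 1:5
)
dfnew <- data.frame(
  Age    = c(40, 44, 30),
  Salary = c(70000, 72000, 54000),
  pid    = c(5, 1, 3)
)

# Columns present in both data frames, excluding the key
cols <- setdiff(intersect(names(dfnew), names(dfold)), "pid")

# For each row of dfold, the index of the matching row in dfnew (NA if absent)
idx <- match(dfold$pid, dfnew$pid)

for (col in cols) {
  # Rows that are NA in dfold and have a counterpart in dfnew
  fill <- is.na(dfold[[col]]) & !is.na(idx)
  dfold[[col]][fill] <- dfnew[[col]][idx[fill]]
}
```

To overwrite every matched value instead of only the NAs, the is.na(dfold[[col]]) condition could be dropped so that fill is just !is.na(idx). Either way dfold keeps all of its original columns, so no *.x / *.y duplicates appear.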