Question

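I have 2 data frames in R: 'dfold' with 175 variables and 'dfnew' with 75 variables. The 2 data frames are matched by a primary key ('pid'). dfnew is a subset of dfold, so every variable in dfnew is also in dfold, but with updated, imputed values (no more NAs). At the same time, dfold has more variables, which I will need in the analysis phase. I would like to merge the 2 data frames into dfmerge so that the common variables are updated from dfnew -> dfold while the pre-existing variables in dfold are kept. I have tried merge(), match(), dplyr and the sqldf package, but I either get a dfmerge with only the 75 updated variables (left join) or a dfmerge with 250 variables (the old variables with NAs and the new ones without them side by side). The only way I have found (here) is an elegant but very long (10-line) loop that eliminates the *.x variables after merging by pid with the all.x = TRUE option. Could you suggest a more efficient way to obtain this result?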

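Thanks in advance

P.S.: For convenience, I have created minimal versions of dfold and dfnew: dfnew now has 3 variables and no NAs, while dfold has 5 variables, including NAs. Here are the data frame structures.

dfold: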

structure(list(Country = structure(c(1L, 3L, 2L, 3L, 2L), .Label = c("France", 
"Germany", "Spain"), class = "factor"), Age = c(44L, 27L, 30L, 
38L, 40L), Salary = c(72000L, 48000L, 54000L, 61000L, NA), Purchased = structure(c(1L, 
2L, 1L, 1L, 2L), .Label = c("No", "Yes"), class = "factor"), 
    pid = 1:5), .Names = c("Country", "Age", "Salary", "Purchased", 
"pid"), row.names = c(NA, 5L), class = "data.frame")

dfnew:

structure(list(Age = c(44, 27, 30), Salary = c(72000, 48000, 
54000), pid = c(1, 2, 3)), .Names = c("Age", "Salary", "pid"), row.names = c(NA, 
3L), class = "data.frame")

Although the question here is limited to 2 variables, please keep in mind that the real scenario will involve 75 variables.

Answer 1

OK, this solution assumes that you do not really need a merge, but only want to update the NA values in dfold with the imputed values in dfnew.

> dfold
  Country Age Salary Purchased pid
1  France  NA  72000        No   1
2   Spain  27  48000       Yes   2
3 Germany  30  54000        No   3
4   Spain  38  61000        No   4
5 Germany  40     NA       Yes   5

> dfnew
  Age Salary pid
1  44  72000   1
2  27  48000   2
3  30  54000   3
4  38  61000   4
5  40  70000   5

To do this for a single column, try

dfold$Salary <- ifelse(is.na(dfold$Salary), dfnew$Salary[dfnew$pid == dfold$pid], dfold$Salary)

> dfold
  Country Age Salary Purchased pid
1  France  NA  72000        No   1
2   Spain  27  48000       Yes   2
3 Germany  30  54000        No   3
4   Spain  38  61000        No   4
5 Germany  40  70000       Yes   5

Using it on the whole dataset is a bit trickier:

First define all common column names except pid:

cols <- names(dfnew)[names(dfnew) != "pid"]

> cols
[1] "Age"    "Salary"

Now use mapply to replace the NA values with ifelse:

dfold[,cols] <- mapply(function(x, y) ifelse(is.na(x), y[dfnew$pid == dfold$pid], x), dfold[,cols], dfnew[,cols])

> dfold
  Country Age Salary Purchased pid
1  France  44  72000        No   1
2   Spain  27  48000       Yes   2
3 Germany  30  54000        No   3
4   Spain  38  61000        No   4
5 Germany  40  70000       Yes   5

This assumes that dfnew only includes columns that are contained in dfold. If this is not the case, cols has to be restricted to the column names that the two data frames share.
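The comparison dfnew$pid == dfold$pid in the answer only lines up when both data frames hold the same pids in the same row order. A minimal sketch of a more defensive variant follows. It is not the answerer's code. It assumes hypothetical data in which dfnew covers only some pids, lists them in a different order, and carries one column (Extra, invented for illustration) that dfold lacks. It uses intersect() to find the shared columns and match() to align rows by pid.

```r
# Hypothetical data: dfold as in the answer, dfnew with fewer rows,
# a different row order, and an extra column that dfold lacks.
dfold <- data.frame(
  Country   = c("France", "Spain", "Germany", "Spain", "Germany"),
  Age       = c(NA, 27, 30, 38, 40),
  Salary    = c(72000, 48000, 54000, 61000, NA),
  Purchased = c("No", "Yes", "No", "No", "Yes"),
  pid       = 1:5
)
dfnew <- data.frame(
  Age    = c(40, 44, 30),
  Salary = c(70000, 72000, 54000),
  Extra  = c("a", "b", "c"),   # illustrative column, not in dfold
  pid    = c(5, 1, 3)
)

# Columns present in both data frames, excluding the key.
cols <- setdiff(intersect(names(dfnew), names(dfold)), "pid")

# For each row of dfold, the row of dfnew with the same pid (NA if absent).
idx <- match(dfold$pid, dfnew$pid)

# Fill only the NA cells of dfold for which dfnew has a matching pid.
for (col in cols) {
  missing <- is.na(dfold[[col]]) & !is.na(idx)
  dfold[[col]][missing] <- dfnew[[col]][idx[missing]]
}
```

With this data, the loop fills Age for pid 1 and Salary for pid 5 and leaves every other cell of dfold untouched. To overwrite all common values rather than only the NAs, the is.na(dfold[[col]]) condition could be dropped.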

Merge two Dataframes in R by ID, One is the subset of the other
