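Stata automatically creates a variable named "_merge" that indicates, after a merge, which observations were matched across the two datasets. Is there a way to get R's merge() function to generate such a variable?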
Answer 0 (score: 4)
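The possible values of _merge in Stata are (note that merge can also produce the values 4 and 5):
1   master   observation appeared in master only
2   using    observation appeared in using only
3   match    observation appeared in both
In R, you can get the same rows by passing one of the arguments all=TRUE, all.x=TRUE, or all.y=TRUE to merge().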
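To make the three arguments concrete, here is a minimal sketch. The data frames master and using and the column merge_code are made up for illustration and are not part of the original answer. It derives a Stata-style code from the key column alone, which assumes the key uniquely identifies where a row came from.

```r
# Hypothetical example data (names and values are illustrative only)
master <- data.frame(id = c(1, 2, 3), x = c("a", "b", "c"))
using  <- data.frame(id = c(2, 3, 4), y = c(TRUE, FALSE, TRUE))

full  <- merge(master, using, by = "id", all = TRUE)    # keep all rows from both
left  <- merge(master, using, by = "id", all.x = TRUE)  # keep all rows of master
right <- merge(master, using, by = "id", all.y = TRUE)  # keep all rows of using

# Stata-style code: 1 = master only, 2 = using only, 3 = both
full$merge_code <- (full$id %in% master$id) + 2 * (full$id %in% using$id)
full
```

With all=TRUE the result has one row per id found in either data frame, so the code can be computed by checking which input each id came from.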
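Answer 1 (score: 0)
I wrote the following function based on @Metrics's answer. It creates a variable "merge" in the resulting dataset that indicates where each observation came from, as Stata does.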
stata.merge <- function(x, y, by = intersect(names(x), names(y))) {
  # Temporarily replace existing NAs with Inf so that only NAs
  # introduced by the merge mark a row as unmatched
  x[is.na(x)] <- Inf
  y[is.na(y)] <- Inf
  # Rows present in both data sets
  matched <- merge(x, y, by.x = by, by.y = by, all = TRUE)
  matched <- matched[complete.cases(matched), ]
  matched$merge <- "matched"
  # Rows present only in x (master)
  master <- merge(x, y, by.x = by, by.y = by, all.x = TRUE)
  master <- master[!complete.cases(master), ]
  master$merge <- "master"
  # Rows present only in y (using)
  using <- merge(x, y, by.x = by, by.y = by, all.y = TRUE)
  using <- using[!complete.cases(using), ]
  using$merge <- "using"
  df <- rbind(matched, master, using)
  # Restore the original NAs
  df[sapply(df, is.infinite)] <- NA
  df
}
Test:
df1 <- data.frame(id = letters[c(1:5,8:9)], v1=c(1:5,8:9))
df1
id v1
1 a 1
2 b 2
3 c 3
4 d 4
5 e 5
6 h 8
7 i 9
df2 <- data.frame(id = letters[1:8], v1=c(1:7,NA))
df2
id v1
1 a 1
2 b 2
3 c 3
4 d 4
5 e 5
6 f 6
7 g 7
8 h NA
stata.merge(df1,df2, by = "id")
id v1.x v1.y merge
1 a 1 1 matched
2 b 2 2 matched
3 c 3 3 matched
4 d 4 4 matched
5 e 5 5 matched
6 h 8 NA matched
7 i 9 NA master
71 f NA 6 using
8 g NA 7 using
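Answer 2 (score: 0)
Here is (I think) a simpler and more efficient version of the previous poster's stata.merge function. It assumes you have no variables named "new1" or "new2" in your data frames. If that assumption is wrong, change the variable names in the function. The function takes three arguments: the first data frame, the second data frame, and the value to pass to the "by =" argument of merge().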
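stata.merge <- function(x, y, name) {
  # Marker columns: new1 flags rows from x, new2 flags rows from y
  x$new1 <- 1
  y$new2 <- 2
  df <- merge(x, y, by = name, all = TRUE)
  # 1 = only in x, 2 = only in y, 3 = in both (1 + 2)
  df$stat.merge.variable <- rowSums(df[, c("new1", "new2")], na.rm = TRUE)
  # Drop the helper columns
  df$new1 <- NULL
  df$new2 <- NULL
  df
}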
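As a quick check of this approach, the sketch below repeats the function so that it runs on its own, applies it to the df1 and df2 from the test in the previous answer, and maps the numeric codes to labels. The merge.label column and its label names are an assumption added for illustration, following Stata's 1/2/3 coding.

```r
stata.merge <- function(x, y, name) {
  x$new1 <- 1
  y$new2 <- 2
  df <- merge(x, y, by = name, all = TRUE)
  df$stat.merge.variable <- rowSums(df[, c("new1", "new2")], na.rm = TRUE)
  df$new1 <- NULL
  df$new2 <- NULL
  df
}

# Same test data as in the previous answer
df1 <- data.frame(id = letters[c(1:5, 8:9)], v1 = c(1:5, 8:9))
df2 <- data.frame(id = letters[1:8], v1 = c(1:7, NA))

res <- stata.merge(df1, df2, "id")

# Assumed labels: 1 = master, 2 = using, 3 = matched
res$merge.label <- c("master", "using", "matched")[res$stat.merge.variable]
table(res$merge.label)
```

Row h counts as matched even though its v1.y is NA, because the indicator comes from the marker columns rather than from missing values, so no Inf placeholder is needed.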