I have 2 data frames:
CountryPoints
From.country To.Country points
Belgium Finland 4
Belgium Germany 5
Malta Italy 12
Malta UK 1
And another data frame with neighbouring/bordering countries:
From.country To.Country
Belgium Finland
Belgium Germany
Malta Italy
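I would like to add another column to CountryPoints called Neighbour (Y/N), depending on whether the From.country/To.Country pair is found in the neighbouring/bordering countries data frame. Is this possible? It is a kind of join, but the result should be a boolean column.
The result should be: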
From.country To.Country points Neighbour
Belgium Finland 4 Y
Belgium Germany 5 Y
Malta Italy 12 Y
Malta UK 1 N
The question below shows how to merge, but it does not show how to add the extra boolean column.
Answer 0 (score: 3)
Two alternative approaches:
1) Base R:
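# Build one key per row so that both columns are matched as a pair;
# matching each column separately would also flag pairs such as Belgium/Italy
idx <- paste(df1$From.country, df1$To.Country, sep = "|") %in%
  paste(df2$From.country, df2$To.Country, sep = "|")
# idx is logical, so 1 + idx selects 'N' for FALSE and 'Y' for TRUE
df1$Neighbour <- c('N', 'Y')[1 + idx]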
2) data.table:
library(data.table)
setDT(df1)
setDT(df2)
df1[, Neighbour := 'N'][df2, on = .(From.country, To.Country), Neighbour := 'Y'][]
Both give (data.table output shown):
   From.country To.Country points Neighbour
1:      Belgium    Finland      4         Y
2:      Belgium    Germany      5         Y
3:        Malta      Italy     12         Y
4:        Malta         UK      1         N
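One thing worth checking with either approach is that the two columns are matched as a pair rather than independently. The following is only a sketch: it reuses the sample data from the question and adds one hypothetical row (Belgium to Italy, with a made-up points value) whose individual values both occur in df2 although the pair itself does not.

```r
# Sample data from the question plus one hypothetical extra row (Belgium -> Italy)
df1 <- data.frame(
  From.country = c("Belgium", "Belgium", "Malta", "Malta", "Belgium"),
  To.Country   = c("Finland", "Germany", "Italy", "UK", "Italy"),
  points       = c(4, 5, 12, 1, 7),  # 7 is a made-up value for the extra row
  stringsAsFactors = FALSE
)
df2 <- data.frame(
  From.country = c("Belgium", "Belgium", "Malta"),
  To.Country   = c("Finland", "Germany", "Italy"),
  stringsAsFactors = FALSE
)

# Column-wise check: each column is looked up on its own
columnwise <- df1$From.country %in% df2$From.country &
  df1$To.Country %in% df2$To.Country

# Pairwise check: both columns are combined into one key per row
pairwise <- paste(df1$From.country, df1$To.Country, sep = "|") %in%
  paste(df2$From.country, df2$To.Country, sep = "|")

# Side-by-side comparison; only the last row differs
data.frame(df1[, 1:2], columnwise, pairwise)
```

The data.table join with on = .(From.country, To.Country) already matches on both columns together, so this concern mainly applies to hand-written base R checks.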
Answer 1 (score: 2)
Borrowing the idea from this post:
df1$Neighbour <- duplicated(rbind(df2[, 1:2], df1[, 1:2]))[-seq_len(nrow(df2))]
df1
# From.country To.Country points Neighbour
# 1 Belgium Finland 4 TRUE
# 2 Belgium Germany 5 TRUE
# 3 Malta Italy 12 TRUE
# 4 Malta UK 1 FALSE
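A possible caveat with the duplicated() idea, sketched under the assumption that df1 may contain the same pair more than once: duplicated() compares each row with every earlier row, so a repeated non-neighbour pair would be flagged on its second occurrence. The data frame df1_rep and its points values below are hypothetical and only serve to illustrate this.

```r
# Neighbour pairs from the question
df2 <- data.frame(
  From.country = c("Belgium", "Belgium", "Malta"),
  To.Country   = c("Finland", "Germany", "Italy"),
  stringsAsFactors = FALSE
)
# Hypothetical variant of df1 in which the non-neighbour pair Malta -> UK appears twice
df1_rep <- data.frame(
  From.country = c("Belgium", "Malta", "Malta"),
  To.Country   = c("Finland", "UK", "UK"),
  points       = c(4, 1, 3),
  stringsAsFactors = FALSE
)

# duplicated() looks at all earlier rows, including earlier rows of df1_rep itself
via_duplicated <- duplicated(rbind(df2[, 1:2], df1_rep[, 1:2]))[-seq_len(nrow(df2))]

# Building one key per row restricts the comparison to the pairs in df2
via_key <- paste(df1_rep$From.country, df1_rep$To.Country, sep = "|") %in%
  paste(df2$From.country, df2$To.Country, sep = "|")

# The second Malta/UK row is TRUE under duplicated() but FALSE under the key check
data.frame(df1_rep[, 1:2], via_duplicated, via_key)
```

If every pair in df1 is unique, as in the sample data, both variants agree.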
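Answer 2 (score: 0)
How about something like this?
# Sort the two country names within each row so that the match ignores direction
sortpaste <- function(x) paste0(sort(x), collapse = "_")
df1$Neighbour <- apply(df1[, 1:2], 1, sortpaste) %in% apply(df2[, 1:2], 1, sortpaste)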
# From.country To.Country points Neighbour
#1 Belgium Finland 4 TRUE
#2 Belgium Germany 5 TRUE
#3 Malta Italy 12 TRUE
#4 Malta UK 1 FALSE
Sample data:
df1 <- read.table(text =
"From.country To.Country points
Belgium Finland 4
Belgium Germany 5
Malta Italy 12
Malta UK 1", header = TRUE)
df2 <- read.table(text =
"From.country To.Country
Belgium Finland
Belgium Germany
Malta Italy", header = TRUE)
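Because sortpaste() sorts the two names before pasting them, this approach treats a border as symmetric: Finland/Belgium matches Belgium/Finland. A small sketch of that behaviour, using hypothetical data frames pairs_a and borders that are not part of the question:

```r
sortpaste <- function(x) paste0(sort(x), collapse = "_")

# Hypothetical data: the first pair is listed in the opposite direction to borders
pairs_a <- data.frame(
  From.country = c("Finland", "Malta"),
  To.Country   = c("Belgium", "UK"),
  stringsAsFactors = FALSE
)
borders <- data.frame(
  From.country = "Belgium",
  To.Country   = "Finland",
  stringsAsFactors = FALSE
)

# Direction-insensitive: names are sorted within each row before comparing
undirected <- apply(pairs_a[, 1:2], 1, sortpaste) %in%
  apply(borders[, 1:2], 1, sortpaste)

# Direction-sensitive: columns are pasted in their original order
directed <- paste(pairs_a$From.country, pairs_a$To.Country, sep = "_") %in%
  paste(borders$From.country, borders$To.Country, sep = "_")

data.frame(pairs_a, undirected, directed)
```

Whether the symmetric behaviour is wanted depends on whether the neighbour table lists each border once or in both directions.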