How to compare two columns in two different DFs and add a value in another column?

Date: 2019-09-19 13:13:33

Tags: r dataframe join

I have two different data frames (caracteristica_receita and coop_receita_anos2d). I need to compare two of their columns (CNPJ and ANO). If both match, I need to put a 1 in a new column (caracteristica_receita$benford).

I have been using

library(magrittr)  # provides %>%
caracteristica_receita$benford[which(caracteristica_receita$CNPJ %>%
                                       is.element(coop_receita_anos2d$CNPJ))] <- 1 

but I don't know how to do this with two columns.

caracteristica_receita <- structure(list(CNPJ = c(1234, 5678, 91012, 12346, 96385, 87952, 
7789, 2535, 4459, 5457), NOME_INSTITUICAO = c("XXXX", 
"AAAA", "BBBB", "CCCC", "DDDDD", 
"RRRR", "FFFFF", 
"GGGGG", "HHHHHH", 
"IIIIIII"), ano_fundacao = c(1993, 
1993, 1994, 1994, 1994, 1994, 1994, 1994, 1994, 1994), ANO = c(2014, 
2015, 2014, 2015, 2016, 2014, 2014, 2015, 2016, 2017), benford = c(0, 0, 
    0, 0, 0, 0, 0, 0, 0, 0)), .Names = c("CNPJ", "NOME_INSTITUICAO", 
"ano_fundacao", "ANO", "benford"), row.names = c(NA, 10L), class = "data.frame")

coop_receita_anos2d <- structure(list(CNPJ = c(1234, 5678, 916862, 12346, 96385, 87952, 
7789, 2535, 4459, 46868), ANO = c(2014, 2014, 0, 0, 0, 2014, 
0, 0, 0, 0)), .Names = c("CNPJ", 
"ANO"), row.names = c(1L, 3L, 
7L, 11L, 15L, 19L, 23L, 27L, 31L, 35L), class = "data.frame")

So the result I want is:

structure(list(CNPJ = c(1234, 5678, 91012, 12346, 96385, 87952, 
7789, 2535, 4459, 5457), NOME_INSTITUICAO = c("XXXX", 
"AAAA", "BBBB", "CCCC", "DDDDD", 
"RRRR", "FFFFF", 
"GGGGG", "HHHHHH", 
"IIIIIII"), ano_fundacao = c(1993, 
1993, 1994, 1994, 1994, 1994, 1994, 1994, 1994, 1994), ANO = c(2014, 
2015, 2014, 2015, 2016, 2014, 2014, 2015, 2016, 2017), benford = c(1, 0, 
    0, 0, 0, 1, 0, 0, 0, 0)), .Names = c("CNPJ", "NOME_INSTITUICAO", 
"ano_fundacao", "ANO", "benford"), row.names = c(NA, 10L), class = "data.frame")

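To see why the single-column is.element() check from the question is not enough, here is a minimal sketch on a cut-down copy of the example data (only CNPJ and ANO are kept; the helper name key is illustrative, not from the question). It compares matching on CNPJ alone with matching on the (CNPJ, ANO) pair.

```r
# Cut-down copies of the example data frames (only the key columns)
caracteristica_receita <- data.frame(
  CNPJ = c(1234, 5678, 91012, 12346, 96385, 87952, 7789, 2535, 4459, 5457),
  ANO  = c(2014, 2015, 2014, 2015, 2016, 2014, 2014, 2015, 2016, 2017)
)
coop_receita_anos2d <- data.frame(
  CNPJ = c(1234, 5678, 916862, 12346, 96385, 87952, 7789, 2535, 4459, 46868),
  ANO  = c(2014, 2014, 0, 0, 0, 2014, 0, 0, 0, 0)
)

# CNPJ only: flags every CNPJ that appears anywhere in coop_receita_anos2d
by_cnpj <- as.integer(is.element(caracteristica_receita$CNPJ, coop_receita_anos2d$CNPJ))
print(by_cnpj)
#[1] 1 1 0 1 1 1 1 1 1 0

# (CNPJ, ANO) pair: build one character key per row and compare the keys
key <- function(d) paste(d$CNPJ, d$ANO, sep = "_")
by_pair <- as.integer(is.element(key(caracteristica_receita), key(coop_receita_anos2d)))
print(by_pair)
#[1] 1 0 0 0 0 1 0 0 0 0
```

Only rows 1 and 6 have both the same CNPJ and the same ANO in coop_receita_anos2d, which is the benford pattern in the desired output.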
3 Answers

Answer 0 (score: 0)

You can paste the two columns together and use match(), then convert the result to logical and then to integer, as follows:

as.integer(!is.na(match(do.call(paste, caracteristica_receita[c('CNPJ', 'ANO')]), 
                        do.call(paste, coop_receita_anos2d[c('CNPJ', 'ANO')]))))

#[1] 1 0 0 0 0 1 0 0 0 0

Or assign it back to your data frame:

caracteristica_receita$benford <- as.integer(!is.na(....))
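Written out in full (without the .... shorthand), the assignment could look like the sketch below. It assumes the same cut-down example data as above; selecting c("CNPJ", "ANO") on both sides keeps any extra columns in either data frame out of the key.

```r
caracteristica_receita <- data.frame(
  CNPJ = c(1234, 5678, 91012, 12346, 96385, 87952, 7789, 2535, 4459, 5457),
  ANO  = c(2014, 2015, 2014, 2015, 2016, 2014, 2014, 2015, 2016, 2017)
)
coop_receita_anos2d <- data.frame(
  CNPJ = c(1234, 5678, 916862, 12346, 96385, 87952, 7789, 2535, 4459, 46868),
  ANO  = c(2014, 2014, 0, 0, 0, 2014, 0, 0, 0, 0)
)

# Paste CNPJ and ANO into one string per row (paste's default sep = " "),
# look each caracteristica_receita key up in coop_receita_anos2d with match(),
# and turn "found" (non-NA position) into 1 and "not found" (NA) into 0
caracteristica_receita$benford <- as.integer(!is.na(match(
  do.call(paste, caracteristica_receita[c("CNPJ", "ANO")]),
  do.call(paste, coop_receita_anos2d[c("CNPJ", "ANO")])
)))
print(caracteristica_receita$benford)
#[1] 1 0 0 0 0 1 0 0 0 0
```

Since x %in% table is defined in base R as match(x, table, nomatch = 0) > 0, as.integer(keys_a %in% keys_b) gives the same result.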

Answer 1 (score: 0)

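A simple base R solution (assuming df and df2 have the same number of rows and that matching records appear in the same row positions in both):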

df <- caracteristica_receita
df2 <- coop_receita_anos2d
ind <- df$ANO == df2$ANO & df$CNPJ == df2$CNPJ
df$benford <- ifelse(ind, 1, 0)
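Because == compares df and df2 position by position, this answer gives the right flags only when matching records sit in the same row of both data frames, as they happen to in the example. A minimal sketch on cut-down example data (df2_reordered and key are illustrative names) shows what happens when the rows of df2 are simply reversed, with an order-independent lookup for contrast.

```r
df <- data.frame(
  CNPJ = c(1234, 5678, 91012, 12346, 96385, 87952, 7789, 2535, 4459, 5457),
  ANO  = c(2014, 2015, 2014, 2015, 2016, 2014, 2014, 2015, 2016, 2017)
)
df2 <- data.frame(
  CNPJ = c(1234, 5678, 916862, 12346, 96385, 87952, 7789, 2535, 4459, 46868),
  ANO  = c(2014, 2014, 0, 0, 0, 2014, 0, 0, 0, 0)
)

# Same rows of df2, just in reverse order
df2_reordered <- df2[rev(seq_len(nrow(df2))), ]

# Row-by-row: row i of df is only ever compared with row i of df2_reordered
rowwise <- as.integer(df$ANO == df2_reordered$ANO & df$CNPJ == df2_reordered$CNPJ)
print(rowwise)
#[1] 0 0 0 0 0 0 0 0 0 0

# Lookup: each (CNPJ, ANO) pair of df is searched for among all rows of df2_reordered
key <- function(d) paste(d$CNPJ, d$ANO, sep = "_")
lookup <- as.integer(key(df) %in% key(df2_reordered))
print(lookup)
#[1] 1 0 0 0 0 1 0 0 0 0
```

If the two data frames have different numbers of rows, == recycles the shorter vector (with a warning when the longer length is not a multiple of the shorter), so the row-wise result is not meaningful in that case either.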

Answer 2 (score: 0)

Thanks, everyone! It worked!

Also, a friend of mine sent me this answer:

library(stringr); library(magrittr)  # str_c() and %>%
caracteristica_receita$benford[which(str_c(caracteristica_receita$CNPJ, caracteristica_receita$ANO, sep = "_") %>% 
                                    is.element(str_c(coop_receita_anos2d$CNPJ, coop_receita_anos2d$ANO, sep = "_")))] <- 1 
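When two columns are concatenated into a single key, as str_c() does here, the separator matters: with no separator, two different (CNPJ, ANO) pairs can in principle produce the same string, for example when ANO is 0 as in several rows of coop_receita_anos2d. A minimal base-R sketch, using paste0() (which, like str_c() with its default sep = "", concatenates without a separator); the CNPJ values and helper names below are made up for illustration.

```r
# Hypothetical helpers: key without and with a separator
key_nosep <- function(cnpj, ano) paste0(cnpj, ano)
key_sep   <- function(cnpj, ano) paste(cnpj, ano, sep = "_")

# (1234, 2010) and (1234201, 0) are different pairs...
print(key_nosep(1234, 2010))  # "12342010"
print(key_nosep(1234201, 0))  # "12342010": same key, so a false match
print(key_sep(1234, 2010))    # "1234_2010"
print(key_sep(1234201, 0))    # "1234201_0": keys stay distinct
```

Any separator character that cannot occur in CNPJ or ANO keeps the keys unambiguous.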

Thank you very much!