Adding columns based on a match with a second table

Date: 2016-08-10 17:11:31

Tags: r match

I have two tables, selectVDem and DT.

DT has columns for country, month-year, size, and year:

       Country MonthofDate Size Year
   1:    Benin  1997-01-01   18 1997
   2:    Benin  1997-02-01   18 1997
   3:    Benin  1997-03-01   18 1997
   4:    Benin  1997-04-01   18 1997
   5:    Benin  1997-05-01   18 1997
  ---                               
3506: Zimbabwe  2015-07-01   38 2015
3507: Zimbabwe  2015-08-01   38 2015
3508: Zimbabwe  2015-09-01   42 2015
3509: Zimbabwe  2015-10-01   42 2015
3510: Zimbabwe  2015-11-01   42 2015

selectVDem has the following columns:

Year Country EqualityResources EqualityProtec PercentSufferage   LocalGov  RegionGov ExecCorrupt PolCorrupt

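I want to append the EqualityResources, EqualityProtec, PercentSufferage, LocalGov, RegionGov, ExecCorrupt and PolCorrupt values to the end of DT as new columns, matched on Year and Country. Is there a way to do this without a for loop? I have tried two approaches.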

DT$EqualityResources <- subset(DT$Country == selectVDem$Country & DT$Year == selectVDem$Year, select = EqualityResources)

This results in the error:

Error in subset.default(DT$Country == selectVDem$Country & DT$Year ==  : 
  argument "subset" is missing, with no default
In addition: Warning messages:
1: In is.na(e1) | is.na(e2) :
  longer object length is not a multiple of shorter object length
2: In `==.default`(DT$Country, selectVDem$Country) :
  longer object length is not a multiple of shorter object length
3: In DT$Year == selectVDem$Year :
  longer object length is not a multiple of shorter object length

I also tried writing a function and calling it through apply:

getVDem <- function(vDemVal, country, year, vDem){
  result <- vDem[vDem$Country == country & vDem$Year == year,]
  finalResult <- vDem$vDemVal
  return(finalResult)
}

DT$EqualityResources <- apply(DT, 1, getVDem(selectVDem, DT$Country, DT$Year,'EqualityResources'))#subset(selectVDem,DT$Country == Country & DT$Year == Year, select = EqualityResources)

which gave me the error:

Error in vDem$Country : $ operator is invalid for atomic vectors

What should I do?

1 answer:

Answer 0: (score: 0)

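Working column by column can get messy. You can use dplyr's family of join functions, most likely left_join. When the two tables share column names, the join can work out the "by" argument automatically, but be careful if columns share a name while holding different content!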

library(dplyr)
newDT <- left_join(DT, selectVDem)

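The advantage of left_join is that the result does not include any categories (Year or Country in your case) that are absent from the left object: every row of DT is kept, and rows with no match in selectVDem get NA in the new columns. Having the by argument determined automatically is also convenient.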
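If you would rather not rely on the automatic "by" detection, the keys can be spelled out. The following is a minimal sketch, not the asker's actual data: the two small data frames are made-up stand-ins that only borrow the column names from the question, and only one of the selectVDem value columns is included.

```r
library(dplyr)

# Toy stand-ins for the question's tables (all values are invented for illustration)
DT <- data.frame(
  Country     = c("Benin", "Benin", "Zimbabwe"),
  MonthofDate = as.Date(c("1997-01-01", "1997-02-01", "2015-07-01")),
  Size        = c(18, 18, 38),
  Year        = c(1997, 1997, 2015)
)
selectVDem <- data.frame(
  Year              = c(1997, 2015, 2015),
  Country           = c("Benin", "Benin", "Togo"),
  EqualityResources = c(0.5, 0.7, 0.9)
)

# Name the join keys explicitly so that no other shared column
# can be picked up as a key by accident
newDT <- left_join(DT, selectVDem, by = c("Country", "Year"))

# Every DT row is kept; the Zimbabwe/2015 row has no match and gets NA,
# while the unmatched Togo row from selectVDem is dropped
print(newDT)
```

With the real tables, the same call should attach all of the selectVDem value columns at once, since left_join brings over every non-key column from the right-hand table.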