Create a column with differences based on another column

Date: 2014-11-04 15:13:11

Tags: r

My data frame looks like this (simplified; the full data covers 699 treaties):

TRT <- data.frame(T.ID=c(1,1,2,2,3,3,4,4,5,5,6,6,7,7,8,8,8),
              Treaty=c("hungary slovenia 1994", "hungary slovenia 1994",
                                "nicaragua taiwan 2006", "nicaragua taiwan 2006",
                                "ukraine uzbekistan 1994", "ukraine uzbekistan 1994",
                                "brazil uruguay 1986", "brazil uruguay 1986",
                                "albania macedonia 2002", "albania macedonia 2002",
                                "albania moldova 2003", "albania moldova 2003",
                                "albania romania 2003", "albania romania 2003",
                                "Treaty of Izmir 1977","Treaty of Izmir 1977",
                                "Treaty of Izmir 1977"),
              sc.y=c("HUN1994", "SLV1994", "NIC2006", "TAW2006", "UKR1994", 
                     "UZB1994", "BRA1986", "URU1986", "ALB2002", "MAC2002", 
                     "ALB2003", "MLD2003", "ALB2003", "RUM2003", "IRN1977", 
                     "TUR1977", "PAK1977"),
              prom.demo=c(1,1,0,0,0,0,0,0,0,0,1,1,1,1,0,0,0),
              polity=c(10,10,8,10,7,-9,7,9,7,9,7,8,7,8,-10,-7,9))

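In the end, I would like a data frame that lists each treaty only once, with its value of the "prom.demo" column and a column containing the difference between the maximum and minimum "polity" values of the treaty's parties (most treaties have only two parties, but some have up to 51). Is there an R command that would let me do these 699 calculations?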
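The answer below joins separate `treaties` and `Polity` tables. If the scores are already in a single frame like `TRT` above, a base-R sketch with `aggregate()` could do the per-treaty calculation directly. It assumes `prom.demo` is constant within each treaty (true in the sample), because it is used as a second grouping variable so that it is carried into the result.

```r
# Sample data from the question
TRT <- data.frame(
  T.ID = c(1,1,2,2,3,3,4,4,5,5,6,6,7,7,8,8,8),
  Treaty = c("hungary slovenia 1994", "hungary slovenia 1994",
             "nicaragua taiwan 2006", "nicaragua taiwan 2006",
             "ukraine uzbekistan 1994", "ukraine uzbekistan 1994",
             "brazil uruguay 1986", "brazil uruguay 1986",
             "albania macedonia 2002", "albania macedonia 2002",
             "albania moldova 2003", "albania moldova 2003",
             "albania romania 2003", "albania romania 2003",
             "Treaty of Izmir 1977", "Treaty of Izmir 1977",
             "Treaty of Izmir 1977"),
  sc.y = c("HUN1994", "SLV1994", "NIC2006", "TAW2006", "UKR1994",
           "UZB1994", "BRA1986", "URU1986", "ALB2002", "MAC2002",
           "ALB2003", "MLD2003", "ALB2003", "RUM2003", "IRN1977",
           "TUR1977", "PAK1977"),
  prom.demo = c(1,1,0,0,0,0,0,0,0,0,1,1,1,1,0,0,0),
  polity = c(10,10,8,10,7,-9,7,9,7,9,7,8,7,8,-10,-7,9)
)

# One row per (Treaty, prom.demo) group; polity is collapsed to max - min.
# Assumes prom.demo does not vary within a treaty.
res <- aggregate(polity ~ Treaty + prom.demo, data = TRT,
                 FUN = function(x) max(x) - min(x))

# Give the summarised column a clearer name
names(res)[names(res) == "polity"] <- "PolityDiff"
print(res)
```

For the sample `TRT` this returns 8 rows. Because all parties are in the one frame, the Treaty of Izmir (polity -10, -7, 9) gets a difference of 19 here rather than NA.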

1 answer:

Answer 0 (score: 1)

With dplyr, join on scode and year, then group by Treaty and compute the difference between the maximum and minimum polity:

library(dplyr)
# join Polity scores by country code and year, then take max - min per treaty
left_join(treaties, Polity, c("scode", "year")) %>%
  group_by(Treaty) %>%
  summarise(PolityDiff = max(polity, na.rm = TRUE) - min(polity, na.rm = TRUE))
Source: local data frame [8 x 2]

                   Treaty PolityDiff
1  albania macedonia 2002          2
2    albania moldova 2003          1
3    albania romania 2003          1
4     brazil uruguay 1986          2
5   hungary slovenia 1994          0
6   nicaragua taiwan 2006          2
7    Treaty of Izmir 1977         NA
8 ukraine uzbekistan 1994         16

The NA is where you have no matching scode/year at all (the Treaty of Izmir is IRN/TUR/PAK in 1977, and none of those are in the Polity data).
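To see where such NAs come from, here is a small base-R sketch of the same kind of left join, using `merge(..., all.x = TRUE)` in place of `left_join`. The `treaties` and `Polity` tables below are hypothetical toy versions with assumed columns (`Treaty`, `scode`, `year` and `scode`, `year`, `polity`); the real tables are not shown in the thread.

```r
# Hypothetical toy versions of the `treaties` and `Polity` tables
treaties <- data.frame(
  Treaty = c("hungary slovenia 1994", "hungary slovenia 1994",
             "Treaty of Izmir 1977", "Treaty of Izmir 1977"),
  scode  = c("HUN", "SLV", "IRN", "TUR"),
  year   = c(1994, 1994, 1977, 1977),
  stringsAsFactors = FALSE
)
Polity <- data.frame(
  scode  = c("HUN", "SLV"),
  year   = c(1994, 1994),
  polity = c(10, 10),
  stringsAsFactors = FALSE
)

# all.x = TRUE keeps every treaty row, as dplyr::left_join does
joined <- merge(treaties, Polity, by = c("scode", "year"), all.x = TRUE)

# Rows with no scode/year match in Polity end up with polity = NA
unmatched <- joined[is.na(joined$polity), ]
print(unmatched)
```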

Note that if you want NA whenever any one of the participating countries is missing from the Polity data, use:

left_join(treaties, Polity, c("scode", "year")) %>%
  group_by(Treaty) %>%
  summarise(PolityDiff = max(polity) - min(polity))

which gives:

                   Treaty PolityDiff
1  albania macedonia 2002          2
2    albania moldova 2003          1
3    albania romania 2003          1
4     brazil uruguay 1986          2
5   hungary slovenia 1994         NA
6   nicaragua taiwan 2006          2
7    Treaty of Izmir 1977         NA
8 ukraine uzbekistan 1994         16

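That is because Slovenia is coded as SLV in Polity but as SLO in the treaties (a mistake?). Either way, there is no SLO/1994 in Polity, so that treaty comes back as NA in this variant. It returned zero in my first example because the NA was removed, so the polity difference was just the difference between one number and itself.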
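The difference between the two variants comes down to how `max()` and `min()` treat `NA`. A minimal base-R illustration, using a made-up pair of scores for a two-party treaty where one party did not match:

```r
# Polity scores for one treaty where one party failed to match (NA)
scores <- c(10, NA)

# With na.rm = TRUE the NA is dropped, leaving a single value,
# so the difference is that value minus itself
diff_drop <- max(scores, na.rm = TRUE) - min(scores, na.rm = TRUE)

# Without na.rm, NA propagates through max() and min()
diff_keep <- max(scores) - min(scores)

print(diff_drop)  # 0
print(diff_keep)  # NA
```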