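I have a (directed) dyadic dataset that looks like this (see below). What I want to do now is keep only one observation per year. So in this case I want one observation for 1992 (AFG 1992) and one for 1993 (AFG 1993), and drop all the other observations. It doesn't matter which observation from the same year I keep (I'm not interested in country2).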
country1 country2 year X X1
Afghanistan Colombia 1992 1 0.44
Afghanistan Venezuela 1992 1 0.45
Afghanistan Peru 1992 1 0.46
Afghanistan Brazil 1992 1 0.47
Afghanistan Bolivia 1992 1 0.48
Afghanistan Chile 1992 1 0.49
Afghanistan Argentina 1992 1 0.50
Afghanistan Uruguay 1993 0 0.51
Afghanistan USA 1993 0 0.52
Afghanistan Canada 1993 0 0.53
Afghanistan UK 1993 0 0.54
Afghanistan Netherlands 1993 0 0.55
Afghanistan Belgium 1993 0 0.56
Afghanistan Luxembourg 1993 0 0.57
Afghanistan France 1993 0 0.58
My attempt:
library(dplyr)
newdata <- data %>%
group_by(country1,year) %>%
summarise() %>%
select(unique.x=country1, unique.y=year)
This works, but how do I keep all the other variables from "data" in "newdata"? I can't think of any way to do this (which I would find more practical). Any help?
Desired result
country1 year X
Afghanistan 1992 1
Afghanistan 1993 0
dput(data)
structure(list(country1 = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L), .Label = "Afghanistan", class = "factor"),
country2 = structure(c(8L, 33L, 24L, 5L, 4L, 7L, 1L, 32L, 31L, 6L, 30L, 21L, 3L,
19L, 14L, 29L, 27L, 26L, 15L, 25L, 2L, 17L, 10L, 18L, 13L, 28L, 23L, 11L, 9L,
16L, 12L, 20L, 22L), .Label = c("Argentina", "Austria", "Belgium",
"Bolivia, Plurinational State of", "Brazil", "Canada", "Chile", "Colombia", "Cuba",
"Czech Republic", "Denmark", "Dominican Republic", "Finland", "France", "Germany",
"Guinea-Bissau", "Hungary", "Italy", "Luxembourg", "Mauritania", "Netherlands",
"Niger", "Norway", "Peru", "Poland", "Portugal", "Spain", "Sweden", "Switzerland",
"United Kingdom", "United States", "Uruguay", "Venezuela, Bolivarian Republic of"),
class = "factor"),
year = c(1992L, 1992L, 1992L, 1992L, 1992L, 1992L, 1992L, 1993L, 1993L, 1993L,
1993L, 1993L, 1993L, 1993L, 1993L, 1994L, 1994L, 1994L, 1994L, 1994L, 1994L,
1994L, 1994L, 1995L, 1995L, 1995L, 1995L, 1995L, 1995L, 1995L, 1995L, 1995L, 1995L),
X = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
0L, 0L, 0L, 0L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L),
X1 = c(0.44, 0.45, 0.46, 0.47, 0.48, 0.49, 0.5, 0.51, 0.52, 0.53, 0.54, 0.55, 0.56,
0.57, 0.58, 0.59, 0.6, 0.61, 0.62, 0.63, 0.64, 0.65, 0.66, 0.67, 0.68, 0.69, 0.7,
0.71, 0.72, 0.73, 0.74, 0.75, 0.76)),
.Names = c("country1", "country2", "year", "X", "X1"), class = "data.frame",
row.names = c(NA, -33L))
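For reference, a minimal sketch of one common dplyr way to keep every column while reducing to one row per country1/year: distinct() with .keep_all = TRUE, which keeps the first row of each combination. It assumes dplyr is installed; the four-row data frame is a subset of the table above, used only for illustration.

```r
library(dplyr)

# Four rows copied from the table in the question (illustrative subset only)
data <- data.frame(
  country1 = "Afghanistan",
  country2 = c("Colombia", "Venezuela", "Uruguay", "USA"),
  year     = c(1992L, 1992L, 1993L, 1993L),
  X        = c(1L, 1L, 0L, 0L),
  X1       = c(0.44, 0.45, 0.51, 0.52)
)

# distinct() keeps only the first row for each country1/year combination;
# .keep_all = TRUE retains all the other columns of that row as well
newdata <- data %>%
  distinct(country1, year, .keep_all = TRUE)

print(newdata)
```

Because the first row of each year is kept, which country2 survives depends on the row order of the input, which matches the question's "it doesn't matter which one".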
Answer 0 (score: 1)
newdata <- olddata[!duplicated(olddata$year),]
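answers the question as literally asked (one row per year), while
newdata <- olddata[!duplicated(paste(olddata$country1, olddata$year)),]
gives you what you want (one row per combination of country1 and year).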
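A base-R sketch of the same duplicated() idea, passing the key columns as a data frame instead of pasting them into strings; duplicated() has a data frame method that compares whole rows. The four-row olddata is a subset of the question's table, used only for illustration.

```r
# Four rows copied from the table in the question (illustrative subset only)
olddata <- data.frame(
  country1 = "Afghanistan",
  country2 = c("Colombia", "Venezuela", "Uruguay", "USA"),
  year     = c(1992L, 1992L, 1993L, 1993L),
  X        = c(1L, 1L, 0L, 0L),
  X1       = c(0.44, 0.45, 0.51, 0.52)
)

# duplicated() on the (country1, year) columns flags every row whose pair
# has already appeared; negating it keeps the first row of each pair and
# all of its columns
newdata <- olddata[!duplicated(olddata[, c("country1", "year")]), ]

print(newdata)
```

With a single country1, as here, this gives the same rows as the year-only version.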
Answer 1 (score: 0)
I don't really understand your question, but to get the desired output you can use:
data %>%
group_by(country1, year) %>%
summarise(X = mean(X))
When you apply this to the whole data.frame, note that this code returns the mean of all values of X for each unique combination of country1 and year.
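If you want to stay with the group_by() + summarise() pattern but keep the other variables too, one option is to take the first value of every non-grouping column with across(). This is a sketch assuming dplyr 1.0.0 or later (for across() and the .groups argument); the four-row data frame is an illustrative subset of the question's table. In this data X is constant within each year, so first(X) and mean(X) give the same value.

```r
library(dplyr)

# Four rows copied from the table in the question (illustrative subset only)
data <- data.frame(
  country1 = "Afghanistan",
  country2 = c("Colombia", "Venezuela", "Uruguay", "USA"),
  year     = c(1992L, 1992L, 1993L, 1993L),
  X        = c(1L, 1L, 0L, 0L),
  X1       = c(0.44, 0.45, 0.51, 0.52)
)

# Inside summarise(), everything() skips the grouping columns, so this takes
# the first value of country2, X and X1 within each country1/year group
newdata <- data %>%
  group_by(country1, year) %>%
  summarise(across(everything(), first), .groups = "drop")

print(newdata)
```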
Answer 2 (score: 0)
You could try:
data %>%
group_by(year) %>%
top_n(1) %>%
select(country1, X)
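Note that top_n(1) without a wt argument orders by the last column (here X1) and returns more than one row per year if there are ties. A sketch of the same idea with slice_max(), which dplyr 1.0.0 and later offer as the successor to top_n(); the explicit ordering column and with_ties = FALSE are choices made here to guarantee exactly one row per year, and all columns are kept instead of selecting only country1 and X. The four-row data frame is an illustrative subset of the question's table.

```r
library(dplyr)

# Four rows copied from the table in the question (illustrative subset only)
data <- data.frame(
  country1 = "Afghanistan",
  country2 = c("Colombia", "Venezuela", "Uruguay", "USA"),
  year     = c(1992L, 1992L, 1993L, 1993L),
  X        = c(1L, 1L, 0L, 0L),
  X1       = c(0.44, 0.45, 0.51, 0.52)
)

# Keep the row with the largest X1 in each year; with_ties = FALSE makes
# sure ties cannot produce a second row for the same year
newdata <- data %>%
  group_by(year) %>%
  slice_max(X1, n = 1, with_ties = FALSE) %>%
  ungroup()

print(newdata)
```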