我有一个数据框架,其中包含来自一些国家的基尼系数的数据。其中很多值是NA
,因此我想创建一个新的数据框,该框针对每个国家/地区都测量了最新的基尼系数。例如,如果巴西的值为2012、2013和2015,则新数据框的值将仅为2015。这就是数据的样子:
Country.Name Country.Code X2014 X2015 X2016 X2017
8 Argentina ARG 41.4 NA 42.4 NA
9 Armenia ARM 31.5 32.4 32.5 NA
13 Austria AUT 30.5 30.5 NA NA
16 Belgium BEL 28.1 27.7 NA NA
17 Benin BEN NA 47.8 NA NA
18 Burkina Faso BFA 35.3 NA NA NA
19 Bangladesh BGD NA NA 32.4 NA
20 Bulgaria BGR 37.4 NA NA NA
23 Bosnia and Herzegovina BIH NA 32.7 NA NA
24 Belarus BLR 27.2 26.7 27.0 NA
27 Bolivia BOL 47.8 46.7 44.6 NA
28 Brazil BRA 51.5 51.3 NA NA
31 Bhutan BTN NA NA NA 37.4
36 Switzerland CHE 32.5 32.3 NA NA
38 Chile CHL NA 47.7 NA NA
40 Cote d'Ivoire CIV NA 41.5 NA NA
41 Cameroon CMR 46.6 NA NA NA
44 Colombia COL 52.8 51.1 50.8 NA
47 Costa Rica CRI 48.6 48.4 48.7 NA
52 Cyprus CYP 35.6 34.0 NA NA
53 Czech Republic CZE 25.9 25.9 NA NA
54 Germany DEU NA 31.7 NA NA
57 Denmark DNK 28.4 28.2 NA NA
58 Dominican Republic DOM 44.1 44.7 45.3 NA
65 Ecuador ECU 45.0 46.0 45.0 NA
66 Egypt, Arab Rep. EGY NA 31.8 NA NA
69 Spain ESP 36.1 36.2 NA NA
70 Estonia EST 34.6 32.7 NA NA
71 Ethiopia ETH NA 39.1 NA NA
74 Finland FIN 26.8 27.1 NA NA
76 France FRA 32.3 32.7 NA NA
79 Gabon GAB NA NA NA 38.0
80 United Kingdom GBR 34.0 33.2 NA NA
81 Georgia GEO 37.3 36.4 36.5 NA
85 Gambia, The GMB NA 35.9 NA NA
88 Greece GRC 35.8 36.0 NA NA
91 Guatemala GTM 48.3 NA NA NA
96 Honduras HND 50.4 49.6 50.0 NA
98 Croatia HRV 32.1 31.1 NA NA
100 Hungary HUN 30.9 30.4 NA NA
110 Ireland IRL 31.9 31.8 NA NA
111 Iran, Islamic Rep. IRN 38.8 NA NA NA
113 Iceland ISL 27.8 NA NA NA
115 Italy ITA 34.7 35.4 NA NA
119 Kazakhstan KAZ 27.0 26.9 NA NA
120 Kenya KEN NA 40.8 NA NA
121 Kyrgyz Republic KGZ 26.8 29.0 26.8 NA
130 Liberia LBR 33.2 NA NA NA
137 Sri Lanka LKA NA NA 39.8 NA
142 Lithuania LTU 37.7 37.4 NA NA
143 Luxembourg LUX 31.2 33.8 NA NA
144 Latvia LVA 35.1 34.2 NA NA
149 Moldova MDA 26.8 27.0 26.3 NA
153 Mexico MEX 45.8 NA 43.4 NA
156 Macedonia, FYR MKD 35.6 NA NA NA
158 Malta MLT 29.0 29.4 NA NA
159 Myanmar MMR NA 38.1 NA NA
161 Montenegro MNE 31.9 NA NA NA
162 Mongolia MNG 32.0 NA 32.3 NA
164 Mozambique MOZ 54.0 NA NA NA
165 Mauritania MRT 32.6 NA NA NA
168 Malaysia MYS NA 41.0 NA NA
170 Namibia NAM NA 59.1 NA NA
172 Niger NER 34.3 NA NA NA
174 Nicaragua NIC 46.2 NA NA NA
175 Netherlands NLD 28.6 28.2 NA NA
176 Norway NOR 26.8 27.5 NA NA
183 Pakistan PAK NA 33.5 NA NA
184 Panama PAN 50.6 50.8 50.4 NA
185 Peru PER 43.4 43.5 43.8 NA
193 Portugal PRT 35.6 35.5 NA NA
194 Paraguay PRY 50.7 47.6 47.9 NA
195 West Bank and Gaza PSE NA NA 33.7 NA
200 Romania ROU 36.0 35.9 NA NA
201 Russian Federation RUS 39.9 37.7 NA NA
210 El Salvador SLV 41.6 40.6 40.0 NA
220 Slovak Republic SVK 26.1 26.5 NA NA
221 Slovenia SVN 25.7 25.4 NA NA
222 Sweden SWE 28.4 29.2 NA NA
231 Togo TGO NA 43.1 NA NA
232 Thailand THA 37.0 36.0 NA NA
233 Tajikistan TJK NA 34.0 NA NA
236 Timor-Leste TLS 28.7 NA NA NA
243 Turkey TUR 41.2 42.9 41.9 NA
246 Uganda UGA NA NA 42.8 NA
247 Ukraine UKR 24.0 25.5 25.0 NA
249 Uruguay URY 40.1 40.2 39.7 NA
250 United States USA NA NA 41.5 NA
256 Vietnam VNM 34.8 NA 35.3 NA
260 Kosovo XKX 27.3 26.4 26.5 NA
261 Yemen, Rep. YEM 36.7 NA NA NA
262 South Africa ZAF 63.0 NA NA NA
263 Zambia ZMB NA 57.1 NA NA
这已经是我制作的子集,因为我认为早于2014的值没有用。我想获得每个国家的最新价值,以进行不平等排名。有什么想法吗?
答案 0 :(得分:3)
您可以使用RewriteRule ^([a-z]+)/(filter)$ index.php?lang=$1&show=$2 [QSA,L]
:
coalesce
在基数R中(效率较低,输出相同):
library(tidyverse)
df %>% mutate(last = invoke(coalesce,df[6:3])) %>% head
# edit, more simply :
# df %>% mutate(last = coalesce(!!!df[6:3])) %>% head
# Country.Name Country.Code X2014 X2015 X2016 X2017 last
# 1 Argentina ARG 41.4 NA 42.4 NA 42.4
# 2 Armenia ARM 31.5 32.4 32.5 NA 32.5
# 3 Austria AUT 30.5 30.5 NA NA 30.5
# 4 Belgium BEL 28.1 27.7 NA NA 27.7
# 5 Benin BEN NA 47.8 NA NA 47.8
# 6 Burkina Faso BFA 35.3 NA NA NA 35.3
答案 1 :(得分:1)
一个tidyverse
选项
library(tidyverse)
df %>%
gather(Year, Index, starts_with("X")) %>%
mutate(Year = as.numeric(str_replace(Year, "X", ""))) %>%
group_by(Country.Code) %>%
arrange(Country.Code, desc(Year)) %>%
filter(!is.na(Index)) %>%
slice(1)
ungroup()
## A tibble: 93 x 4
# Country.Name Country.Code Year Index
# <fct> <fct> <dbl> <dbl>
# 1 Argentina ARG 2016 42.4
# 2 Armenia ARM 2016 32.5
# 3 Austria AUT 2015 30.5
# 4 Belgium BEL 2015 27.7
# 5 Benin BEN 2015 47.8
# 6 Burkina Faso BFA 2014 35.3
# 7 Bangladesh BGD 2016 32.4
# 8 Bulgaria BGR 2014 37.4
# 9 Bosnia and Herzegovina BIH 2015 32.7
#10 Belarus BLR 2016 27
说明:将年度Gini数据从宽到长重塑,按Country.Code
分组,按Year
降序对条目进行排序,删除NA
行,并仅保留每个{{1} }。
Country.Code
答案 2 :(得分:1)
以下是dplyr和tidyr的另一个选项。我使用的是Maurits Evers的df
。您可以使用gather()
将数据从宽格式重新格式化为长格式。然后,使用Country.Name
定义一个组变量。对于每个国家/地区,您将获得非NA值的索引,然后选择最大索引号。您可以使用它使用slice()
来子集数据。
gather(df, key = year, value = value, -Country.Name, -Country.Code) %>%
group_by(Country.Name) %>%
slice(max(which(!is.na(value))))
Country.Name Country.Code year value
<fct> <fct> <chr> <dbl>
1 Argentina ARG X2016 42.4
2 Armenia ARM X2016 32.5
3 Austria AUT X2015 30.5
4 Bangladesh BGD X2016 32.4
5 Belarus BLR X2016 27
6 Belgium BEL X2015 27.7
7 Benin BEN X2015 47.8
8 Bhutan BTN X2017 37.4
9 Bolivia BOL X2016 44.6
10 Bosnia and Herzegovina BIH X2015 32.7
... with 83 more rows
答案 3 :(得分:0)
由于您只关心这四年,所以一种简单的方法可能是检查往后每年的NA值并映射到单独的列
df$mostRecent = NA
#Moving backwards, if most recent value is NA then check the previous year
df$mostRecent[is.na(df$mostRecent)] <- df$X2017[is.na(df$mostRecent)]
df$mostRecent[is.na(df$mostRecent)] <- df$X2016[is.na(df$mostRecent)]
df$mostRecent[is.na(df$mostRecent)] <- df$X2015[is.na(df$mostRecent)]
df$mostRecent[is.na(df$mostRecent)] <- df$X2014[is.na(df$mostRecent)]