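How do I create a new data frame that compares values and keeps only the most recent data in R?

Time: 2018-10-07 01:01:09

Tags: r dplyr subset data-science data-analysis

I have a data frame containing Gini coefficient data for a number of countries. Many of the values are NA, so I want to create a new data frame that holds the most recently measured Gini coefficient for each country. For example, if Brazil has values for 2012, 2013 and 2015, the new data frame would contain only the 2015 value. This is what the data looks like: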

              Country.Name Country.Code X2014 X2015 X2016 X2017
8                Argentina          ARG  41.4    NA  42.4    NA
9                  Armenia          ARM  31.5  32.4  32.5    NA
13                 Austria          AUT  30.5  30.5    NA    NA
16                 Belgium          BEL  28.1  27.7    NA    NA
17                   Benin          BEN    NA  47.8    NA    NA
18            Burkina Faso          BFA  35.3    NA    NA    NA
19              Bangladesh          BGD    NA    NA  32.4    NA
20                Bulgaria          BGR  37.4    NA    NA    NA
23  Bosnia and Herzegovina          BIH    NA  32.7    NA    NA
24                 Belarus          BLR  27.2  26.7  27.0    NA
27                 Bolivia          BOL  47.8  46.7  44.6    NA
28                  Brazil          BRA  51.5  51.3    NA    NA
31                  Bhutan          BTN    NA    NA    NA  37.4
36             Switzerland          CHE  32.5  32.3    NA    NA
38                   Chile          CHL    NA  47.7    NA    NA
40           Cote d'Ivoire          CIV    NA  41.5    NA    NA
41                Cameroon          CMR  46.6    NA    NA    NA
44                Colombia          COL  52.8  51.1  50.8    NA
47              Costa Rica          CRI  48.6  48.4  48.7    NA
52                  Cyprus          CYP  35.6  34.0    NA    NA
53          Czech Republic          CZE  25.9  25.9    NA    NA
54                 Germany          DEU    NA  31.7    NA    NA
57                 Denmark          DNK  28.4  28.2    NA    NA
58      Dominican Republic          DOM  44.1  44.7  45.3    NA
65                 Ecuador          ECU  45.0  46.0  45.0    NA
66        Egypt, Arab Rep.          EGY    NA  31.8    NA    NA
69                   Spain          ESP  36.1  36.2    NA    NA
70                 Estonia          EST  34.6  32.7    NA    NA
71                Ethiopia          ETH    NA  39.1    NA    NA
74                 Finland          FIN  26.8  27.1    NA    NA
76                  France          FRA  32.3  32.7    NA    NA
79                   Gabon          GAB    NA    NA    NA  38.0
80          United Kingdom          GBR  34.0  33.2    NA    NA
81                 Georgia          GEO  37.3  36.4  36.5    NA
85             Gambia, The          GMB    NA  35.9    NA    NA
88                  Greece          GRC  35.8  36.0    NA    NA
91               Guatemala          GTM  48.3    NA    NA    NA
96                Honduras          HND  50.4  49.6  50.0    NA
98                 Croatia          HRV  32.1  31.1    NA    NA
100                Hungary          HUN  30.9  30.4    NA    NA
110                Ireland          IRL  31.9  31.8    NA    NA
111     Iran, Islamic Rep.          IRN  38.8    NA    NA    NA
113                Iceland          ISL  27.8    NA    NA    NA
115                  Italy          ITA  34.7  35.4    NA    NA
119             Kazakhstan          KAZ  27.0  26.9    NA    NA
120                  Kenya          KEN    NA  40.8    NA    NA
121        Kyrgyz Republic          KGZ  26.8  29.0  26.8    NA
130                Liberia          LBR  33.2    NA    NA    NA
137              Sri Lanka          LKA    NA    NA  39.8    NA
142              Lithuania          LTU  37.7  37.4    NA    NA
143             Luxembourg          LUX  31.2  33.8    NA    NA
144                 Latvia          LVA  35.1  34.2    NA    NA
149                Moldova          MDA  26.8  27.0  26.3    NA
153                 Mexico          MEX  45.8    NA  43.4    NA
156         Macedonia, FYR          MKD  35.6    NA    NA    NA
158                  Malta          MLT  29.0  29.4    NA    NA
159                Myanmar          MMR    NA  38.1    NA    NA
161             Montenegro          MNE  31.9    NA    NA    NA
162               Mongolia          MNG  32.0    NA  32.3    NA
164             Mozambique          MOZ  54.0    NA    NA    NA
165             Mauritania          MRT  32.6    NA    NA    NA
168               Malaysia          MYS    NA  41.0    NA    NA
170                Namibia          NAM    NA  59.1    NA    NA
172                  Niger          NER  34.3    NA    NA    NA
174              Nicaragua          NIC  46.2    NA    NA    NA
175            Netherlands          NLD  28.6  28.2    NA    NA
176                 Norway          NOR  26.8  27.5    NA    NA
183               Pakistan          PAK    NA  33.5    NA    NA
184                 Panama          PAN  50.6  50.8  50.4    NA
185                   Peru          PER  43.4  43.5  43.8    NA
193               Portugal          PRT  35.6  35.5    NA    NA
194               Paraguay          PRY  50.7  47.6  47.9    NA
195     West Bank and Gaza          PSE    NA    NA  33.7    NA
200                Romania          ROU  36.0  35.9    NA    NA
201     Russian Federation          RUS  39.9  37.7    NA    NA
210            El Salvador          SLV  41.6  40.6  40.0    NA
220        Slovak Republic          SVK  26.1  26.5    NA    NA
221               Slovenia          SVN  25.7  25.4    NA    NA
222                 Sweden          SWE  28.4  29.2    NA    NA
231                   Togo          TGO    NA  43.1    NA    NA
232               Thailand          THA  37.0  36.0    NA    NA
233             Tajikistan          TJK    NA  34.0    NA    NA
236            Timor-Leste          TLS  28.7    NA    NA    NA
243                 Turkey          TUR  41.2  42.9  41.9    NA
246                 Uganda          UGA    NA    NA  42.8    NA
247                Ukraine          UKR  24.0  25.5  25.0    NA
249                Uruguay          URY  40.1  40.2  39.7    NA
250          United States          USA    NA    NA  41.5    NA
256                Vietnam          VNM  34.8    NA  35.3    NA
260                 Kosovo          XKX  27.3  26.4  26.5    NA
261            Yemen, Rep.          YEM  36.7    NA    NA    NA
262           South Africa          ZAF  63.0    NA    NA    NA
263                 Zambia          ZMB    NA  57.1    NA    NA

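This is already a subset I made, because I consider values older than 2014 not useful. I want to get the most recent value for each country in order to build an inequality ranking. Any ideas?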

4 answers:

Answer 0 (score: 3)

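You can use coalesce(), passing the year columns from newest to oldest so that the first non-NA value it finds in each row is the most recent one. The same can be done in base R (less efficient, same output).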

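library(tidyverse)
# df[6:3] is X2017, X2016, X2015, X2014; coalesce() keeps the first non-NA value per row
df %>% mutate(last = coalesce(!!!df[6:3])) %>% head
# older equivalent (purrr's invoke() is deprecated in recent releases):
# df %>% mutate(last = invoke(coalesce, df[6:3])) %>% head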
#             Country.Name Country.Code X2014 X2015 X2016 X2017 last
# 1              Argentina          ARG  41.4    NA  42.4    NA 42.4
# 2                Armenia          ARM  31.5  32.4  32.5    NA 32.5
# 3                Austria          AUT  30.5  30.5    NA    NA 30.5
# 4                Belgium          BEL  28.1  27.7    NA    NA 27.7
# 5                  Benin          BEN    NA  47.8    NA    NA 47.8
# 6           Burkina Faso          BFA  35.3    NA    NA    NA 35.3
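
For the base R route mentioned above, one possible sketch follows. It is not the answerer's original code: the toy df (three rows copied from the question's table) and the helper name latest_non_na are assumptions made purely for illustration.

```r
# Toy data: three rows copied from the question's table (illustration only)
df <- data.frame(
  Country.Name = c("Argentina", "Austria", "Bhutan"),
  Country.Code = c("ARG", "AUT", "BTN"),
  X2014 = c(41.4, 30.5, NA),
  X2015 = c(NA, 30.5, NA),
  X2016 = c(42.4, NA, NA),
  X2017 = c(NA, NA, 37.4),
  stringsAsFactors = FALSE
)

# Hypothetical helper: last non-NA element of a vector, or NA if there is none
latest_non_na <- function(x) {
  x <- x[!is.na(x)]
  if (length(x) == 0) NA_real_ else x[length(x)]
}

# Apply row-wise over the year columns, which are ordered oldest to newest
year_cols <- grep("^X[0-9]{4}$", names(df), value = TRUE)
df$last <- apply(df[year_cols], 1, latest_non_na)
df$last
```

Because apply() walks the rows one at a time in R code, this is likely slower than coalesce() on large data, which matches the "less efficient" remark.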

Answer 1 (score: 1)

A tidyverse option:

library(tidyverse)
df %>%
    gather(Year, Index, starts_with("X")) %>%
    mutate(Year = as.numeric(str_replace(Year, "X", ""))) %>%
    group_by(Country.Code) %>%
    arrange(Country.Code, desc(Year)) %>%
    filter(!is.na(Index)) %>%
    slice(1) %>%
    ungroup()
## A tibble: 93 x 4
#   Country.Name           Country.Code  Year Index
#   <fct>                  <fct>        <dbl> <dbl>
# 1 Argentina              ARG           2016  42.4
# 2 Armenia                ARM           2016  32.5
# 3 Austria                AUT           2015  30.5
# 4 Belgium                BEL           2015  27.7
# 5 Benin                  BEN           2015  47.8
# 6 Burkina Faso           BFA           2014  35.3
# 7 Bangladesh             BGD           2016  32.4
# 8 Bulgaria               BGR           2014  37.4
# 9 Bosnia and Herzegovina BIH           2015  32.7
#10 Belarus                BLR           2016  27

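Explanation: reshape the yearly Gini data from wide to long, group by Country.Code, sort the entries by Year in descending order, drop the NA rows, and keep only the first (that is, the most recent) entry for each Country.Code.


Sample data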
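
A minimal sketch of a sample df, assuming only the first six rows of the table in the question (the full data has many more countries). Setting stringsAsFactors = TRUE is an assumption made to mimic the factor columns shown in the printed output.

```r
# First six rows of the question's table, typed in by hand (illustration only)
df <- data.frame(
  Country.Name = c("Argentina", "Armenia", "Austria",
                   "Belgium", "Benin", "Burkina Faso"),
  Country.Code = c("ARG", "ARM", "AUT", "BEL", "BEN", "BFA"),
  X2014 = c(41.4, 31.5, 30.5, 28.1, NA, 35.3),
  X2015 = c(NA, 32.4, 30.5, 27.7, 47.8, NA),
  X2016 = c(42.4, 32.5, NA, NA, NA, NA),
  X2017 = rep(NA_real_, 6),  # keep the all-NA column numeric rather than logical
  stringsAsFactors = TRUE
)
str(df)
```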

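Answer 2 (score: 1)

Here is another option with dplyr and tidyr. I am using Maurits Evers's df. You can use gather() to reshape the data from wide to long format. Then define a grouping variable with Country.Name. For each country, find the indices of the non-NA values and pick the largest one; because gather() stacks the year columns in their original order, that index points to the most recent year. Pass it to slice() to subset the data.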

library(dplyr)
library(tidyr)

gather(df, key = year, value = value, -Country.Name, -Country.Code) %>%
group_by(Country.Name) %>%
slice(max(which(!is.na(value))))

   Country.Name           Country.Code year  value
   <fct>                  <fct>        <chr> <dbl>
 1 Argentina              ARG          X2016  42.4
 2 Armenia                ARM          X2016  32.5
 3 Austria                AUT          X2015  30.5
 4 Bangladesh             BGD          X2016  32.4
 5 Belarus                BLR          X2016  27  
 6 Belgium                BEL          X2015  27.7
 7 Benin                  BEN          X2015  47.8
 8 Bhutan                 BTN          X2017  37.4
 9 Bolivia                BOL          X2016  44.6
10 Bosnia and Herzegovina BIH          X2015  32.7
 ... with 83 more rows
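
The key step in this answer is max(which(!is.na(value))). A small sketch on a single toy vector (Argentina's four values from the question, oldest first) shows what it computes; the variable names are chosen here for illustration.

```r
# Argentina's values for 2014 to 2017, taken from the question's table
value <- c(41.4, NA, 42.4, NA)

non_na_positions <- which(!is.na(value))   # positions that hold a real value
latest_position  <- max(non_na_positions)  # the last such position
value[latest_position]                     # the most recent Gini value
```

If a country had no non-NA value at all, which() would return an empty vector and max() would warn and return -Inf, so such rows would need to be filtered out first. Every country in the table shown in the question appears to have at least one value.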

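Answer 3 (score: 0)

Since you only care about these four years, a simple approach might be to check each year for NA values, working backwards from the newest, and map the result into a separate column.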

df$mostRecent <- NA_real_

#Moving backwards, if most recent value is NA then check the previous year

df$mostRecent[is.na(df$mostRecent)] <- df$X2017[is.na(df$mostRecent)]
df$mostRecent[is.na(df$mostRecent)] <- df$X2016[is.na(df$mostRecent)]
df$mostRecent[is.na(df$mostRecent)] <- df$X2015[is.na(df$mostRecent)] 
df$mostRecent[is.na(df$mostRecent)] <- df$X2014[is.na(df$mostRecent)]
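
The four assignments above follow one pattern, so they can be folded into a loop, and the result can then be ordered to get the inequality ranking the question asks for. This is a sketch under stated assumptions: the toy df holds four rows copied from the question's table, and ranking from highest to lowest Gini is only one reading of "inequality ranking".

```r
# Toy data: four rows copied from the question's table (illustration only)
df <- data.frame(
  Country.Name = c("Brazil", "Bhutan", "Ukraine", "South Africa"),
  X2014 = c(51.5, NA, 24.0, 63.0),
  X2015 = c(51.3, NA, 25.5, NA),
  X2016 = c(NA, NA, 25.0, NA),
  X2017 = c(NA, 37.4, NA, NA),
  stringsAsFactors = FALSE
)

df$mostRecent <- NA_real_
# Walk from the newest year column to the oldest, filling only the remaining gaps
for (col in c("X2017", "X2016", "X2015", "X2014")) {
  gaps <- is.na(df$mostRecent)
  df$mostRecent[gaps] <- df[[col]][gaps]
}

# Most unequal first (a higher Gini coefficient means more inequality)
ranking <- df[order(df$mostRecent, decreasing = TRUE),
              c("Country.Name", "mostRecent")]
ranking
```

Adding another year then only means extending the vector of column names, rather than copying one more assignment line.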