Is there an R function to get the maximum value of a variable that appears more than once across several years?

Time: 2019-10-05 10:49:09

Tags: r

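I am trying to get the maximum value of a variable that appears twice for the same year across two lists. For example, I have the two lists below: list_1 contains data for 2002 only, while list_2 contains data for 2001 through 2018. I want to bind them first and then, for each country and year, keep the larger of the two values.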

  time    exporter quantity
1 2002   Australia     2404
2 2002 New Zealand       90
3 2002        Fiji       37


    time            exporter quantity
1   2001               China        0
2   2001                Fiji        0
3   2001        South Africa        0
4   2001              Brazil        0
5   2001              Greece        0
6   2001              Turkey        0
7   2001         New Zealand        1
8   2001  Korea, Republic of        0
40  2002           Australia        0
......
29  2002                Fiji      113
34  2002         New Zealand       18
.......


I used bind_rows to combine the two lists:

df <- bind_rows(list_1, list_2)

It then becomes:

    time            exporter quantity
1   2002           Australia     2404
2   2002         New Zealand       90
3   2002                Fiji       37
4   2001               China        0
5   2001                Fiji        0
6   2001        South Africa        0
7   2001              Brazil        0
8   2001              Greece        0
9   2001              Turkey        0
10  2001         New Zealand        1
11  2001  Korea, Republic of        0
12  2001           Singapore        0
13  2001            Malaysia        0
14  2001             Bahrain        0
...........

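In the new bound list, I want the three countries from list_1 (Australia, New Zealand and Fiji) to have 2002 values of 2404, 90 and 113, respectively. The other countries stay unchanged because they do not appear in list_1. So the code I am looking for would compare the same country in the same year, e.g. Australia for 2002 ... 2018 in list_1 and for 2002 ... 2018 in list_2, and keep the maximum quantity for that year and country in the new list.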

1 Answer:

Answer 0: (score: 0)

You can apply the following dplyr script to df to get the desired output.

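library(dplyr)
# For each year/exporter pair, keep the row(s) holding the largest quantity
df %>%
  group_by(time, exporter) %>%
  filter(quantity == max(quantity)) %>%
  # desc() accepts a single vector, so sort on time only (most recent year first)
  arrange(desc(time))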
# -------------------------------------------------------------------------
# # A tibble: 11 x 3
# # Groups:   time, exporter [11]
#   time exporter    quantity
#   <dbl> <chr>          <dbl>
# 1  2002 Australia       2404
# 2  2002 New Zealand       90
# 3  2002 Fiji             113
# 4  2001 china              0
# 5  2001 Fiji               0
# 6  2001 SA                 0
# 7  2001 Brazil             0
# 8  2001 Greece             0
# 9  2001 Turkey             0
# 10 2001 New Zealand        1
# 11 2001 KRP                0
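
One caveat: filter(quantity == max(quantity)) keeps every row that ties for the maximum, so a year/exporter pair whose quantity is identical in both lists would come back twice. If exactly one row per pair is wanted, a minimal sketch using summarise() instead might look like the following. It assumes dplyr 1.0.0 or later for the .groups argument, and the name result is only illustrative.

```r
library(dplyr)

# Sample data matching the dput() output in the Data section
list_1 <- data.frame(
  time     = c(2002, 2002, 2002),
  exporter = c("Australia", "New Zealand", "Fiji"),
  quantity = c(2404, 90, 37)
)
list_2 <- data.frame(
  time     = c(rep(2001, 8), rep(2002, 3)),
  exporter = c("china", "Fiji", "SA", "Brazil", "Greece", "Turkey",
               "New Zealand", "KRP", "Australia", "Fiji", "New Zealand"),
  quantity = c(0, 0, 0, 0, 0, 0, 1, 0, 0, 113, 18)
)

# Collapse each year/exporter pair to a single row holding its maximum
result <- bind_rows(list_1, list_2) %>%
  group_by(time, exporter) %>%
  summarise(quantity = max(quantity), .groups = "drop") %>%
  arrange(desc(time), exporter)

print(result)
```

Unlike filter(), summarise() drops any columns that are neither grouping variables nor summarised, which is fine here because the data has only these three columns.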

Data

#dput(list_1)
structure(list(time = c(2002, 2002, 2002), exporter = c("Australia", 
"New Zealand", "Fiji"), quantity = c(2404, 90, 37)), class = "data.frame", row.names = c(NA, 
-3L))

#dput(list_2)
structure(list(time = c(2001, 2001, 2001, 2001, 2001, 2001, 2001, 
2001, 2002, 2002, 2002), exporter = c("china", "Fiji", "SA", 
"Brazil", "Greece", "Turkey", "New Zealand", "KRP", "Australia", 
"Fiji", "New Zealand"), quantity = c(0, 0, 0, 0, 0, 0, 1, 0, 
0, 113, 18)), row.names = c(NA, -11L), class = "data.frame")

#dput(df)
structure(list(time = c(2002, 2002, 2002, 2001, 2001, 2001, 2001, 
2001, 2001, 2001, 2001, 2002, 2002, 2002), exporter = c("Australia", 
"New Zealand", "Fiji", "china", "Fiji", "SA", "Brazil", "Greece", 
"Turkey", "New Zealand", "KRP", "Australia", "Fiji", "New Zealand"
), quantity = c(2404, 90, 37, 0, 0, 0, 0, 0, 0, 1, 0, 0, 113, 
18)), row.names = c(NA, -14L), class = "data.frame")
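
For readers who would rather avoid the dplyr dependency, the same per-group maximum can be sketched in base R with aggregate(). This is an alternative to the answer's approach rather than part of it; df is rebuilt here from the values shown above, and the name res is hypothetical.

```r
# The combined data frame, with the same values as the dput(df) output
df <- data.frame(
  time     = c(2002, 2002, 2002, rep(2001, 8), 2002, 2002, 2002),
  exporter = c("Australia", "New Zealand", "Fiji", "china", "Fiji", "SA",
               "Brazil", "Greece", "Turkey", "New Zealand", "KRP",
               "Australia", "Fiji", "New Zealand"),
  quantity = c(2404, 90, 37, 0, 0, 0, 0, 0, 0, 1, 0, 0, 113, 18)
)

# One row per time/exporter pair, holding the largest quantity seen
res <- aggregate(quantity ~ time + exporter, data = df, FUN = max)

# Most recent year first, then alphabetical by exporter
res <- res[order(-res$time, res$exporter), ]
rownames(res) <- NULL

print(res)
```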