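I am trying to get the maximum value of a variable that appears twice for the same year across two lists. For example, I have the two lists below. list_1 contains data for 2002 only, while list_2 contains data for 2001 through 2018. I want to bind them first and then, for each country and year, keep the larger of the two values.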
list_1:
time exporter quantity
1 2002 Australia 2404
2 2002 New Zealand 90
3 2002 Fiji 37
list_2:
time exporter quantity
1 2001 China 0
2 2001 Fiji 0
3 2001 South Africa 0
4 2001 Brazil 0
5 2001 Greece 0
6 2001 Turkey 0
7 2001 New Zealand 1
8 2001 Korea, Republic of 0
40 2002 Australia 0
......
29 2002 Fiji 113
34 2002 New Zealand 18
.......
I used bind_rows to combine the two lists:
df <- bind_rows(list_1, list_2)
It then becomes:
time exporter quantity
1 2002 Australia 2404
2 2002 New Zealand 90
3 2002 Fiji 37
4 2001 China 0
5 2001 Fiji 0
6 2001 South Africa 0
7 2001 Brazil 0
8 2001 Greece 0
9 2001 Turkey 0
10 2001 New Zealand 1
11 2001 Korea, Republic of 0
12 2001 Singapore 0
13 2001 Malaysia 0
14 2001 Bahrain 0
...........
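In the new bound list, I want the three countries from list_1 (Australia, New Zealand, and Fiji) to have 2002 values of 2404, 90, and 113 respectively. The other countries stay unchanged because they do not appear in list_1. So the code I am looking for should compare the same country in the same year, for example Australia in 2002 ... 2018 in list_1 and in 2002 ... 2018 in list_2, and keep the maximum quantity for that year and country in the new list.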
Answer 0 (score: 0)
You can apply the following dplyr script to df to get the desired output.
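library(dplyr)
# For each (time, exporter) pair, keep the row(s) holding the maximum quantity.
df %>%
  group_by(time, exporter) %>%
  filter(quantity == max(quantity)) %>%
  # desc() takes a single vector, so sort by year only (newest first);
  # rows within the same year keep their original order.
  arrange(desc(time))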
# -------------------------------------------------------------------------
# # A tibble: 11 x 3
# # Groups: time, exporter [11]
# time exporter quantity
# <dbl> <chr> <dbl>
# 1 2002 Australia 2404
# 2 2002 New Zealand 90
# 3 2002 Fiji 113
# 4 2001 china 0
# 5 2001 Fiji 0
# 6 2001 SA 0
# 7 2001 Brazil 0
# 8 2001 Greece 0
# 9 2001 Turkey 0
# 10 2001 New Zealand 1
# 11 2001 KRP 0
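One thing to watch for with the filter() approach: if the same country and year appear in both lists with the same quantity, both rows are equal to the maximum and both are kept. The sketch below uses a small made-up data frame (`toy` is a hypothetical name, not from the question) to show the difference from summarise(), which returns exactly one row per group. It assumes dplyr 1.0.0 or later for the `.groups` argument.

```r
library(dplyr)

# Hypothetical data: Fiji appears twice in 2002 with the same quantity (a tie).
toy <- data.frame(
  time     = c(2002, 2002, 2002),
  exporter = c("Fiji", "Fiji", "Australia"),
  quantity = c(113, 113, 2404)
)

# filter() keeps every row equal to the group maximum, so the tie survives.
kept_ties <- toy %>%
  group_by(time, exporter) %>%
  filter(quantity == max(quantity)) %>%
  ungroup()

# summarise() collapses each group to a single row.
one_per_group <- toy %>%
  group_by(time, exporter) %>%
  summarise(quantity = max(quantity), .groups = "drop")

nrow(kept_ties)      # 3
nrow(one_per_group)  # 2
```

If df has other columns that need to be kept, summarise() will drop them, so the filter() version followed by distinct() may be the better fit.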
# dput(list_1)
list_1 <- structure(list(time = c(2002, 2002, 2002), exporter = c("Australia",
"New Zealand", "Fiji"), quantity = c(2404, 90, 37)), class = "data.frame", row.names = c(NA,
-3L))
# dput(list_2)
list_2 <- structure(list(time = c(2001, 2001, 2001, 2001, 2001, 2001, 2001,
2001, 2002, 2002, 2002), exporter = c("china", "Fiji", "SA",
"Brazil", "Greece", "Turkey", "New Zealand", "KRP", "Australia",
"Fiji", "New Zealand"), quantity = c(0, 0, 0, 0, 0, 0, 1, 0,
0, 113, 18)), row.names = c(NA, -11L), class = "data.frame")
# dput(df)
df <- structure(list(time = c(2002, 2002, 2002, 2001, 2001, 2001, 2001,
2001, 2001, 2001, 2001, 2002, 2002, 2002), exporter = c("Australia",
"New Zealand", "Fiji", "china", "Fiji", "SA", "Brazil", "Greece",
"Turkey", "New Zealand", "KRP", "Australia", "Fiji", "New Zealand"
), quantity = c(2404, 90, 37, 0, 0, 0, 0, 0, 0, 1, 0, 0, 113,
18)), row.names = c(NA, -14L), class = "data.frame")
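If dplyr is not available, the same result can probably be reached in base R. This is a minimal sketch and not part of the original answer: it rebuilds the sample data above in compact form, stacks it with rbind(), and uses aggregate() to take the maximum quantity per year and exporter. The name `result` is chosen here for illustration.

```r
# Rebuild the sample data from the dput() output in compact form.
list_1 <- data.frame(
  time     = c(2002, 2002, 2002),
  exporter = c("Australia", "New Zealand", "Fiji"),
  quantity = c(2404, 90, 37)
)
list_2 <- data.frame(
  time     = c(rep(2001, 8), rep(2002, 3)),
  exporter = c("china", "Fiji", "SA", "Brazil", "Greece", "Turkey",
               "New Zealand", "KRP", "Australia", "Fiji", "New Zealand"),
  quantity = c(0, 0, 0, 0, 0, 0, 1, 0, 0, 113, 18)
)

# rbind() stacks the two data frames because their columns match.
df <- rbind(list_1, list_2)

# One row per (time, exporter) pair, holding the largest quantity.
result <- aggregate(quantity ~ time + exporter, data = df, FUN = max)

# Inspect the 2002 rows.
result[result$time == 2002, ]
```

Like summarise(), aggregate() returns one row per group, so tied duplicates collapse into a single row.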