I have the following data frame:
df <- structure(list(country = c("Ghana", "Eritrea", "Ethiopia", "Ethiopia",
"Congo - Kinshasa", "Ethiopia", "Ethiopia", "Ghana", "Botswana",
"Nigeria"), CommodRank = c(1L, 2L, 3L, 1L, 3L, 1L, 1L, 1L, 1L,
1L), topCommodInCountry = c(TRUE, FALSE, FALSE, TRUE, FALSE,
TRUE, TRUE, TRUE, TRUE, TRUE), Main_Commod = c("Gold", "Copper",
"Nickel", "Gold", "Gold", "Gold", "Gold", "Gold", "Diamonds",
"Iron Ore")), row.names = c(NA, -10L), class = c("grouped_df",
"tbl_df", "tbl", "data.frame"), vars = "country", drop = TRUE, indices = list(
8L, 4L, 1L, c(2L, 3L, 5L, 6L), c(0L, 7L), 9L), group_sizes = c(1L,
1L, 1L, 4L, 2L, 1L), biggest_group_size = 4L, labels = structure(list(
country = c("Botswana", "Congo - Kinshasa", "Eritrea", "Ethiopia",
"Ghana", "Nigeria")), row.names = c(NA, -6L), class = "data.frame", vars = "country", drop = TRUE, .Names = "country"), .Names = c("country",
"CommodRank", "topCommodInCountry", "Main_Commod"))
df
country CommodRank topCommodInCountry Main_Commod
1 Ghana 1 TRUE Gold
2 Eritrea 2 FALSE Copper
3 Ethiopia 3 FALSE Nickel
4 Ethiopia 1 TRUE Gold
5 Congo - Kinshasa 3 FALSE Gold
6 Ethiopia 1 TRUE Gold
7 Ethiopia 1 TRUE Gold
8 Ghana 1 TRUE Gold
9 Botswana 1 TRUE Diamonds
10 Nigeria 1 TRUE Iron Ore
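I'm trying to add another column that shows the top commodity (the top CommodRank) for each country in this dataset, but I'm not sure how. I am able to label 'topcommod' using 'Main_Commod' where CommodRank == 1, but I want to copy that same value to the rows where CommodRank != 1. See below: both Ethiopia values in rows 3 and 4 should read 'Gold'.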
df %>% mutate(topcommod = ifelse(CommodRank == 1, Main_Commod, 'unknown'))
country CommodRank topCommodInCountry Main_Commod topcommod
1 Ghana 1 TRUE Gold Gold
2 Eritrea 2 FALSE Copper unknown
3 Ethiopia 3 FALSE Nickel unknown
4 Ethiopia 1 TRUE Gold Gold
5 Congo - Kinshasa 3 FALSE Gold unknown
6 Ethiopia 1 TRUE Gold Gold
7 Ethiopia 1 TRUE Gold Gold
8 Ghana 1 TRUE Gold Gold
9 Botswana 1 TRUE Diamonds Diamonds
10 Nigeria 1 TRUE Iron Ore Iron Ore
Ideally I'm looking for a dplyr solution that I can add to an existing long chain of piped %>% function calls, but any solution would help.
Answer 0 (score: 5)
IIUC, there are several ways to do this, for example:
df %>% mutate(topCom = if(!any(topCommodInCountry)) "unknown"
else Main_Commod[which.max(topCommodInCountry)])
# A tibble: 10 x 5
# Groups: country [6]
country CommodRank topCommodInCountry Main_Commod topCom
<chr> <int> <lgl> <chr> <chr>
1 Ghana 1 TRUE Gold Gold
2 Eritrea 2 FALSE Copper unknown
3 Ethiopia 3 FALSE Nickel Gold
4 Ethiopia 1 TRUE Gold Gold
5 Congo - Kinshasa 3 FALSE Gold unknown
6 Ethiopia 1 TRUE Gold Gold
7 Ethiopia 1 TRUE Gold Gold
8 Ghana 1 TRUE Gold Gold
9 Botswana 1 TRUE Diamonds Diamonds
10 Nigeria 1 TRUE Iron Ore Iron Ore
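A minimal base R sketch of why the `which.max()` call above picks the right row, and why the `any()` guard is needed. The vectors `flag` and `commod` are hypothetical stand-ins for one country's group (here, Ethiopia's four rows), not part of the original answer. Note also that the answer relies on df already being grouped by country, so `mutate()` evaluates the expression once per country.

```r
# Hypothetical per-group vectors: Ethiopia's rows from the question's data
flag   <- c(FALSE, TRUE, TRUE, TRUE)          # topCommodInCountry
commod <- c("Nickel", "Gold", "Gold", "Gold") # Main_Commod

# Logicals are treated as 0/1, so which.max() returns the position
# of the first TRUE
first_top <- commod[which.max(flag)]

# With no TRUE at all, which.max() still returns position 1,
# which is why the answer checks any() first
no_top <- which.max(c(FALSE, FALSE))

print(first_top)
print(no_top)
```

Because the single value is shorter than the group, `mutate()` recycles it to every row of that country.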
Regarding OP's question in the comments about how to handle ties between multiple top commodities, you could do the following:
df %>%
mutate(topCom = if(!any(topCommodInCountry)) "unknown"
else paste(unique(Main_Commod[topCommodInCountry]), collapse = "/"))
If a country has several distinct top commodities, this pastes them into a single string, separated by /.
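The tie-handling expression can be tried outside of dplyr on plain vectors. The sketch below wraps it in a helper named `collapse_top`, which is a hypothetical name introduced only for illustration; the input vectors are made up as well.

```r
# Hypothetical helper wrapping the expression from the answer above
collapse_top <- function(commod, is_top) {
  if (!any(is_top)) {
    "unknown"
  } else {
    # unique() drops repeated commodities before they are joined
    paste(unique(commod[is_top]), collapse = "/")
  }
}

# Two distinct commodities tied for the top rank
tied <- collapse_top(c("Gold", "Copper", "Gold"), c(TRUE, TRUE, TRUE))

# No top-ranked commodity in the group
none <- collapse_top("Copper", FALSE)

print(tied)
print(none)
```

Without the `any()` check, a group with no TRUE flag would yield an empty string, because `paste()` with `collapse` over a zero-length vector returns "".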
Answer 1 (score: 0)
dplyr
...
df %>% arrange(CommodRank) %>%
mutate(topCommod = Main_Commod[1])
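Two things are worth noting about the approach above: it reorders the rows, and a country with no rank-1 commodity gets its lowest-ranked commodity rather than "unknown". A possible variant that keeps the row order is sketched below, assuming dplyr is installed; `toy` is a small made-up data frame, and the explicit `group_by()` is there only because `toy` is not grouped in advance the way df is.

```r
library(dplyr)

# Small made-up data frame, not the one from the question
toy <- tibble(
  country     = c("Ethiopia", "Ethiopia", "Eritrea"),
  CommodRank  = c(3L, 1L, 2L),
  Main_Commod = c("Nickel", "Gold", "Copper")
)

res <- toy %>%
  group_by(country) %>%
  # which.min() picks the best (lowest) rank within each country
  mutate(topCommod = Main_Commod[which.min(CommodRank)]) %>%
  ungroup()

print(res)
```

Like the `arrange()` version, this falls back to the best available rank when no commodity has rank 1.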
Answer 2 (score: 0)
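This is not an answer, but I learned a lot from @docendo discimus's answer. It took me a second to understand the negated "if" (!any(topCommodInCountry)), and I wondered whether it is just me, or whether my computer also needs an extra step to evaluate it :)
Using the same dataset, I tried out the idea of making the if else positive. First, I tested the two solutions with identical: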
identical(
#Negative
df %>%
mutate(topCom = if(!any(topCommodInCountry)) "unknown"
else Main_Commod[which.max(topCommodInCountry)]),
#Positive
df %>%
mutate(topCom = if(any(topCommodInCountry)) Main_Commod[which.max(topCommodInCountry)]
else "unknown"))
[1] TRUE
Next, I benchmarked the two:
require(rbenchmark)
benchmark("Negative" = {
df %>%
mutate(topCom = if(!any(topCommodInCountry)) "unknown"
else Main_Commod[which.max(topCommodInCountry)])
},
"Positive" = {
df %>%
mutate(topCom = if(any(topCommodInCountry)) Main_Commod[which.max(topCommodInCountry)]
else "unknown")
},
replications = 10000,
columns = c("test", "replications", "elapsed",
"relative", "user.self", "sys.self"))
The difference is small (about 1.5% over 10,000 replications), but I assume it would grow with a larger dataset.
test replications elapsed relative user.self sys.self
1 Negative 10000 12.59 1.015 12.44 0
2 Positive 10000 12.41 1.000 12.30 0