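I have a data frame as below;
I want to select the maximum and minimum 'probability' values for the year 2017. Whichever topics have the maximum and minimum probability values, all instances of those topics must be collected in another data frame as below;
(In the example above, topic V16 has the minimum probability in 2017 and V30 has the maximum.)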
Answer 0: (score: 0)
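We can use tidyverse. If we need the rows for the 'topics' whose 'probability' is the max/min within the year 2017 only, then
library(dplyr)
# max()/min() are restricted to the 2017 rows; an unrestricted max(probability)
# would be taken over all years
df1 %>%
  filter(topics %in% topics[probability == max(probability[years == 2017]) & years == 2017] |
         topics %in% topics[probability == min(probability[years == 2017]) & years == 2017])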
# A tibble: 4 x 3
# Groups: years [2]
# years topics probability
# <int> <chr> <dbl>
#1 2016 V10 0.0553
#2 2016 V15 0.0164
#3 2017 V30 0.0714
#4 2017 V16 0.0130
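Or using slice
# row indices of the 2017 max topic first, then those of the 2017 min topic
df1 %>%
  slice(c(which(topics %in% topics[probability == max(probability[years == 2017]) & years == 2017]),
          which(topics %in% topics[probability == min(probability[years == 2017]) & years == 2017])))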
# A tibble: 4 x 3
# years topics probability
# <int> <chr> <dbl>
#1 2016 V30 0.0219
#2 2017 V30 0.0714
#3 2016 V16 0.0300
#4 2017 V16 0.0130
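Or using base R
# range() is restricted to the 2017 rows so that it returns that year's min and max
subset(df1, topics %in% subset(df1, years == 2017 &
    probability %in% range(probability[years == 2017]), select = "topics")[[1]])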
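For a quick check, here is a minimal base R sketch of the same idea, subsetting to 2017 first and then collecting the matching topics. The sample df1 is an assumption: it is assembled only from the rows visible in the outputs above, not from the asker's full data, and the helper names d2017, keep and result are hypothetical.

```r
# Hypothetical sample data built from the rows visible in the outputs above;
# the asker's full data frame is not shown in the thread.
df1 <- data.frame(
  years       = c(2016L, 2016L, 2016L, 2016L, 2017L, 2017L),
  topics      = c("V10", "V15", "V30", "V16", "V30", "V16"),
  probability = c(0.0553, 0.0164, 0.0219, 0.0300, 0.0714, 0.0130),
  stringsAsFactors = FALSE
)

# Restrict to 2017 first, so the extremes are computed within that year only.
d2017 <- df1[df1$years == 2017, ]

# Topics holding the minimum and maximum 2017 probability.
keep <- d2017$topics[d2017$probability %in% range(d2017$probability)]

# Collect every row, from any year, that belongs to those topics.
result <- df1[df1$topics %in% keep, ]
result
```

Computing the extremes on the 2017 subset, rather than on the whole column, means the result does not depend on whether another year happens to contain a larger or smaller probability.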
Answer 1: (score: 0)
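You can try
library(data.table)
# setDT() converts df to a data.table by reference; V1 holds the 2017 min and max topics
a <- setDT(df)[years == 2017, topics[c(which.min(probability), which.max(probability))], by = years]
subset(df, topics %in% a$V1)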
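In base R, you can do the following:
# which.max()/which.min() return positions within the 2017 subset, so index that subset's topics
d2017 <- subset(df, years == 2017)
a <- aggregate(probability ~ years, d2017, function(x) c(which.max(x), which.min(x)))
subset(df, topics %in% d2017$topics[c(a$probability)])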