How to extract the maximum value

Date: 2018-01-30 02:51:01

Tags: r subset

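I want to subset a data frame in R by its values. First, I want to select the rows in the "≥35%" category. Second, after that first step, I want to select the rows with the maximum "Percent" value. Below is part of my original CSV file.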

 ID   Code  Code2  Percent   category
A001  0123  10000     0        <35%
A001  0123  20000    66        ≥35%
A001  0123  30000    34        <35%
B001  7894  52003   100        ≥35%
C001  2020  35001    20        <35%
C001  2020  35002    20        <35%
C001  2020  35003    20        <35%
C001  2020  35004    20        <35%
C001  2020  35005    20        <35%

However, I want to filter my data frame so that the result looks like this.

 ID   Code  Code2  Percent   category
A001  0123  20000    66        ≥35%
B001  7894  52003   100        ≥35%
C001  2020  35001    20        <35%
C001  2020  35002    20        <35%
C001  2020  35003    20        <35%
C001  2020  35004    20        <35%
C001  2020  35005    20        <35%

I have tried some R code to produce the result I want.

X <- subset(dataframe, category =="≥35%" | Percent == max(Percent))

But this code did not give the desired result, so I tried another approach.

X <- do.call(rbind, lapply(split(dataframe, as.factor(dataframe$ID)), function(x) {return(x[which.max(x$Percent),])}))

That did not work either.

Can anyone help me? Do you have any suggestions? Thank you for reading my question.

1 answer:

Answer 0 (score: 1)

Hope this helps!

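library(dplyr)

# Note: the sample data below spells the label '>=35%'.
# Use '≥35%' instead if that is how it appears in your CSV.

# For IDs that have no '>=35%' row, keep the row(s) with the maximum Percent
df_le <- df %>%
  group_by(ID) %>%
  filter(!any(category == '>=35%')) %>%
  filter(Percent == max(Percent)) %>%
  ungroup() %>%
  as.data.frame()

# Final data: all '>=35%' rows combined with the rows selected above
final_df <- rbind(df %>%
                    filter(category == '>=35%'),
                  df_le)
final_df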

Output:

    ID Code Code2 Percent category
1 A001  123 20000      66    >=35%
2 B001 7894 52003     100    >=35%
3 C001 2020 35001      20     <35%
4 C001 2020 35002      20     <35%
5 C001 2020 35003      20     <35%
6 C001 2020 35004      20     <35%
7 C001 2020 35005      20     <35%

Sample data:

df <- structure(list(ID = c("A001", "A001", "A001", "B001", "C001", 
"C001", "C001", "C001", "C001"), Code = c(123L, 123L, 123L, 7894L, 
2020L, 2020L, 2020L, 2020L, 2020L), Code2 = c(10000L, 20000L, 
30000L, 52003L, 35001L, 35002L, 35003L, 35004L, 35005L), Percent = c(0L, 
66L, 34L, 100L, 20L, 20L, 20L, 20L, 20L), category = c("<35%", 
">=35%", "<35%", ">=35%", "<35%", "<35%", "<35%", "<35%", "<35%"
)), .Names = c("ID", "Code", "Code2", "Percent", "category"), class = "data.frame", row.names = c(NA, 
-9L))
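
If you would rather not depend on dplyr, the same selection rule can probably be written in base R with ave(). This is only a sketch under a few assumptions: the data frame is rebuilt here so the block runs on its own, the category label is spelled ">=35%" as in the sample data, and the names is_high, has_high, group_max and result are made up for illustration.

```r
# Rebuild the sample data so this block is self-contained
df <- data.frame(
  ID       = c("A001", "A001", "A001", "B001", rep("C001", 5)),
  Code     = c(123L, 123L, 123L, 7894L, rep(2020L, 5)),
  Code2    = c(10000L, 20000L, 30000L, 52003L,
               35001L, 35002L, 35003L, 35004L, 35005L),
  Percent  = c(0L, 66L, 34L, 100L, rep(20L, 5)),
  category = c("<35%", ">=35%", "<35%", ">=35%", rep("<35%", 5)),
  stringsAsFactors = FALSE
)

# TRUE for rows in the '>=35%' category (change the label to match your CSV)
is_high <- df$category == ">=35%"

# Per ID: does the group contain at least one '>=35%' row?
has_high <- ave(is_high, df$ID, FUN = any)

# Per ID: the maximum Percent within the group
group_max <- ave(df$Percent, df$ID, FUN = max)

# Keep every '>=35%' row; for IDs without one, keep all rows tied at the maximum
result <- df[is_high | (!has_high & df$Percent == group_max), ]
rownames(result) <- NULL
result
```

Unlike which.max(), which returns only the first position of the maximum, the comparison Percent == group_max keeps every tied row, which is why all five C001 rows survive.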