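I want to subset a data frame in R. First, I want to select the rows in the "≥35%" category. Second, for any ID that has no "≥35%" row, I want to select the row(s) with the maximum "Percent" value. Here is part of my original CSV file: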
ID Code Code2 Percent category
A001 0123 10000 0 <35%
A001 0123 20000 66 ≥35%
A001 0123 30000 34 <35%
B001 7894 52003 100 ≥35%
C001 2020 35001 20 <35%
C001 2020 35002 20 <35%
C001 2020 35003 20 <35%
C001 2020 35004 20 <35%
C001 2020 35005 20 <35%
I want the filtered data frame to look like this:
ID Code Code2 Percent category
A001 0123 20000 66 ≥35%
B001 7894 52003 100 ≥35%
C001 2020 35001 20 <35%
C001 2020 35002 20 <35%
C001 2020 35003 20 <35%
C001 2020 35004 20 <35%
C001 2020 35005 20 <35%
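A minimal sketch that rebuilds the excerpt above as an R data frame, so the attempts below can be run as-is. The object name `dataframe` matches the one used in those attempts. Storing `Code` as character is an assumption made here so that the leading zero in `0123` survives; read as an integer it becomes `123`, as in the answer's output further down.

```r
# Rebuild the CSV excerpt as a data frame named `dataframe`.
# Code is stored as character (an assumption) so "0123" keeps its leading zero;
# read as an integer it would become 123.
dataframe <- data.frame(
  ID       = c("A001", "A001", "A001", "B001", rep("C001", 5)),
  Code     = c("0123", "0123", "0123", "7894", rep("2020", 5)),
  Code2    = c(10000L, 20000L, 30000L, 52003L, 35001:35005),
  Percent  = c(0L, 66L, 34L, 100L, 20L, 20L, 20L, 20L, 20L),
  # "\u2265" is the Unicode escape for the "≥" sign
  category = c("<35%", "\u226535%", "<35%", "\u226535%", rep("<35%", 5)),
  stringsAsFactors = FALSE
)
str(dataframe)
```

When reading the real file, `read.csv()` with `colClasses = c(Code = "character")` should have the same effect on `Code`; the file name and separator are not given in the question.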
I tried the following R code to get this result:
X <- subset(dataframe, category =="≥35%" | Percent == max(Percent))
This did not give the expected result, so I tried another approach:
X <- do.call(rbind, lapply(split(dataframe, as.factor(dataframe$ID)), function(x) {return(x[which.max(x$Percent),])}))
That did not work either.
Can anyone help or suggest an approach? Thank you for reading my question.
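Both attempts fail for reasons visible in the data. In the `subset()` call, `max(Percent)` is computed over the whole data frame (100), not per ID, so only the two "≥35%" rows survive and every C001 row is dropped. In the `split()`/`lapply()` version, `which.max()` returns only the first maximum, so C001 keeps one of its five tied rows, and the category is never checked. A minimal sketch of a corrected per-ID version that keeps the same `split()`/`lapply()` structure; the helper name `pick_rows` is hypothetical, and the data are rebuilt inline so the block runs on its own:

```r
# Rebuild the question's excerpt ("\u2265" is the "≥" sign).
dataframe <- data.frame(
  ID       = c("A001", "A001", "A001", "B001", rep("C001", 5)),
  Code2    = c(10000, 20000, 30000, 52003, 35001:35005),
  Percent  = c(0, 66, 34, 100, 20, 20, 20, 20, 20),
  category = c("<35%", "\u226535%", "<35%", "\u226535%", rep("<35%", 5)),
  stringsAsFactors = FALSE
)

# Hypothetical helper: choose the rows to keep within one ID.
pick_rows <- function(x) {
  high <- x$category == "\u226535%"
  if (any(high)) {
    x[high, ]                          # keep every "≥35%" row of this ID
  } else {
    x[x$Percent == max(x$Percent), ]   # keep ALL rows tied at the maximum
  }
}

X <- do.call(rbind, lapply(split(dataframe, dataframe$ID), pick_rows))
rownames(X) <- NULL
X
```

Comparing `Percent` with `max()` instead of calling `which.max()` is what keeps all five tied C001 rows.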
Answer:
Hope this helps!
library(dplyr)

# IDs that have no ">=35%" row: keep the row(s) with the maximum Percent
# (the sample data below writes the "≥35%" category as ">=35%")
df_le <- df %>%
  group_by(ID) %>%
  filter(!any(category == ">=35%")) %>%
  filter(Percent == max(Percent)) %>%
  data.frame()

# Final data: all ">=35%" rows plus the rows selected above
final_df <- rbind(df %>% filter(category == ">=35%"),
                  df_le)
final_df
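An alternative sketch in base R, assuming the same rule: `ave()` applies a function to each ID's row indices and writes the result back in the original row positions, so the filter runs in one pass without `dplyr` or `rbind()`. The variable name `high_label` is an illustration introduced here; the data use the answer's `">=35%"` spelling.

```r
# Sample data in the answer's spelling (">=35%" stands for "≥35%").
df <- data.frame(
  ID       = c("A001", "A001", "A001", "B001", rep("C001", 5)),
  Code2    = c(10000L, 20000L, 30000L, 52003L, 35001:35005),
  Percent  = c(0L, 66L, 34L, 100L, 20L, 20L, 20L, 20L, 20L),
  category = c("<35%", ">=35%", "<35%", ">=35%", rep("<35%", 5)),
  stringsAsFactors = FALSE
)
high_label <- ">=35%"   # use "\u226535%" for data that contains the "≥" sign

# For each ID, the function receives that ID's row indices; the logical result
# is written back into an integer vector, so kept rows become 1 and others 0.
keep <- ave(seq_len(nrow(df)), df$ID, FUN = function(i) {
  high <- df$category[i] == high_label
  if (any(high)) high else df$Percent[i] == max(df$Percent[i])
})

final_df <- df[keep == 1, ]
final_df
```

Unlike the `rbind()` version, this keeps the rows in their original order; on this sample data both give the same seven rows.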
Output:
ID Code Code2 Percent category
1 A001 123 20000 66 >=35%
2 B001 7894 52003 100 >=35%
3 C001 2020 35001 20 <35%
4 C001 2020 35002 20 <35%
5 C001 2020 35003 20 <35%
6 C001 2020 35004 20 <35%
7 C001 2020 35005 20 <35%
Sample data:
df <- structure(list(ID = c("A001", "A001", "A001", "B001", "C001",
"C001", "C001", "C001", "C001"), Code = c(123L, 123L, 123L, 7894L,
2020L, 2020L, 2020L, 2020L, 2020L), Code2 = c(10000L, 20000L,
30000L, 52003L, 35001L, 35002L, 35003L, 35004L, 35005L), Percent = c(0L,
66L, 34L, 100L, 20L, 20L, 20L, 20L, 20L), category = c("<35%",
">=35%", "<35%", ">=35%", "<35%", "<35%", "<35%", "<35%", "<35%"
)), .Names = c("ID", "Code", "Code2", "Percent", "category"), class = "data.frame", row.names = c(NA,
-9L))