Question

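I'm writing a script that processes data, and I need to remove one row from each pair of rows in a dataset. In the example below, I want to keep the first dilution (which is always the smaller of the two) if its value is below 20,000; if the first dilution's value is above 20,000, I want the second dilution instead, whatever its value is. The exact dilution values will differ from dataset to dataset, but each patient will never have more than two dilutions, so I always want to check the lowest dilution against the 20,000 threshold first, and that threshold stays the same. The dataset also contains many columns of metadata.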

Patient   Dilution   Value 
John      2          30000
John      20         15000
George    2          13000
George    20         700
Kelly     2          49000
Kelly     20         24000
Tom       2          80000
Tom       20         30000
Diane     2          700
Diane     20         0

Desired result:

Patient   Dilution   Value
John      20         15000
George    2          13000
Kelly     20         24000
Tom       20         30000
Diane     2          700
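
To make the example reproducible, the input table above can be entered as a data frame. This is only a minimal sketch: the name df is an assumption (chosen to match the first answer), and the real dataset has many more metadata columns.

```r
# Example input from the question; the name `df` is an assumption.
df <- data.frame(
  Patient  = rep(c("John", "George", "Kelly", "Tom", "Diane"), each = 2),
  Dilution = rep(c(2, 20), times = 5),
  Value    = c(30000, 15000, 13000, 700, 49000, 24000, 80000, 30000, 700, 0),
  stringsAsFactors = FALSE
)
```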

Here is the rest of my code if you'd like to see it (yes, I'm a noob).

###SA Summary

sadf <- merge(mydata, elisadata, "Description", all.x = TRUE)

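# grepl() gives a logical mask, so a pattern with no matches drops nothing
# (sadf[-grep(...), ] would return zero rows in that case)
sadf <- sadf[grepl("X", sadf$Type),]
sadf <- sadf[!grepl("Blank", sadf$Name),]
sadf <- sadf[!grepl("MulV", sadf$Name),]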
sadf <- sadf[,c("Isotype","Name","Description","Dilution.x","FI-Bkgd-Neg","Error","Conc..ug.ml.")]

sadf$Error <- as.character(sadf$Error)
sadf$Error[sadf$Conc..ug.ml. < 0.05] <- "LC"
sadf$Conc..ug.ml. <- ifelse(!is.na(sadf$Conc..ug.ml.) & sadf$Conc..ug.ml. < 0.05, NA, sadf$Conc..ug.ml.)

sadf$SA <- with(sadf, `FI-Bkgd-Neg` * Dilution.x / Conc..ug.ml.)

sadf$SA[sadf$SA < 0.02] <- 0.02

if (length(unique(sadf$Dilution.x)) > 1) {} ### Where I need to put the answer to the question

sadf$`FI-Bkgd-Neg` <- NULL
sadf$Error[is.na(sadf$Error)] <- 0
sadf$Conc..ug.ml.[is.na(sadf$Conc..ug.ml.)] <- 0
sadf <- reshape(sadf, idvar = c("Description","Dilution.x","Isotype","Error","Conc..ug.ml."), timevar = "Name", direction = "wide")
sadf$Error[sadf$Error == 0] <- NA
sadf$Conc..ug.ml.[sadf$Conc..ug.ml. == 0] <- NA

Answer 1

Using dplyr, group_by Patient, then filter to the rows that meet the condition within each patient group. The condition returns the last Value if the first Value is over 20000, and otherwise the minimum Value.

library(dplyr)
df %>% group_by(Patient) %>% filter(Value == ifelse(first(Value) > 20000, last(Value), min(Value)))
# Source: local data frame [5 x 3]
# Groups: Patient [5]
#
#   Patient Dilution Value
#    (fctr)    (int) (int)
# 1    John       20 15000
# 2  George       20   700
# 3   Kelly       20 24000
# 4     Tom       20 30000
# 5   Diane       20     0

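Note: this approach follows the wording of the question, which does not produce the result data.frame shown in the question. If the condition should instead return the first dilution when it is below 20000, you only need to change min to first, which gives the result data frame from the question:

df %>% group_by(Patient) %>% filter(Value == ifelse(first(Value) > 20000, last(Value), first(Value)))
# Source: local data frame [5 x 3]
# Groups: Patient [5]
#
#   Patient Dilution Value
#    (fctr)    (int) (int)
# 1    John       20 15000
# 2  George        2 13000
# 3   Kelly       20 24000
# 4     Tom       20 30000
# 5   Diane        2   700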
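
Both filters above rely on each patient's rows already being ordered with the lower dilution first, and they match on Value rather than on the row itself. As a hedged alternative, here is a minimal sketch that looks up the value at the lowest dilution explicitly, so row order should not matter. The data frame df is rebuilt from the question's example, and the names res and the treatment of a value of exactly 20000 (kept as the first dilution) are assumptions, not something the question specifies.

```r
library(dplyr)

# Example data from the question (assumed name: df)
df <- data.frame(
  Patient  = rep(c("John", "George", "Kelly", "Tom", "Diane"), each = 2),
  Dilution = rep(c(2, 20), times = 5),
  Value    = c(30000, 15000, 13000, 700, 49000, 24000, 80000, 30000, 700, 0),
  stringsAsFactors = FALSE
)

res <- df %>%
  group_by(Patient) %>%
  # If the value at the lowest dilution is above 20000, keep the highest
  # dilution; otherwise keep the lowest dilution.
  filter(Dilution == if (Value[which.min(Dilution)] > 20000) max(Dilution) else min(Dilution)) %>%
  ungroup()
```

With the example data this should keep John at dilution 20, George at 2, Kelly at 20, Tom at 20, and Diane at 2, which is the result requested in the question.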

Answer 2

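We can use data.table. Convert the 'data.frame' to a 'data.table' (setDT(df1)); then, grouped by 'Patient', use an if/else condition to subset the row with the min 'Value' if that minimum is below 20000, or else get the last row.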

library(data.table)
# .I holds the row indices of each group; .N is the group's row count
setDT(df1)[df1[, .I[if (min(Value) < 20000)
        which.min(Value) else .N], Patient]$V1]
#    Patient Dilution Value
#1:    John       20 15000
#2:  George       20   700
#3:   Kelly       20 24000
#4:     Tom       20 30000
#5:   Diane       20     0

If the condition is based on the first 'Value', we need to change min(Value) to first(Value) or Value[1L], and use 1 instead of which.min.
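
The variant just described can be sketched as follows. This is a hedged illustration, not the answerer's exact code: df1 is rebuilt from the question's example, and the intermediate names idx and res are assumptions. The conversion and the subsetting are split into separate steps to keep each one easy to follow.

```r
library(data.table)

# Example data from the question (assumed name: df1)
df1 <- data.frame(
  Patient  = rep(c("John", "George", "Kelly", "Tom", "Diane"), each = 2),
  Dilution = rep(c(2, 20), times = 5),
  Value    = c(30000, 15000, 13000, 700, 49000, 24000, 80000, 30000, 700, 0),
  stringsAsFactors = FALSE
)
setDT(df1)  # convert to a data.table by reference

# Per patient, pick the index of row 1 if the first Value is below 20000,
# otherwise the index of the last row (.N).
idx <- df1[, .I[if (Value[1L] < 20000) 1L else .N], by = Patient]$V1
res <- df1[idx]
```

Like the dplyr version, this assumes each patient's rows are already ordered with the lower dilution first.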

R - Remove one row from each pair of rows in a data frame based on a condition
