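This is my first question, so I apologise if I have not asked it in the right way.
I am comparing dplyr and plyr in R for summarising data in a data frame.
The data frame is simple. I have one drug and a set of patients; each patient has a set of responses, each consisting of a sample and a value, i.e. the level of the drug in that sample.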
The operation I am performing sums `level` (the patient's response to the drug) over each `sample`, for each combination of `patient` and `drug`.
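The summarise operations of the two libraries give different answers. plyr's looks correct: the sum in the third row of `metric` should not be `NA`, because there are no `NA`s in this subset. plyr matches what I calculated by hand for this group.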
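The hand calculation for the third group can be sketched in base R as follows. The vector `ad_level` is a name introduced here for illustration; its values are the five `level` entries for patient "AD" copied from the example data in this question. The sketch assumes the two-step logic of the pipeline: the group sum is repeated on every row, and those repeated values are then summed again.

```r
# Levels for patient "AD", copied from the example data in this question.
# `ad_level` is an illustrative name, not part of the original code.
ad_level <- c(0.00338149695926075, 0.000365122058519732, 0.0138121831347925,
              0.000309530166151126, 0.00518926294072875)

# Confirm that this subset contains no missing values
has_missing <- anyNA(ad_level)
print(has_missing)

# Step 1 (the mutate step): the group sum is repeated on each of the 5 rows
sum_level <- rep(sum(ad_level), length(ad_level))

# Step 2 (the summarise step): the 5 repeated values are added together
metric <- sum(sum_level)
print(metric)
```

Under these assumptions `metric` comes out at about 0.115288, which agrees with the plyr row for "AD" rather than with the `NA` reported by dplyr.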
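Any idea what is going on? I expect it has something to do with how dplyr handles the `NA`s (stored as `NaN` in `level`) for the first patient, "AB", in the summarise step.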
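For reference, this is how base R treats `NaN` in `sum()`. It is only a sketch of base R behaviour and says nothing about what dplyr does internally; `level_ab` is an illustrative name for the five `NaN` values of patient "AB".

```r
# The five levels for patient "AB" are all NaN (illustrative vector)
level_ab <- rep(NaN, 5)

# Summing NaN values gives NaN, not NA
plain_sum <- sum(level_ab)
print(is.nan(plain_sum))

# is.na() treats NaN as missing, but is.nan() does not treat NA as NaN
print(all(is.na(level_ab)))
print(is.nan(NA_real_))

# With na.rm = TRUE the NaN values are dropped, so the sum of nothing is 0
clean_sum <- sum(level_ab, na.rm = TRUE)
print(clean_sum)
```

So in base R a group made up entirely of `NaN` sums to `NaN`, and to 0 once `na.rm = TRUE` is set, while groups without missing values are unaffected.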
Reproducible example:
# Load plyr before dplyr, the order dplyr recommends when both are needed
library(plyr)
library(dplyr)

# Example data: one drug, three patients, five samples each.
# All five levels for patient "AB" are NaN.
panel <- structure(list(
    drug = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L),
                     .Label = "Paracetamol", class = "factor"),
    patient = structure(c(1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L),
                        .Label = c("AB", "AC", "AD"), class = "factor"),
    sample = structure(c(6L, 8L, 12L, 9L, 5L, 3L, 9L, 1L, 2L, 11L, 7L, 10L, 2L, 13L, 4L),
                       .Label = c("AH", "AT", "BV", "CD", "CK", "CM", "CU", "CV",
                                  "CZ", "DK", "DM", "DN", "DO"),
                       class = "factor"),
    level = c(NaN, NaN, NaN, NaN, NaN,
              0.00153937362708914, 0.000136048826793052, 0.0589067431555789,
              0.00798507232520125, 0.000179913435935396, 0.00338149695926075,
              0.000365122058519732, 0.0138121831347925, 0.000309530166151126,
              0.00518926294072875)),
  # Note: .Names renames the third column from "sample" to "sample_type"
  .Names = c("drug", "patient", "sample_type", "level"),
  row.names = c(NA, -15L), class = "data.frame")

# plyr: add the per-group sum to every row, then sum those values per group
plyr_version <- ddply(panel,
                      .(drug, patient),
                      mutate,
                      sum_level = sum(level)) %>%
  ddply(.(drug, patient), summarise, metric = sum(sum_level))

# dplyr: the same two steps with group_by(), mutate() and summarise()
dplyr_version <- group_by(panel, drug, patient) %>%
  mutate(sum_level = sum(level)) %>%
  summarise(metric = sum(sum_level))

print("Plyr\n")
print.data.frame(plyr_version)
print("Dplyr")
print.data.frame(dplyr_version)
Results obtained:
[1] "Plyr\n"
         drug patient    metric
1 Paracetamol      AB       NaN
2 Paracetamol      AC 0.3437358
3 Paracetamol      AD 0.1152880
[1] "Dplyr"
         drug patient    metric
1 Paracetamol      AB        NA
2 Paracetamol      AC 0.3437358
3 Paracetamol      AD        NA
If I use `na.rm = TRUE` in the `sum_level` step, the results match, i.e.
sum_level = sum(level, na.rm = TRUE)
gives:
[1] "Plyr"
         drug patient    metric
1 Paracetamol      AB 0.0000000
2 Paracetamol      AC 0.3437358
3 Paracetamol      AD 0.1152880
[1] "Dplyr"
         drug patient    metric
1 Paracetamol      AB 0.0000000
2 Paracetamol      AC 0.3437358
3 Paracetamol      AD 0.1152880
Edited: added sessionInfo