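This seems simple, but I can't figure it out. I want to create a new column in df2 (impute_id) that identifies whether a value (measurement) was imputed or is an original observation from df1. If a row matches, assign the string observed to the new impute_id column of df2; if it does not match, assign the string imputed. If possible, I'd like to do this with dplyr. Also note that the rows in the data frames may not be in the same order, even though they are in the example.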
Example
Original data
df1
time protocol measurement_type sample measurement
1 0 HPLC cis,cis-Muconic acid a 0.57561
2 0 HPLC D-Glucose a NA
3 0 HPLC cis,cis-Muconic acid a NA
4 0 HPLC D-Glucose b NA
5 0 OD600 Optical Density b 0.14430
6 22 HPLC cis,cis-Muconic acid b NA
7 22 HPLC D-Glucose a NA
8 22 OD600 Optical Density a NA
9 24 HPLC cis,cis-Muconic acid a NA
10 24 HPLC D-Glucose b 33.95529
Imputed data
df2
time protocol measurement_type sample measurement
1 0 HPLC cis,cis-Muconic acid a 0.57561
2 0 HPLC D-Glucose a 33.95529
3 0 HPLC cis,cis-Muconic acid a 0.57561
4 0 HPLC D-Glucose b 33.95529
5 0 OD600 Optical Density b 0.14430
6 22 HPLC cis,cis-Muconic acid b 0.57561
7 22 HPLC D-Glucose a 33.95529
8 22 OD600 Optical Density a 0.14430
9 24 HPLC cis,cis-Muconic acid a 0.57561
10 24 HPLC D-Glucose b 33.95529
Desired output
df2
time protocol measurement_type sample measurement impute_id
1 0 HPLC cis,cis-Muconic acid a 0.57561 observed
2 0 HPLC D-Glucose a 33.95529 imputed
3 0 HPLC cis,cis-Muconic acid a 0.57561 imputed
4 0 HPLC D-Glucose b 33.95529 imputed
5 0 OD600 Optical Density b 0.14430 observed
6 22 HPLC cis,cis-Muconic acid b 0.57561 imputed
7 22 HPLC D-Glucose a 33.95529 imputed
8 22 OD600 Optical Density a 0.14430 imputed
9 24 HPLC cis,cis-Muconic acid a 0.57561 imputed
10 24 HPLC D-Glucose b 33.95529 observed
Reproducible data
Original data
df1 <- structure(list(time = c(0L, 0L, 0L, 0L, 0L, 22L, 22L, 22L, 24L,
24L), protocol = structure(c(1L, 1L, 1L, 1L, 2L, 1L, 1L, 2L,
1L, 1L), .Label = c("HPLC", "OD600"), class = "factor"), measurement_type = structure(c(1L,
2L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L), .Label = c("cis,cis-Muconic acid",
"D-Glucose", "Optical Density"), class = "factor"), sample = c("a",
"a", "a", "b", "b", "b", "a", "a", "a", "b"), measurement = c(0.57561,
NA, NA, NA, 0.1443, NA, NA, NA, NA, 33.95529)), row.names = c(NA,
-10L), class = "data.frame")
Imputed data
df2 <- structure(list(time = c(0L, 0L, 0L, 0L, 0L, 22L, 22L, 22L, 24L,
24L), protocol = structure(c(1L, 1L, 1L, 1L, 2L, 1L, 1L, 2L,
1L, 1L), .Label = c("HPLC", "OD600"), class = "factor"), measurement_type = structure(c(1L,
2L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L), .Label = c("cis,cis-Muconic acid",
"D-Glucose", "Optical Density"), class = "factor"), sample = c("a",
"a", "a", "b", "b", "b", "a", "a", "a", "b"), measurement = c(0.57561,
33.95529, 0.57561, 33.95529, 0.1443, 0.57561, 33.95529, 0.1443,
0.57561, 33.95529)), row.names = c(NA, -10L), class = "data.frame")
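Because the rows may come in a different order, any row matching has to rely on identifying columns rather than on position. The sketch below assumes (the question does not say this) that time, protocol, measurement_type and sample are those columns, and uses a hypothetical helper duplicated_keys() to list key combinations that occur more than once. In the example data, rows 1 and 3 of df1 share the same key, so a plain join on these columns cannot tell them apart.

```r
library(dplyr)

# Assumed identifying columns (not specified in the question)
key_cols <- c("time", "protocol", "measurement_type", "sample")

# Hypothetical helper: return key combinations that appear more than once,
# together with how many times each one occurs
duplicated_keys <- function(df, keys = key_cols) {
  df %>%
    group_by(across(all_of(keys))) %>%
    summarise(n = n(), .groups = "drop") %>%
    filter(n > 1)
}
```

With the data above, duplicated_keys(df1) should return a single row (time 0, HPLC, cis,cis-Muconic acid, sample a) with n = 2.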
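Answer (score: 1)
Perhaps something like this. It works from df1 alone: it flags the NA values first, then fills them within each measurement_type: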
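library(dplyr)
df1 %>%
  group_by(measurement_type) %>%
  # Flag first, while the missing values are still NA
  mutate(impute_id = ifelse(is.na(measurement), "imputed", "observed"),
         # Fill each group with its non-missing value (one per group here)
         measurement = min(measurement, na.rm = TRUE)) %>%
  # Drop the grouping and return a plain data.frame
  as.data.frame()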
time protocol measurement_type sample measurement impute_id
1 0 HPLC cis,cis-Muconic acid a 0.57561 observed
2 0 HPLC D-Glucose a 33.95529 imputed
3 0 HPLC cis,cis-Muconic acid a 0.57561 imputed
4 0 HPLC D-Glucose b 33.95529 imputed
5 0 OD600 Optical Density b 0.14430 observed
6 22 HPLC cis,cis-Muconic acid b 0.57561 imputed
7 22 HPLC D-Glucose a 33.95529 imputed
8 22 OD600 Optical Density a 0.14430 imputed
9 24 HPLC cis,cis-Muconic acid a 0.57561 imputed
10 24 HPLC D-Glucose b 33.95529 observed
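The answer above rebuilds df2 from df1, so it does not actually compare the two data frames. If the comparison itself matters (rows in any order), one alternative, swapped in here and not the answerer's method, is a join-based sketch. It assumes the same hypothetical key columns as before, and, because rows 1 and 3 share a key, it also assumes that repeated key combinations appear in the same relative order in both data frames. The function name flag_imputed() is hypothetical.

```r
library(dplyr)

# Hypothetical helper: mark a row of `imputed` as "observed" when the matching
# row of `original` (same keys, same repeat number) has the same non-missing
# measurement; every other row is marked "imputed".
# Assumes duplicated keys appear in the same relative order in both inputs.
flag_imputed <- function(original, imputed,
                         keys = c("time", "protocol", "measurement_type", "sample")) {
  # Number repeats of each key combination so duplicated rows can be paired
  add_dup_id <- function(df) {
    df %>%
      group_by(across(all_of(keys))) %>%
      mutate(dup_id = row_number()) %>%
      ungroup()
  }

  # Keep only the genuinely observed values from the original data
  observed <- original %>%
    add_dup_id() %>%
    filter(!is.na(measurement)) %>%
    select(all_of(keys), dup_id, orig_measurement = measurement)

  imputed %>%
    add_dup_id() %>%
    # left_join keeps every row of `imputed`, in its own order
    left_join(observed, by = c(keys, "dup_id")) %>%
    mutate(impute_id = if_else(!is.na(orig_measurement) &
                                 measurement == orig_measurement,
                               "observed", "imputed")) %>%
    select(-dup_id, -orig_measurement)
}
```

Called as flag_imputed(df1, df2), this should reproduce the impute_id column of the desired output, returned as a tibble. The repeat numbering is what keeps row 3 as imputed; a join on the key columns and measurement alone would mark it observed, because it is identical to row 1.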