Create a new column based on whether rows match in two data frames

Time: 2019-11-29 17:08:15

Tags: r dataframe dplyr conditional-statements mutate

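This seems simple, but I can't figure it out. I want to create a new column (impute_id) in df2 that identifies whether each value (measurement) was imputed or is an original observation from df1. If a row in df2 matches a row in df1, the new column impute_id should get the string "observed"; if it does not match, it should get the string "imputed". I'd like to do this with dplyr if possible. Also note that, although the rows happen to line up in the example, the rows of the two data frames may not be in the same order.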


Example

Raw data

df1
   time protocol     measurement_type sample measurement
1     0     HPLC cis,cis-Muconic acid      a     0.57561
2     0     HPLC            D-Glucose      a          NA
3     0     HPLC cis,cis-Muconic acid      a          NA
4     0     HPLC            D-Glucose      b          NA
5     0    OD600      Optical Density      b     0.14430
6    22     HPLC cis,cis-Muconic acid      b          NA
7    22     HPLC            D-Glucose      a          NA
8    22    OD600      Optical Density      a          NA
9    24     HPLC cis,cis-Muconic acid      a          NA
10   24     HPLC            D-Glucose      b    33.95529

Imputed data

df2
   time protocol     measurement_type sample measurement
1     0     HPLC cis,cis-Muconic acid      a     0.57561
2     0     HPLC            D-Glucose      a    33.95529
3     0     HPLC cis,cis-Muconic acid      a     0.57561
4     0     HPLC            D-Glucose      b    33.95529
5     0    OD600      Optical Density      b     0.14430
6    22     HPLC cis,cis-Muconic acid      b     0.57561
7    22     HPLC            D-Glucose      a    33.95529
8    22    OD600      Optical Density      a     0.14430
9    24     HPLC cis,cis-Muconic acid      a     0.57561
10   24     HPLC            D-Glucose      b    33.95529

Desired output

df2
   time protocol     measurement_type sample measurement  impute_id
1     0     HPLC cis,cis-Muconic acid      a     0.57561   observed
2     0     HPLC            D-Glucose      a    33.95529    imputed
3     0     HPLC cis,cis-Muconic acid      a     0.57561    imputed
4     0     HPLC            D-Glucose      b    33.95529    imputed
5     0    OD600      Optical Density      b     0.14430   observed
6    22     HPLC cis,cis-Muconic acid      b     0.57561    imputed
7    22     HPLC            D-Glucose      a    33.95529    imputed
8    22    OD600      Optical Density      a     0.14430    imputed
9    24     HPLC cis,cis-Muconic acid      a     0.57561    imputed
10   24     HPLC            D-Glucose      b    33.95529   observed

Reproducible data

Raw data

df1 <- structure(list(time = c(0L, 0L, 0L, 0L, 0L, 22L, 22L, 22L, 24L, 
24L), protocol = structure(c(1L, 1L, 1L, 1L, 2L, 1L, 1L, 2L, 
1L, 1L), .Label = c("HPLC", "OD600"), class = "factor"), measurement_type = structure(c(1L, 
2L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L), .Label = c("cis,cis-Muconic acid", 
"D-Glucose", "Optical Density"), class = "factor"), sample = c("a", 
"a", "a", "b", "b", "b", "a", "a", "a", "b"), measurement = c(0.57561, 
NA, NA, NA, 0.1443, NA, NA, NA, NA, 33.95529)), row.names = c(NA, 
-10L), class = "data.frame")

Imputed data

df2 <- structure(list(time = c(0L, 0L, 0L, 0L, 0L, 22L, 22L, 22L, 24L, 
24L), protocol = structure(c(1L, 1L, 1L, 1L, 2L, 1L, 1L, 2L, 
1L, 1L), .Label = c("HPLC", "OD600"), class = "factor"), measurement_type = structure(c(1L, 
2L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L), .Label = c("cis,cis-Muconic acid", 
"D-Glucose", "Optical Density"), class = "factor"), sample = c("a", 
"a", "a", "b", "b", "b", "a", "a", "a", "b"), measurement = c(0.57561, 
33.95529, 0.57561, 33.95529, 0.1443, 0.57561, 33.95529, 0.1443, 
0.57561, 33.95529)), row.names = c(NA, -10L), class = "data.frame")

1 answer:

Answer 0 (score: 1)

Maybe

library(dplyr)

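# Work from df1 alone: flag each row from its NA pattern, then fill every
# measurement_type group with its observed value. impute_id is computed
# before measurement is overwritten, so it still sees the original NAs.
# min() works here because each group has exactly one non-NA measurement.
df1 %>%
  group_by(measurement_type) %>%
  mutate(impute_id = ifelse(is.na(measurement), "imputed", "observed"),
         measurement = min(measurement, na.rm = TRUE))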

   time protocol     measurement_type sample measurement  impute_id
1     0     HPLC cis,cis-Muconic acid      a     0.57561 observed
2     0     HPLC            D-Glucose      a    33.95529  imputed
3     0     HPLC cis,cis-Muconic acid      a     0.57561  imputed
4     0     HPLC            D-Glucose      b    33.95529  imputed
5     0    OD600      Optical Density      b     0.14430 observed
6    22     HPLC cis,cis-Muconic acid      b     0.57561  imputed
7    22     HPLC            D-Glucose      a    33.95529  imputed
8    22    OD600      Optical Density      a     0.14430  imputed
9    24     HPLC cis,cis-Muconic acid      a     0.57561  imputed
10   24     HPLC            D-Glucose      b    33.95529 observed
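
The answer above rebuilds the result from df1 and never looks at df2. If df2 already exists and its rows might be in a different order, one possible alternative is to match rows by value with a join. The sketch below is only that, a sketch under stated assumptions: flag_imputed and the helper column occurrence are hypothetical names, the example data is rebuilt with character columns instead of factors, across() needs dplyr 1.0.0 or later, and imputed values are assumed to be exact copies of observed ones so that the join on the numeric measurement column is safe. Because the example contains duplicate keys (rows 1 and 3 of df2 are identical), a plain join on all columns would mark both as observed; numbering the repeats within each identical group avoids that.

```r
library(dplyr)

# Example data from the question, rebuilt compactly (character columns, not factors).
df1 <- data.frame(
  time = c(0L, 0L, 0L, 0L, 0L, 22L, 22L, 22L, 24L, 24L),
  protocol = c("HPLC", "HPLC", "HPLC", "HPLC", "OD600",
               "HPLC", "HPLC", "OD600", "HPLC", "HPLC"),
  measurement_type = c("cis,cis-Muconic acid", "D-Glucose",
                       "cis,cis-Muconic acid", "D-Glucose", "Optical Density",
                       "cis,cis-Muconic acid", "D-Glucose", "Optical Density",
                       "cis,cis-Muconic acid", "D-Glucose"),
  sample = c("a", "a", "a", "b", "b", "b", "a", "a", "a", "b"),
  measurement = c(0.57561, NA, NA, NA, 0.1443, NA, NA, NA, NA, 33.95529),
  stringsAsFactors = FALSE
)
df2 <- df1
df2$measurement <- c(0.57561, 33.95529, 0.57561, 33.95529, 0.1443,
                     0.57561, 33.95529, 0.1443, 0.57561, 33.95529)

# Hypothetical helper: label each row of `imputed` as "observed" when it
# matches a non-NA row of `original` on every column, otherwise "imputed".
flag_imputed <- function(imputed, original) {
  key_cols <- c("time", "protocol", "measurement_type", "sample", "measurement")

  # Number repeated identical rows (1, 2, ...) so that duplicates can only
  # be matched one-to-one.
  add_occurrence <- function(d) {
    d %>%
      group_by(across(all_of(key_cols))) %>%
      mutate(occurrence = row_number()) %>%
      ungroup()
  }

  # Only rows with a real measurement count as observations.
  observed <- original %>%
    filter(!is.na(measurement)) %>%
    add_occurrence() %>%
    mutate(impute_id = "observed")

  imputed %>%
    add_occurrence() %>%
    left_join(observed, by = c(key_cols, "occurrence")) %>%
    mutate(impute_id = coalesce(impute_id, "imputed")) %>%  # unmatched rows
    select(-occurrence)
}

result <- flag_imputed(df2, df1)
```

When several rows of df2 are identical, which of them receives "observed" depends on their order, but the number of observed rows per group always equals the number in df1. If exact matching on the numeric column is a concern, a safer route would be to carry a row identifier through the imputation step and join on that instead.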