根据最小值和最大值查找R数据帧

时间:2017-07-16 08:21:22

标签: r max lookup min

我正在寻找一种将R中的数据集连接到查找数据集的简单方法。查找数据集具有min,max和标签,如下所示。

library(dplyr)

min_age <- c(0, .5, 1, 5, 17, 35, 50, 75)
max_age <- c(.5, 1, 5, 17, 35, 50, 75, 125)
age_lbl <- c("< 6 months", "6 months - 1 year", "1 year - 5 years", 
             "5 years - 17 years", "17 years - 35 years", "35 years - 50 years"
             , "50 years - 75 years", "> 75 years")
age_lbl <- as.factor(age_lbl)
lkp_df <- data.frame(min_age = min_age, max_age = max_age, age_grp_lb = age_lbl)

数据框如下所示

lkp_df
min_age max_age          age_grp_lb 
0.0     0.5              < 6 months
0.5     1.0              6 months - 1 year    
1.0     5.0              1 year - 5 years
5.0     17.0             5 years - 17 years
17.0    35.0             17 years - 35 years
35.0    50.0             35 years - 50 years
50.0    75.0             50 years - 75 years
75.0   125.0             > 75 years

我现在想加入一个患者年龄如下的数据框。连接应基于min_age和max_age,并获取患者数据框中的age_grp_lbl值。

head(pat_df)   
pat_id   pat_age 
1001     14.397769
1002     66.694280
1003     53.628013
1004     58.782156
1005      5.032531
1006     16.430463

我有一种迂回的方式处理这个问题。

# Introduce dummy column to obtain a cartesian
lkp_df <- lkp_df %>%
  mutate(join = 1)

set.seed(6789)
pat_id <- seq(1001, 1075)
pat_age <- runif(75, 0, 95)

pat_df <- data.frame(pat_id, pat_age) %>%
  mutate(pat_age_yrs = as.integer(pat_age),
         pat_age_mths = as.integer(pat_age * 12)) %>%
  # Introduce dummy column to obtain a cartesian
  mutate(join = 1) %>%
  # Create a cartesian product with the join column
  inner_join(lkp_df) %>%
  # Filter to keep only required records
  filter(pat_age >= min_age & pat_age < max_age) %>%
  # Keep only necessary columns
  select(pat_id, pat_age, pat_age_yrs, pat_age_mths, age_grp_lb)

有人可以提出更好的方法来处理类似情况。提前致谢。 Masoud A R。

0 个答案:

没有答案