Using data from one dataset to extract information from another dataset

Time: 2019-04-03 21:42:35

Tags: r dplyr tidyverse

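I have two datasets. One has health information for each subject. The other has the dates of each subject's pre- and post-MRI scans. I am trying to extract the health information that falls on those pre/post dates.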

MRI Pre/Post dataset:

ID  prescan PreDate Postscan    PostDate
5006    1   5/10/2018   1   6/14/2018
5007    1   5/15/2018   1   6/13/2018
5009    1   5/9/2018    1   6/11/2018
5011    1   5/31/2018   1   7/2/2018
5013    1   5/30/2018   1   7/5/2018

Sample of the sleep data:

SubID   SleepDate   Day of Week RHR HRV Recovery
5007    5/12/2018   'Saturday ' 63  95  65
5007    5/13/2018   'Sunday   ' 66  72  52
5010    5/7/2018    'Monday   ' 74  40  48
5010    5/8/2018    'Tuesday  ' 68  67  59
5010    5/9/2018    'Wednesday' 75  74  82
5010    5/10/2018   'Thursday ' 71  80  89
5010    5/11/2018   'Friday   ' 71  91  95
5010    5/12/2018   'Saturday ' 68  66  58
5008    5/7/2018    'Monday   ' 60  132 85
5008    5/8/2018    'Tuesday  ' 60  123 90
5008    5/9/2018    'Wednesday' 66  105 68
5009    5/7/2018    'Monday   ' 47  148 90
5009    5/8/2018    'Tuesday  ' 45  169 87
5009    5/9/2018    'Wednesday' 46  176 75
5009    5/10/2018   'Thursday ' 50  138 54
5009    5/11/2018   'Friday   ' 46  132 42
5009    5/12/2018   'Saturday ' 47  158 60
5009    5/13/2018   'Sunday   ' 47  141 54
5006    5/7/2018    'Monday   ' 56  92  65

Things I have tried (and variations thereof)

SleepData %>%
  subset(SubID == 5006) %>% 
  filter(SleepDate %in% MRI_date$PreDate)

The above often returns all of the data for ID 5006.

SleepData %>%
  subset(SubID == 5006) %>% 
  subset(SleepDate == MRI_date$PreDate)

Which returns:

longer object length is not a multiple of shorter object length
Length of logical index must be 1 or 31, not 44
Error in (function (..., row.names = NULL, check.rows = FALSE, check.names = TRUE,  : 
  arguments imply differing number of rows: 0, 1

What I would like to extract

Based on this, for example:

If ID == 5009 & (Date == 5/9/2018 & 6/11/2018)

I would like to get the corresponding sleep data:

SubID   SleepDate   Day of Week RHR HRV Recovery
5009    5/9/2018    'Wednesday' 46  176 75
5009    6/11/2018   'Wednesday' 76  196 95

[I made up the 6/11/2018 row for reference]

3 Answers:

Answer 0: (score: 0)

Try something like this.

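library(dplyr)

# Object and column names follow the tables in the question
SleepData %>%
  inner_join(MRI_date, by = c("SubID" = "ID")) %>%   # attach each subject's MRI dates
  filter(SubID == 5009) %>%                           # filter() picks rows; select() picks columns
  mutate(SleepDate = as.Date(SleepDate, "%m/%d/%Y")) %>%
  filter(SleepDate >= as.Date("2018-05-09") & SleepDate <= as.Date("2018-06-11")) %>%
  select(SubID, SleepDate, `Day of Week`, RHR, HRV, Recovery)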

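Answer 1: (score: 0)

If you want the sleep data for each PreDate and each PostDate, it is simpler to gather the two dates into a single column and keep track of the date type in another column. You can then do a join to pull all of the sleep data that matches that ID and that date.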

library(tidyverse)
MRI_date %>%
  gather(status, SleepDate, c(PreDate, PostDate)) %>%
  left_join(SleepData, by = c("ID" = "SubID", "SleepDate"))

#Joining, by = "SleepDate"
#     ID prescan Postscan   status  SleepDate Day_of_Week RHR HRV Recovery
#1  5006       1        1  PreDate 2018-05-10        <NA>  NA  NA       NA
#2  5007       1        1  PreDate 2018-05-15        <NA>  NA  NA       NA
#3  5009       1        1  PreDate 2018-05-09   Wednesday  46 176       75
#4  5011       1        1  PreDate 2018-05-31        <NA>  NA  NA       NA
#5  5013       1        1  PreDate 2018-05-30        <NA>  NA  NA       NA
#6  5006       1        1 PostDate 2018-06-14        <NA>  NA  NA       NA
#7  5007       1        1 PostDate 2018-06-13        <NA>  NA  NA       NA
#8  5009       1        1 PostDate 2018-06-11        <NA>  NA  NA       NA
#9  5011       1        1 PostDate 2018-07-02        <NA>  NA  NA       NA
#10 5013       1        1 PostDate 2018-07-05        <NA>  NA  NA       NA
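
The left_join above keeps every MRI date and fills in NA where no sleep record exists. If only the matching sleep rows are wanted, one option is to swap in inner_join. The sketch below is a minimal illustration under stated assumptions: it uses a small hand-typed subset of the question's data with dates already parsed, and it uses pivot_longer, the newer tidyr function that supersedes gather. The object name `matched` is only for illustration.

```r
library(dplyr)
library(tidyr)

# Small illustrative subset of the question's data (dates already parsed)
MRI_date <- data.frame(
  ID       = c(5006L, 5009L),
  PreDate  = as.Date(c("2018-05-10", "2018-05-09")),
  PostDate = as.Date(c("2018-06-14", "2018-06-11"))
)
SleepData <- data.frame(
  SubID     = c(5009L, 5009L, 5006L),
  SleepDate = as.Date(c("2018-05-08", "2018-05-09", "2018-05-07")),
  RHR       = c(45L, 46L, 56L)
)

matched <- MRI_date %>%
  # One row per (ID, date); `status` records whether it was the Pre or Post date
  pivot_longer(c(PreDate, PostDate), names_to = "status", values_to = "SleepDate") %>%
  # inner_join drops MRI dates that have no sleep record
  inner_join(SleepData, by = c("ID" = "SubID", "SleepDate"))

matched
```

With this subset, only subject 5009's PreDate (2018-05-09) has a sleep record, so `matched` contains that single row.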

Source data:

library(dplyr)
MRI_date <- read.table(
  header = T, 
  stringsAsFactors = F, colClasses = c("integer", "integer", "character", "integer", "character"),
  text = "ID prescan PreDate Postscan    PostDate
5006    1   5/10/2018   1   6/14/2018
5007    1   5/15/2018   1   6/13/2018
5009    1   5/9/2018    1   6/11/2018
5011    1   5/31/2018   1   7/2/2018
5013    1   5/30/2018   1   7/5/2018") %>%
  mutate_if(is.character, lubridate::mdy)


SleepData <- df <- read.table(
  header = T, 
  stringsAsFactors = F, 
  text = "SubID   SleepDate   Day_of_Week RHR HRV Recovery
5007    5/12/2018   'Saturday ' 63  95  65
5007    5/13/2018   'Sunday   ' 66  72  52
5010    5/7/2018    'Monday   ' 74  40  48
5010    5/8/2018    'Tuesday  ' 68  67  59
5010    5/9/2018    'Wednesday' 75  74  82
5010    5/10/2018   'Thursday ' 71  80  89
5010    5/11/2018   'Friday   ' 71  91  95
5010    5/12/2018   'Saturday ' 68  66  58
5008    5/7/2018    'Monday   ' 60  132 85
5008    5/8/2018    'Tuesday  ' 60  123 90
5008    5/9/2018    'Wednesday' 66  105 68
5009    5/7/2018    'Monday   ' 47  148 90
5009    5/8/2018    'Tuesday  ' 45  169 87
5009    5/9/2018    'Wednesday' 46  176 75
5009    5/10/2018   'Thursday ' 50  138 54
5009    5/11/2018   'Friday   ' 46  132 42
5009    5/12/2018   'Saturday ' 47  158 60
5009    5/13/2018   'Sunday   ' 47  141 54
5006    5/7/2018    'Monday   ' 56  92  65")
SleepData <- SleepData %>% mutate(SleepDate = lubridate::mdy(SleepDate))

Answer 2: (score: 0)

You can use the merge function:

    pre <- subset(merge(SleepData, MRI_date, by.x = c("SubID", "SleepDate"), by.y = c("ID", "PreDate")), TRUE, select = c(SubID:Recovery))
    post <- subset(merge(SleepData, MRI_date, by.x = c("SubID", "SleepDate"), by.y = c("ID", "PostDate")), TRUE, select = (SubID:Recovery))
    result <- rbind(pre, post)

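The subset function is used here only to select the relevant SleepData columns after the merge, via its select = argument; it does not remove any rows. This ensures that rbind receives two data frames with the same columns.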
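
To see why the column selection matters, here is a minimal sketch on a hand-typed subset of the data. It assumes the dates are already parsed, and it includes the 6/11/2018 row that the question says was made up for reference. The names `pre_all` and `post_all` are only for illustration.

```r
# Small illustrative subset; the 2018-06-11 sleep row is the made-up row from the question
MRI_date <- data.frame(
  ID       = c(5006L, 5009L),
  prescan  = 1L,
  PreDate  = as.Date(c("2018-05-10", "2018-05-09")),
  Postscan = 1L,
  PostDate = as.Date(c("2018-06-14", "2018-06-11"))
)
SleepData <- data.frame(
  SubID     = c(5009L, 5009L, 5009L, 5006L),
  SleepDate = as.Date(c("2018-05-08", "2018-05-09", "2018-06-11", "2018-05-07")),
  RHR       = c(45L, 46L, 76L, 56L),
  HRV       = c(169L, 176L, 196L, 92L),
  Recovery  = c(87L, 75L, 95L, 65L)
)

# Merge on (subject, date) once for the pre-scan date and once for the post-scan date
pre_all  <- merge(SleepData, MRI_date, by.x = c("SubID", "SleepDate"), by.y = c("ID", "PreDate"))
post_all <- merge(SleepData, MRI_date, by.x = c("SubID", "SleepDate"), by.y = c("ID", "PostDate"))

# The leftover MRI columns differ (PostDate vs PreDate), so the two
# results cannot be stacked until they are trimmed to the SleepData columns
names(pre_all)
names(post_all)

pre    <- subset(pre_all,  select = SubID:Recovery)
post   <- subset(post_all, select = SubID:Recovery)
result <- rbind(pre, post)
result
```

Here `result` holds two rows for subject 5009, one for each scan date, with only the sleep columns.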