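I have two data frames:
df3 has a timestamp with both date and time, a user ID, and heart rate with multiple observations per day (long data).
df4 has a timestamp with the date only, a user ID, calories, and sleep (wide data).
I want to combine them so that the long data frame is filled in with the values from the wide data frame, matched on date and user ID.
Here is code for a toy dataset with a similar layout: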
df3 <- data.frame( time_stamp=c('2016-11-01 10:29:41','2016-11-01 10:53:11','2016-11-02 01:07:54','2016-11-02 02:00:40','2016-11-02 04:01:33','2016-11-02 05:23:53','2016-11-02 13:20:17'),
users_user_id=c(7,7,7,7,7,7,7),
avg_heart_rate=c(94,90,88,85,91,89,95))
df4 <- data.frame( time_stamp=c('2016-11-01','2016-11-02'), users_user_id=c(7,7), calories=c(1800,2000), sleep=c(480,560))
df3$time_stamp <- as.POSIXct(df3$time_stamp)
df4$time_stamp <- as.POSIXct(df4$time_stamp)
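I tried splitting the date from the time, but when I run a full_join in dplyr on the timestamp and user ID, I am left with a lot of NAs. I also looked into melting my data with reshape2, but I got lost on how that would help me...
Answer 0 (score: 1)
A tidy approach: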
library(tidyr)
library(dplyr)
df3 <- separate(df3, time_stamp, into = c("date_stamp", "time_stamp"), sep = " ")
df3$date_stamp <- as.POSIXct(df3$date_stamp)
left_join(df3, df4, by = c("date_stamp" = "time_stamp", "users_user_id"))
  date_stamp time_stamp users_user_id avg_heart_rate calories sleep
1 2016-11-01   10:29:41             7             94     1800   480
2 2016-11-01   10:53:11             7             90     1800   480
3 2016-11-02   01:07:54             7             88     2000   560
4 2016-11-02   02:00:40             7             85     2000   560
5 2016-11-02   04:01:33             7             91     2000   560
6 2016-11-02   05:23:53             7             89     2000   560
7 2016-11-02   13:20:17             7             95     2000   560
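If you would rather keep the full timestamp in one column instead of splitting it, a variant along these lines might work. This is only a sketch: the `date` key column and the `daily` / `combined` names are made up for illustration, and `tz = "UTC"` is an assumption added so the result does not depend on the session's time zone.

```r
library(dplyr)

# Toy data from the question. tz = "UTC" is an assumption, not part of the
# original code; it keeps the example independent of the local time zone.
df3 <- data.frame(
  time_stamp = as.POSIXct(c("2016-11-01 10:29:41", "2016-11-01 10:53:11",
                            "2016-11-02 01:07:54", "2016-11-02 02:00:40",
                            "2016-11-02 04:01:33", "2016-11-02 05:23:53",
                            "2016-11-02 13:20:17"), tz = "UTC"),
  users_user_id = c(7, 7, 7, 7, 7, 7, 7),
  avg_heart_rate = c(94, 90, 88, 85, 91, 89, 95)
)
df4 <- data.frame(
  time_stamp = as.POSIXct(c("2016-11-01", "2016-11-02"), tz = "UTC"),
  users_user_id = c(7, 7),
  calories = c(1800, 2000),
  sleep = c(480, 560)
)

# Build a plain "YYYY-MM-DD" text key on the daily table and drop its
# date-only timestamp, so the join does not produce duplicate columns.
daily <- df4 %>%
  mutate(date = format(time_stamp, "%Y-%m-%d")) %>%
  select(-time_stamp)

# Add the same key to the long table, then attach the daily values to
# every heart-rate row that shares the date and user ID.
combined <- df3 %>%
  mutate(date = format(time_stamp, "%Y-%m-%d")) %>%
  left_join(daily, by = c("date", "users_user_id"))

print(combined)
```

The full_join in the question most likely produced NAs because the keys were never equal: a timestamp with a time of day does not match a date-only timestamp, so every row from either side ended up unmatched. Joining on a date-only key avoids that.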
Answer 1 (score: 0)
You can create a new column that holds only the date and merge on that column:
df3$date <- as.Date(df3$time_stamp)
df4$date <- as.Date(df4$time_stamp)
merge(df3, df4, by = c("date", "users_user_id"))
This gives you:
        date users_user_id        time_stamp.x avg_heart_rate time_stamp.y calories sleep
1 2016-11-01             7 2016-11-01 10:29:41             94   2016-11-01     1800   480
2 2016-11-01             7 2016-11-01 10:53:11             90   2016-11-01     1800   480
3 2016-11-02             7 2016-11-02 01:07:54             88   2016-11-02     2000   560
4 2016-11-02             7 2016-11-02 02:00:40             85   2016-11-02     2000   560
5 2016-11-02             7 2016-11-02 04:01:33             91   2016-11-02     2000   560
6 2016-11-02             7 2016-11-02 05:23:53             89   2016-11-02     2000   560
7 2016-11-02             7 2016-11-02 13:20:17             95   2016-11-02     2000   560