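Sorry if this doesn't sound very clear, but I'll do my best to explain this tricky problem. I have several data frames.
Each data frame has the columns hashed_user_id, server_timestamp and event. Three example data frames are shown below: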
Data Frame 1
hashed_user_id server_timestamp event
user1 2017-04-27 15:25:12 AS
user2 2017-04-29 19:34:19 AS
user3 2017-05-01 21:28:17 AS
user4 2017-05-03 23:01:16 AS
Data Frame 2
hashed_user_id server_timestamp event
user1 2017-04-27 16:25:12 AV1
user2 2017-04-29 20:34:19 AV1
user5 2017-05-01 22:19:17 AV1
user6 2017-05-03 14:01:16 AV1
Data Frame 3
hashed_user_id server_timestamp event
user1 2017-04-27 17:25:12 AV2
user2 2017-04-29 15:34:19 AV2
user5 2017-05-01 21:28:17 AV2
user6 2017-05-03 23:01:16 AV2
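The table I want should combine all users into one table, with one row per user, listing all of that user's events sorted by server_timestamp. So the expected new data frame would look like this: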
Expected result:
hashed_user_id sorted_event1 sorted_event2 sorted_event3
user1 AS AV1 AV2
user2 AV2 AS AV1
user3 AS NA NA
user4 AS NA NA
user5 AV2 AV1 NA
user6 AV1 AV2 NA
Thanks a lot!
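A dependency-free base-R sketch of this reshaping, for comparison with the package-based answers. It assumes the three frames are named `df1`, `df2` and `df3` (illustrative names) and uses only `rbind()`, `order()`, `ave()` and `reshape()`; the `position` helper column is likewise made up for the example. With the sample data it should reproduce the expected result, with NA wherever a user has fewer than three events.

```r
# Sample data from the question; df1, df2, df3 are illustrative names
df1 <- data.frame(
  hashed_user_id   = c("user1", "user2", "user3", "user4"),
  server_timestamp = c("2017-04-27 15:25:12", "2017-04-29 19:34:19",
                       "2017-05-01 21:28:17", "2017-05-03 23:01:16"),
  event = "AS", stringsAsFactors = FALSE
)
df2 <- data.frame(
  hashed_user_id   = c("user1", "user2", "user5", "user6"),
  server_timestamp = c("2017-04-27 16:25:12", "2017-04-29 20:34:19",
                       "2017-05-01 22:19:17", "2017-05-03 14:01:16"),
  event = "AV1", stringsAsFactors = FALSE
)
df3 <- data.frame(
  hashed_user_id   = c("user1", "user2", "user5", "user6"),
  server_timestamp = c("2017-04-27 17:25:12", "2017-04-29 15:34:19",
                       "2017-05-01 21:28:17", "2017-05-03 23:01:16"),
  event = "AV2", stringsAsFactors = FALSE
)

# Stack the frames and parse the timestamps so they compare as times
all_events <- do.call(rbind, list(df1, df2, df3))
all_events$server_timestamp <- as.POSIXct(all_events$server_timestamp,
                                          format = "%Y-%m-%d %H:%M:%S", tz = "UTC")

# Sort by user, then chronologically within each user
all_events <- all_events[order(all_events$hashed_user_id,
                               all_events$server_timestamp), ]

# Position of each event within its user's timeline: 1, 2, 3, ...
all_events$position <- ave(seq_len(nrow(all_events)),
                           all_events$hashed_user_id, FUN = seq_along)

# One row per user, one column per position; missing positions become NA
wide <- reshape(all_events[, c("hashed_user_id", "position", "event")],
                idvar = "hashed_user_id", timevar = "position",
                direction = "wide", sep = "")
names(wide) <- sub("^event", "sorted_event", names(wide))
rownames(wide) <- NULL
print(wide)
```

`stringsAsFactors = FALSE` only matters on R versions before 4.0, where character columns would otherwise become factors.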
Answer 0 (score: 2)
library(tibble)
library(dplyr)  # provides %>%, mutate(), group_by(), arrange(), select()
library(tidyr)
# read your data
dt1 <- tribble(
~hashed_user_id,~server_timestamp, ~event,
"user1", "2017-04-27 15:25:12", "AS",
"user2", "2017-04-29 19:34:19", "AS",
"user3", "2017-05-01 21:28:17", "AS",
"user4", "2017-05-03 23:01:16", "AS"
)
dt2 <- tribble(
~hashed_user_id,~server_timestamp, ~event,
"user1", "2017-04-27 16:25:12", "AV1",
"user2", "2017-04-29 20:34:19", "AV1",
"user5", "2017-05-01 22:19:17", "AV1",
"user6", "2017-05-03 14:01:16", "AV1"
)
dt3 <- tribble(
~hashed_user_id,~server_timestamp, ~event,
"user1", "2017-04-27 17:25:12", "AV2",
"user2", "2017-04-29 15:34:19", "AV2",
"user5", "2017-05-01 21:28:17", "AV2",
"user6", "2017-05-03 23:01:16", "AV2"
)
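# solution
dt <- rbind(dt1, dt2, dt3) %>%
  # parse the timestamps so they sort chronologically
  mutate(server_timestamp = as.POSIXct(server_timestamp, format = "%Y-%m-%d %H:%M:%S")) %>%
  group_by(hashed_user_id) %>%
  # global sort by time, so each user's rows end up in chronological order
  arrange(server_timestamp) %>%
  # label each user's events sorted_event1, sorted_event2, ...
  mutate(sorted_event_id = paste0("sorted_event", row_number())) %>%
  select(-server_timestamp) %>%
  # one column per position; users with fewer events get NA
  spread(sorted_event_id, event) %>%
  ungroup()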
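`spread()` still works but is superseded in tidyr 1.0 and later. A sketch of the same pipeline with `pivot_wider()`, assuming tidyr >= 1.0 and dplyr >= 1.0; it also sorts explicitly by user and then time before numbering, so the numbering does not depend on the order of the stacked rows. The `events` name is illustrative.

```r
library(tibble)
library(dplyr)
library(tidyr)

# Sample data from the question, stacked into one tibble
events <- bind_rows(
  tibble(hashed_user_id = c("user1", "user2", "user3", "user4"),
         server_timestamp = c("2017-04-27 15:25:12", "2017-04-29 19:34:19",
                              "2017-05-01 21:28:17", "2017-05-03 23:01:16"),
         event = "AS"),
  tibble(hashed_user_id = c("user1", "user2", "user5", "user6"),
         server_timestamp = c("2017-04-27 16:25:12", "2017-04-29 20:34:19",
                              "2017-05-01 22:19:17", "2017-05-03 14:01:16"),
         event = "AV1"),
  tibble(hashed_user_id = c("user1", "user2", "user5", "user6"),
         server_timestamp = c("2017-04-27 17:25:12", "2017-04-29 15:34:19",
                              "2017-05-01 21:28:17", "2017-05-03 23:01:16"),
         event = "AV2")
)

wide <- events %>%
  mutate(server_timestamp = as.POSIXct(server_timestamp, tz = "UTC")) %>%
  # sort by user, then by time within each user
  arrange(hashed_user_id, server_timestamp) %>%
  group_by(hashed_user_id) %>%
  mutate(sorted_event_id = paste0("sorted_event", row_number())) %>%
  ungroup() %>%
  select(-server_timestamp) %>%
  # pivot_wider() replaces spread(); positions a user never reached become NA
  pivot_wider(names_from = sorted_event_id, values_from = event)

print(wide)
```

If something other than `NA` is wanted in the gaps, `pivot_wider()` accepts a `values_fill` argument.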
Answer 1 (score: 0)
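In a sense this isn't a solution, since it doesn't give your expected output, but it's better to avoid spreading sorted data across separate columns padded with NAs like that.
If you still have to work with the result in R later, you'll have some dirty work to do.
Consider putting each user's sorted events in a vector and storing that inside a data.frame/tibble.
First put these data.frames in a list! :)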
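library(dplyr)
library(tidyr)
res <- list(df1, df2, df3) %>%
  bind_rows() %>%
  # the timestamps are ISO-formatted strings, so they sort chronologically as text
  arrange(server_timestamp) %>%
  select(-server_timestamp) %>%
  # one row per user; that user's events are nested in a tibble (tidyr >= 1.0 syntax)
  nest(sorted_events = event)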
# A tibble: 6 x 2
# hashed_user_id sorted_events
# <chr> <list>
# 1 user1 <tibble [3 x 1]>
# 2 user2 <tibble [3 x 1]>
# 3 user3 <tibble [1 x 1]>
# 4 user5 <tibble [2 x 1]>
# 5 user6 <tibble [2 x 1]>
# 6 user4 <tibble [1 x 1]>
res$sorted_events[[4]]
# # A tibble: 2 x 1
# event
# <chr>
# 1 AV2
# 2 AV1
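To follow this answer's suggestion literally, that is, one plain character vector of events per user instead of a nested one-column tibble, a sketch with `summarise()` and `list()` could look like this. It assumes dplyr >= 1.0 (for the `.groups` argument) and rebuilds the question's sample data inline under the illustrative name `events`.

```r
library(tibble)
library(dplyr)

# Sample data from the question, stacked into one tibble
events <- bind_rows(
  tibble(hashed_user_id = c("user1", "user2", "user3", "user4"),
         server_timestamp = c("2017-04-27 15:25:12", "2017-04-29 19:34:19",
                              "2017-05-01 21:28:17", "2017-05-03 23:01:16"),
         event = "AS"),
  tibble(hashed_user_id = c("user1", "user2", "user5", "user6"),
         server_timestamp = c("2017-04-27 16:25:12", "2017-04-29 20:34:19",
                              "2017-05-01 22:19:17", "2017-05-03 14:01:16"),
         event = "AV1"),
  tibble(hashed_user_id = c("user1", "user2", "user5", "user6"),
         server_timestamp = c("2017-04-27 17:25:12", "2017-04-29 15:34:19",
                              "2017-05-01 21:28:17", "2017-05-03 23:01:16"),
         event = "AV2")
)

res <- events %>%
  # fixed-width ISO timestamp strings sort chronologically as text
  arrange(hashed_user_id, server_timestamp) %>%
  group_by(hashed_user_id) %>%
  # one row per user, holding a character vector of events, oldest first
  summarise(sorted_events = list(event), .groups = "drop")

res$sorted_events[[5]]        # user5: "AV2" "AV1"
lengths(res$sorted_events)    # number of events per user
```

Because each entry is an ordinary vector, per-user work stays simple; for example `sapply(res$sorted_events, "[", 1)` returns each user's first event.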