I am trying to merge two consecutive rows of data using the values in their common columns. Essentially, I am trying to go from
UserID Geography Login Logout
user1 East 0:00:22 -
user1 East - 0:01:29
user2 West 0:03:57 -
user2 West - 0:48:10
user3 South 0:59:25 -
user3 South - 1:08:21
to
UserID Geography Login Logout
user1 East 0:00:22 0:01:29
user2 West 0:03:57 0:48:10
user3 South 0:59:25 1:08:21
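Apologies in advance for the formatting. I should mention that user1, user2, etc. each appear in many rows like these, so aggregate functions such as MAX or MIN will not work. I am looking for a solution in R, but answers in any other language are welcome too.
Thanks in advance, Gopal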
Answer 0 (score: 1)
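This can be done with the dplyr and tidyr packages. Essentially, we gather the login and logout times into a single column, drop the empty values, and spread the login and logout events back into their own columns.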
df1 <- read.table(text = 'UserID Geography Login Logout
user1 East 0:00:22 -
user1 East - 0:01:29
user2 West 0:03:57 -
user2 West - 0:48:10
user3 South 0:59:25 -
user3 South - 1:08:21', header = TRUE, stringsAsFactors = FALSE)
UserID Geography Login Logout
1 user1 East 0:00:22 -
2 user1 East - 0:01:29
3 user2 West 0:03:57 -
4 user2 West - 0:48:10
5 user3 South 0:59:25 -
6 user3 South - 1:08:21
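library(dplyr)
library(tidyr)
df2 <- df1 %>%
  gather(action, time, -UserID, -Geography) %>%  # stack Login/Logout into key-value pairs
  filter(time != '-') %>%                         # drop the '-' placeholders
  spread(action, time)                            # back to one column per action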
UserID Geography Login Logout
1 user1 East 0:00:22 0:01:29
2 user2 West 0:03:57 0:48:10
3 user3 South 0:59:25 1:08:21
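In recent tidyr releases (1.0.0 and later), gather() and spread() are superseded by pivot_longer() and pivot_wider(). The following is a minimal sketch of the same three steps with the newer functions, assuming such a tidyr version is installed; it is an alternative spelling, not the code from the original answer.

```r
library(dplyr)
library(tidyr)

# Same sample data as above, read as character columns.
df1 <- read.table(text = 'UserID Geography Login Logout
user1 East 0:00:22 -
user1 East - 0:01:29
user2 West 0:03:57 -
user2 West - 0:48:10
user3 South 0:59:25 -
user3 South - 1:08:21', header = TRUE, stringsAsFactors = FALSE)

df2 <- df1 %>%
  # Stack Login and Logout into (action, time) pairs.
  pivot_longer(c(Login, Logout), names_to = "action", values_to = "time") %>%
  # Drop the '-' placeholders.
  filter(time != "-") %>%
  # One column per action again, one row per UserID/Geography.
  pivot_wider(names_from = action, values_from = time)

print(df2)
```

As with spread(), this only gives one row per user when each user has exactly one login and one logout; repeated logins need an extra identifier per pair.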
In the OP's original dataset, each user can log in multiple times:
df <- read.table(text = 'UserID Geography EventType ChannelType Time
user4 South Log-in Web 0:00:10
user1 East Log-in Web 0:00:22
user4 South Log-out Mobile 0:00:44
user1 East Log-out Web 0:01:29
user5 East Log-in Web 0:02:01
user1 East Log-in Mobile 0:03:57
user16 South Log-in Mobile 0:04:36
user15 North Log-in Mobile 0:05:42
user3 North Log-in Web 0:05:59
user8 South Log-in Mobile 0:07:09
user19 North Log-in Mobile 0:09:22
user11 North Log-in Web 0:12:39
user8 South Log-out Web 0:18:32
user8 South Log-in Web 0:19:35', header = TRUE, stringsAsFactors = FALSE)
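The key is to use dplyr to group the log-in and log-out events for each user and then number them. Each log-in/log-out pair is now uniquely identified, so the data can be reshaped: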
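df2 <- df %>%
  arrange(UserID, Time) %>%                 # order each user's events by time
  group_by(UserID, EventType) %>%
  mutate(EventNum = row_number()) %>%       # n-th log-in / n-th log-out per user
  select(-ChannelType) %>%                  # channel can differ within a pair
  spread(EventType, Time, fill = '-') %>%   # one column per event type
  arrange(`Log-in`)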
UserID Geography EventNum `Log-in` `Log-out`
<chr> <chr> <int> <chr> <chr>
1 user4 South 1 0:00:10 0:00:44
2 user1 East 1 0:00:22 0:01:29
3 user5 East 1 0:02:01 -
4 user1 East 2 0:03:57 -
5 user16 South 1 0:04:36 -
6 user15 North 1 0:05:42 -
7 user3 North 1 0:05:59 -
8 user8 South 1 0:07:09 0:18:32
9 user19 North 1 0:09:22 -
10 user11 North 1 0:12:39 -
11 user8 South 2 0:19:35 -
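One caveat with the approach above: Time is sorted as text, which only matches chronological order while every value has the same width (as text, "10:00:00" sorts before "9:59:59"). The sketch below is one possible way around that. It sorts on a numeric seconds column built by `to_seconds`, a hypothetical helper that is not part of the original answer, and it uses pivot_wider() from tidyr 1.0.0 or later, which leaves unmatched log-ins as NA rather than '-'. Only a subset of the rows above is used. It still assumes the n-th log-out of a user belongs to that user's n-th log-in.

```r
library(dplyr)
library(tidyr)

# Hypothetical helper (an assumption, not from the answer): "H:MM:SS" -> seconds.
to_seconds <- function(x) {
  parts <- strsplit(x, ":", fixed = TRUE)
  vapply(parts, function(p) sum(as.numeric(p) * c(3600, 60, 1)), numeric(1))
}

# A few rows taken from the event log above.
events <- read.table(text = 'UserID Geography EventType ChannelType Time
user8 South Log-in Mobile 0:07:09
user8 South Log-out Web 0:18:32
user8 South Log-in Web 0:19:35
user1 East Log-in Web 0:00:22
user1 East Log-out Web 0:01:29', header = TRUE, stringsAsFactors = FALSE)

sessions <- events %>%
  mutate(Seconds = to_seconds(Time)) %>%
  arrange(UserID, Seconds) %>%              # numeric, not textual, ordering
  group_by(UserID, EventType) %>%
  mutate(EventNum = row_number()) %>%       # pair the n-th log-in with the n-th log-out
  ungroup() %>%
  select(UserID, Geography, EventNum, EventType, Time) %>%
  pivot_wider(names_from = EventType, values_from = Time) %>%
  arrange(UserID, EventNum)

print(sessions)
```

If a log-out is ever missing in the middle of a user's history, numbering by position will shift later pairs, so real data may need a check that each log-out falls after its paired log-in.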