Question

I have a log file containing a lot of information. Here is a sample:

event_type       | video                    |id
------------------------------------------------ 
load_video"      | Video -math              | 21
load_video"      | Video -math              | 21
load_video"      | Video - Math and Speed   | 22
play_video"      | Video -math              | 21
seek_video"      | Video -math              | 21
pause_video"     | Video -math              | 21
seek_video"      | Video -math              | 21
play_video"      | Video -math              | 21
pause_video"     | Video -math              | 21
play_video"      | Video - Math and Speed   | 22
pause_video"     | Video - Math and Speed   | 22
stop_video"      | Video - Math and Speed   | 22

I would like to transform it to get this table:

  id    Video -math                             Video - Math and Speed              
     |  load | play   |  seek  |pause|  stop  | load | play   | seek  | pause | stop
 21  |   2   |    2   |   2    |  2  |    0   |  na  |   na   |   na  |  na   |  na
 22  |   na  |    na  |   na   |  na |    na  |   1  |   1    |    0  |   1   |   1

I have started using the reshape package, but I don't know how to use it with 3 columns.

Edit -> I don't want to make 2 header rows. I only wanted to illustrate what my goal is.

Answer 1

A solution using tidyr and dplyr:

    library(tidyr)
    library(dplyr)
    library(stringr)

    dat %>%
      # keep only the action name from the first column (event_type)
      mutate_at(1, str_extract, "load|play|seek|pause|stop") %>%
      # combine video and event_type into a single key column
      unite(video_event_type, video, event_type) %>%
      # count rows per id and key, then spread the keys into columns
      count(id, video_event_type) %>%
      spread(video_event_type, n)
    # # A tibble: 2 x 9
    #      id `Video - Math and Speed_load` `Video - Math and Speed_pause` `Video - Math and Speed_play` `Video - Math and Speed_stop` `Video -math_load` `Video -math_pause` `Video -math_play` `Video -math_seek`
    #   <int> <int> <int> <int> <int> <int> <int> <int> <int>
    # 1    21 NA NA NA NA 2 2 2 2
    # 2    22 1 1 1 1 NA NA NA NA

Edit: a more complicated solution that uses complete to get the expected zeros:

    dat %>%
      mutate_at(1, str_extract, "load|play|seek|pause|stop") %>%
      count(id, video, event_type) %>%
      # add the missing event types for each existing (id, video) pair with n = 0
      complete(nesting(id, video), event_type, fill = list(n = 0L)) %>%
      unite(video_event_type, video, event_type, sep = ".") %>%
      spread(video_event_type, n)
    # # A tibble: 2 x 11
    #      id `Video - Math and Speed.load` `Video - Math and Speed.pause` `Video - Math and Speed.play` `Video - Math and Speed.seek` `Video - Math and Speed.stop` `Video -math.load` `Video -math.pause` `Video -math.play` `Video -math.seek` `Video -math.stop`
    #   <int> <int> <int> <int> <int> <int> <int> <int> <int> <int> <int>
    # 1    21 NA NA NA NA NA 2 2 2 2 0
    # 2    22 1 1 1 0 1 NA NA NA NA NA

dat
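The definition of dat did not survive on this page. As a rough sketch, and purely as an assumption, the twelve sample rows from the question can be typed into a data frame like this. The column names follow the sample header, the trailing quote on each event type is copied from the sample as shown, and the real log may well look different.

```r
# Hypothetical reconstruction of `dat` from the sample rows in the question.
# This is an assumption, not the answerer's original data definition.
dat <- data.frame(
  event_type = c('load_video"', 'load_video"', 'load_video"',
                 'play_video"', 'seek_video"', 'pause_video"',
                 'seek_video"', 'play_video"', 'pause_video"',
                 'play_video"', 'pause_video"', 'stop_video"'),
  video = c("Video -math", "Video -math", "Video - Math and Speed",
            rep("Video -math", 6),
            rep("Video - Math and Speed", 3)),
  id = c(21L, 21L, 22L, rep(21L, 6), rep(22L, 3)),
  stringsAsFactors = FALSE
)

# Base R sanity check: events per id, with the "_video..." suffix stripped.
counts <- table(dat$id, sub("_video.*", "", dat$event_type))
print(counts)
```

With a data frame of this shape, the pipelines in the answer can be run as written, since mutate_at(1, ...) only relies on event_type being the first column.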


Spreading values in a data frame with 3 columns in R
