Spreading values from a data frame with 3 columns in R

Time: 2018-06-06 09:32:33

Tags: r dplyr reshape

I have a log file containing a lot of information. Here is a sample:

event_type       | video                    |id
------------------------------------------------ 
load_video"      | Video -math              | 21
load_video"      | Video -math              | 21
load_video"      | Video - Math and Speed   | 22
play_video"      | Video -math              | 21
seek_video"      | Video -math              | 21
pause_video"     | Video -math              | 21
seek_video"      | Video -math              | 21
play_video"      | Video -math              | 21
pause_video"     | Video -math              | 21
play_video"      | Video - Math and Speed   | 22
pause_video"     | Video - Math and Speed   | 22
stop_video"      | Video - Math and Speed   | 22 

I would like to transform it to get this table.

 id |            Video -math             |       Video - Math and Speed
    | load | play | seek | pause | stop | load | play | seek | pause | stop
 21 |   2  |   2  |   2  |   2   |   0  |  na  |  na  |  na  |   na  |  na
 22 |  na  |  na  |  na  |   na  |  na  |   1  |   1  |   0  |   1   |   1

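I have started using the reshape package, but I don't know how to use it with 3 columns.

Edit -> I don't want to create 2 header rows. I only wanted to illustrate what my goal is.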

1 answer:

Answer 0: (score: 1)

Using tidyr:

    library(tidyr)
    library(dplyr)
    library(stringr)

    dat %>%
      mutate_at(1, str_extract, "load|play|seek|pause|stop") %>%
      unite(video_event_type, video, event_type) %>%
      count(id, video_event_type) %>%
      spread(video_event_type, n)
    # # A tibble: 2 x 9
    #      id `Video - Math and Speed_load` `Video - Math and Speed_pause` `Video - Math and Speed_play` `Video - Math and Speed_stop` `Video -math_load` `Video -math_pause` `Video -math_play` `Video -math_seek`
    #   <int> <int> <int> <int> <int> <int> <int> <int> <int>
    # 1    21 NA NA NA NA 2 2 2 2
    # 2    22 1 1 1 1 NA NA NA NA
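As a side note, spread() has since been superseded in tidyr by pivot_wider() (tidyr 1.0.0 or later). The following is a minimal sketch of the same count-then-widen idea, not a replacement for the answer above. The inline `dat` and the name `wide` are assumptions made for illustration; the rows are copied from the question, with the stray trailing quote left out.

```r
library(dplyr)
library(tidyr)
library(stringr)

# Assumed sample data: the question's rows, without the stray trailing quote.
dat <- data.frame(
  event_type = c("load_video", "load_video", "load_video", "play_video",
                 "seek_video", "pause_video", "seek_video", "play_video",
                 "pause_video", "play_video", "pause_video", "stop_video"),
  video = c("Video -math", "Video -math", "Video - Math and Speed",
            "Video -math", "Video -math", "Video -math", "Video -math",
            "Video -math", "Video -math", "Video - Math and Speed",
            "Video - Math and Speed", "Video - Math and Speed"),
  id = c(21L, 21L, 22L, 21L, 21L, 21L, 21L, 21L, 21L, 22L, 22L, 22L),
  stringsAsFactors = FALSE
)

wide <- dat %>%
  # Keep only the action name (load, play, ...).
  mutate(event_type = str_extract(event_type, "load|play|seek|pause|stop")) %>%
  # One row per id / video / event combination, with its frequency in n.
  count(id, video, event_type) %>%
  # One column per video/event pair; pairs missing for an id become NA.
  pivot_wider(names_from = c(video, event_type), values_from = n, names_sep = "_")

wide
```

Passing both `video` and `event_type` to `names_from` makes the separate unite() step unnecessary, since pivot_wider() joins the two values with `names_sep` on its own.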

Edit: a more complex solution that uses complete to get the expected zeros:

    dat %>%
      mutate_at(1, str_extract, "load|play|seek|pause|stop") %>%
      count(id, video, event_type) %>%
      complete(nesting(id, video), event_type, fill = list(n = 0L)) %>%
      unite(video_event_type, video, event_type, sep = ".") %>%
      spread(video_event_type, n)
    # # A tibble: 2 x 11
    #      id `Video - Math and Speed.load` `Video - Math and Speed.pause` `Video - Math and Speed.play` `Video - Math and Speed.seek` `Video - Math and Speed.stop` `Video -math.load` `Video -math.pause` `Video -math.play` `Video -math.seek` `Video -math.stop`
    #   <int> <int> <int> <int> <int> <int> <int> <int> <int> <int> <int>
    # 1    21 NA NA NA NA NA 2 2 2 2 0
    # 2    22 1 1 1 0 1 NA NA NA NA NA

where dat is:

    dat <- read.table(text = 'event_type       | video                    |id
    load_video"      | Video -math              | 21
    load_video"      | Video -math              | 21
    load_video"      | Video - Math and Speed   | 22
    play_video"      | Video -math              | 21
    seek_video"      | Video -math              | 21
    pause_video"     | Video -math              | 21
    seek_video"      | Video -math              | 21
    play_video"      | Video -math              | 21
    pause_video"     | Video -math              | 21
    play_video"      | Video - Math and Speed   | 22
    pause_video"     | Video - Math and Speed   | 22
    stop_video"      | Video - Math and Speed   | 22
    ', header = TRUE, sep = "|", quote = "", strip.white = TRUE, stringsAsFactors = FALSE)
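For comparison, here is a dependency-free sketch of the same counting step in base R, using table() together with interaction(). It is only an illustration under stated assumptions: the inline `dat` repeats the question's rows without the stray quote, and the names `events`, `key`, `tab` and `wide_base` are made up for this example. Unlike the tidyr versions, it reports 0 (not NA) for every video/event pair an id never produced, so it cannot distinguish "video never opened" from "event never fired".

```r
# Assumed sample data: the question's rows, without the stray trailing quote.
dat <- data.frame(
  event_type = c("load_video", "load_video", "load_video", "play_video",
                 "seek_video", "pause_video", "seek_video", "play_video",
                 "pause_video", "play_video", "pause_video", "stop_video"),
  video = c("Video -math", "Video -math", "Video - Math and Speed",
            "Video -math", "Video -math", "Video -math", "Video -math",
            "Video -math", "Video -math", "Video - Math and Speed",
            "Video - Math and Speed", "Video - Math and Speed"),
  id = c(21L, 21L, 22L, 21L, 21L, 21L, 21L, 21L, 21L, 22L, 22L, 22L),
  stringsAsFactors = FALSE
)

# Strip the "_video" suffix and fix the set of event levels up front,
# so events that never occur for a video still get a column.
events <- c("load", "play", "seek", "pause", "stop")
dat$event <- factor(sub("_video$", "", dat$event_type), levels = events)

# interaction() creates one level per video/event pair, including unused pairs.
key <- interaction(dat$video, dat$event, sep = ".")

# Cross-tabulate ids against the combined key; unused pairs are counted as 0.
tab <- table(id = dat$id, key = key)

# Plain wide data frame; the ids end up as row names.
wide_base <- as.data.frame.matrix(tab)
wide_base
```

If the NA-versus-0 distinction from the expected table matters, the complete(nesting(id, video), ...) approach in the answer is the better fit.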
