I have a log file containing a lot of information. Here is a sample:
event_type | video |id
------------------------------------------------
load_video" | Video -math | 21
load_video" | Video -math | 21
load_video" | Video - Math and Speed | 22
play_video" | Video -math | 21
seek_video" | Video -math | 21
pause_video" | Video -math | 21
seek_video" | Video -math | 21
play_video" | Video -math | 21
pause_video" | Video -math | 21
play_video" | Video - Math and Speed | 22
pause_video" | Video - Math and Speed | 22
stop_video" | Video - Math and Speed | 22
I would like to transform it to get this table:
id Video -math Video - Math and Speed
| load | play | seek |pause| stop | load | play | seek | pause | stop
21 | 2 | 2 | 2 | 2 | 0 | na | na | na | na | na
22 | na | na | na | na | na | 1 | 1 | 0 | 1 | 1
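I have started using the reshape package, but I don't know how to use it with 3 columns.
Edit: I don't want to create 2 header rows. I only wanted to illustrate what my goal is.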
Answer 0 (score: 1)
A solution using tidyr and dplyr:
library(tidyr)
library(dplyr)
library(stringr)
dat %>%
mutate_at(1, str_extract, "load|play|seek|pause|stop") %>%
unite(video_event_type, video, event_type) %>%
count(id, video_event_type) %>%
spread(video_event_type, n)
# # A tibble: 2 x 9
# id `Video - Math and Speed_load` `Video - Math and Speed_pause` `Video - Math and Speed_play` `Video - Math and Speed_stop` `Video -math_load` `Video -math_pause` `Video -math_play` `Video -math_seek`
# <int> <int> <int> <int> <int> <int> <int> <int> <int>
# 1 21 NA NA NA NA 2 2 2 2
# 2 22 1 1 1 1 NA NA NA NA
Edit: a more complex solution that uses complete to get the expected zeros:
dat %>%
mutate_at(1, str_extract, "load|play|seek|pause|stop") %>%
count(id, video, event_type) %>%
complete(nesting(id, video), event_type, fill = list(n = 0L)) %>%
unite(video_event_type, video, event_type, sep = ".") %>%
spread(video_event_type, n)
# # A tibble: 2 x 11
# id `Video - Math and Speed.load` `Video - Math and Speed.pause` `Video - Math and Speed.play` `Video - Math and Speed.seek` `Video - Math and Speed.stop` `Video -math.load` `Video -math.pause` `Video -math.play` `Video -math.seek` `Video -math.stop`
# <int> <int> <int> <int> <int> <int> <int> <int> <int> <int> <int>
# 1 21 NA NA NA NA NA 2 2 2 2 0
# 2 22 1 1 1 0 1 NA NA NA NA NA
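Since spread has been superseded in recent tidyr releases, the same reshaping can also be written with pivot_wider. The following is only a sketch of that variant, not part of the original answer: it rebuilds the sample data inline so it runs on its own, and the name wide is introduced here purely for illustration.

```r
library(dplyr)
library(tidyr)
library(stringr)

# Same sample as in the question, built inline so the sketch is self-contained.
dat <- data.frame(
  event_type = c('load_video"', 'load_video"', 'load_video"', 'play_video"',
                 'seek_video"', 'pause_video"', 'seek_video"', 'play_video"',
                 'pause_video"', 'play_video"', 'pause_video"', 'stop_video"'),
  video = c("Video -math", "Video -math", "Video - Math and Speed",
            rep("Video -math", 6), rep("Video - Math and Speed", 3)),
  id = c(21L, 21L, 22L, rep(21L, 6), rep(22L, 3)),
  stringsAsFactors = FALSE
)

wide <- dat %>%
  # Keep only the event name, dropping the '_video"' suffix.
  mutate(event_type = str_extract(event_type, "load|play|seek|pause|stop")) %>%
  count(id, video, event_type) %>%
  # Add the missing event types for each existing (id, video) pair with n = 0.
  complete(nesting(id, video), event_type, fill = list(n = 0L)) %>%
  # One column per video/event combination; other id/video pairs stay NA.
  pivot_wider(names_from = c(video, event_type), values_from = n,
              names_sep = ".")

wide
```

As with spread, combinations of id and video that never occur together remain NA, while event types missing for an existing pair are filled with 0.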
Here, dat is:
dat <- read.table(text =
'event_type | video |id
load_video" | Video -math | 21
load_video" | Video -math | 21
load_video" | Video - Math and Speed | 22
play_video" | Video -math | 21
seek_video" | Video -math | 21
pause_video" | Video -math | 21
seek_video" | Video -math | 21
play_video" | Video -math | 21
pause_video" | Video -math | 21
play_video" | Video - Math and Speed | 22
pause_video" | Video - Math and Speed | 22
stop_video" | Video - Math and Speed | 22
', header = TRUE, sep = "|", quote = "",
strip.white = TRUE, stringsAsFactors = FALSE)
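If the two-level header from the question is wanted only for display, base R may be enough. The sketch below is an alternative that is not part of the original answer: it cross-tabulates with xtabs and prints with ftable, again rebuilding the sample inline. The name counts is introduced here for illustration. Unlike the tidyr versions, every cell that has no events shows 0 rather than NA.

```r
# Same sample as in the question, built inline so the sketch is self-contained.
dat <- data.frame(
  event_type = c('load_video"', 'load_video"', 'load_video"', 'play_video"',
                 'seek_video"', 'pause_video"', 'seek_video"', 'play_video"',
                 'pause_video"', 'play_video"', 'pause_video"', 'stop_video"'),
  video = c("Video -math", "Video -math", "Video - Math and Speed",
            rep("Video -math", 6), rep("Video - Math and Speed", 3)),
  id = c(21L, 21L, 22L, rep(21L, 6), rep(22L, 3)),
  stringsAsFactors = FALSE
)

# Drop the '_video"' suffix so only the event name remains.
dat$event_type <- sub('_video"?$', "", dat$event_type)

# Three-way contingency table: id x video x event_type.
counts <- xtabs(~ id + video + event_type, data = dat)

# Flat display with video and event_type as nested column headers.
ftable(counts, col.vars = c("video", "event_type"))
```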