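I am trying to read a NASA log file line by line and split each line into 5 columns. Right now the lines are not being split correctly, and part of the problem is that the fields do not share a single delimiter.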
fileName <- 'C:/Users/xxxxx/Desktop/access_log_Jul95.txt'
fileConn <- file('C:/Users/xxxxx/Desktop/output.txt')
conn <- file(fileName, open = "r")
linn <- readLines(conn)
A sample of the input (truncated):
... 00:00:12 -0400] "GET /shuttle/countdown/video/livevideo.gif HTTP/1.0" 200 0
This is the output I want:
199.72.81.55, [01/Jul/1995:00:00:01 -0400], GET, /history/apollo/ HTTP/1.0, 200, 6245
Answer 0 (score: 2)
Not as elegant as @Psidom's solution, but this gets the job done:
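library(stringr)
library(dplyr)

# Split each line on spaces; every line is assumed to yield exactly 10 tokens (V1..V10)
df <- str_split(linn, " ") %>%
  do.call(rbind, .) %>%
  as.data.frame() %>%
  mutate(V6 = str_replace(V6, '"', ""),   # drop the opening quote from "GET
         V8 = str_replace(V8, '"', ""),   # drop the closing quote from HTTP/1.0"
         a = paste(V4, V5),               # timestamp: date part plus time zone part
         b = paste0(V7, V8)) %>%          # path and protocol, joined with no space (paste() would keep it)
  select(c(1, 11, 6, 12, 9, 10))          # host, timestamp, method, request, status, bytes

# Clean up the column names
names(df) <- paste0("V", seq_len(ncol(df)))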
Output:
  V1                    V2                            V3   V4                                                       V5   V6
1 199.72.81.55          [01/Jul/1995:00:00:01 -0400]  GET  /history/apollo/HTTP/1.0                                 200  6245
2 unicomp6.unicomp.net  [01/Jul/1995:00:00:06 -0400]  GET  /shuttle/countdown/HTTP/1.0                              200  3985
3 199.120.110.21        [01/Jul/1995:00:00:09 -0400]  GET  /shuttle/missions/sts-73/mission-sts-73.htmlHTTP/1.0     200  4085
4 burger.letters.com    [01/Jul/1995:00:00:11 -0400]  GET  /shuttle/countdown/liftoff.htmlHTTP/1.0                  304  0
5 199.120.110.21        [01/Jul/1995:00:00:11 -0400]  GET  /shuttle/missions/sts-73/sts-73-patch-small.gifHTTP/1.0  200  4179
6 burger.letters.com    [01/Jul/1995:00:00:12 -0400]  GET  /images/NASA-logosmall.gifHTTP/1.0                       304  0
7 burger.letters.com    [01/Jul/1995:00:00:12 -0400]  GET  /shuttle/countdown/video/livevideo.gifHTTP/1.0           200  0
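To see where the column numbers passed to select() come from, it may help to split a single line by hand. This is only a sketch in base R: the sample line is reconstructed from row 1 of the output above and assumes the raw log separates the host from the timestamp with " - - ". The names sample_line, tokens, and fields are illustrative and not part of the answer.

```r
# One raw log line, reconstructed from row 1 of the output above (assumed format).
sample_line <- '199.72.81.55 - - [01/Jul/1995:00:00:01 -0400] "GET /history/apollo/ HTTP/1.0" 200 6245'

# Splitting on single spaces yields 10 tokens; these are what become V1..V10.
tokens <- strsplit(sample_line, " ", fixed = TRUE)[[1]]

# Strip the double quotes left on the method and protocol tokens.
tokens <- gsub('"', "", tokens, fixed = TRUE)

# Reassemble the six fields the question asks for.
fields <- c(
  host      = tokens[1],
  timestamp = paste(tokens[4], tokens[5]),  # date part and time zone part, rejoined
  method    = tokens[6],
  request   = paste(tokens[7], tokens[8]),  # paste() keeps the space that paste0() drops
  status    = tokens[9],
  bytes     = tokens[10]
)
print(fields)
```

A line whose request path contains a space would produce more than 10 tokens, so the rbind() step in the answer relies on every line having the same shape.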
Answer 1 (score: 1)
Try splitting with this regular expression, ( - - |(?<=]) |(?<=\\") |(?<=\\d) (?=\\d)):
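lines <- readLines(conn)
# Split on " - - ", on a space that follows "]" or a double quote, or on a space between two digits
do.call(rbind,
        lapply(lines, function(line)
          strsplit(line, '( - - |(?<=]) |(?<=\\") |(?<=\\d) (?=\\d))', perl = TRUE)[[1]]))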
# [,1] [,2] [,3] [,4] [,5]
# [1,] "199.72.81.55" "[01/Jul/1995:00:00:01 -0400]" "\"GET /history/apollo/ HTTP/1.0\"" "200" "6245"
# [2,] "unicomp6.unicomp.net" "[01/Jul/1995:00:00:06 -0400]" "\"GET /shuttle/countdown/ HTTP/1.0\"" "200" "3985"
# [3,] "199.120.110.21" "[01/Jul/1995:00:00:09 -0400]" "\"GET /shuttle/missions/sts-73/mission-sts-73.html HTTP/1.0\"" "200" "4085"
# [4,] "burger.letters.com" "[01/Jul/1995:00:00:11 -0400]" "\"GET /shuttle/countdown/liftoff.html HTTP/1.0\"" "304" "0"
# [5,] "199.120.110.21" "[01/Jul/1995:00:00:11 -0400]" "\"GET /shuttle/missions/sts-73/sts-73-patch-small.gif HTTP/1.0\"" "200" "4179"
# [6,] "burger.letters.com" "[01/Jul/1995:00:00:12 -0400]" "\"GET /images/NASA-logosmall.gif HTTP/1.0\"" "304" "0"
# [7,] "burger.letters.com" "[01/Jul/1995:00:00:12 -0400]" "\"GET /shuttle/countdown/video/livevideo.gif HTTP/1.0\"" "200" "0"
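If the goal is the six comma-separated fields shown in the question, one alternative is to capture them with a single pattern instead of splitting on separators. The following is a sketch in base R and is not part of either answer: the pattern, the column names, and the parse_log helper are assumptions, and it presumes every line follows the host - - [timestamp] "METHOD path protocol" status bytes shape seen in the output above. Lines that do not match come back as NA rows rather than being dropped.

```r
# Hypothetical helper: parse log lines into a six-column data frame.
parse_log <- function(lines) {
  # Capture groups: host, [timestamp], method, "path protocol", status, bytes
  pattern <- '^(\\S+) - - (\\[[^]]+\\]) "(\\S+) (.*)" (\\d{3}) (\\S+)$'
  cols <- c("host", "timestamp", "method", "request", "status", "bytes")

  matches <- regmatches(lines, regexec(pattern, lines, perl = TRUE))

  # A successful match has 7 elements: the full match plus 6 groups.
  rows <- lapply(matches, function(m) {
    if (length(m) == 7L) m[-1] else rep(NA_character_, 6L)
  })

  out <- as.data.frame(do.call(rbind, rows), stringsAsFactors = FALSE)
  names(out) <- cols
  out$status <- as.integer(out$status)
  # bytes stays as text in case the log uses "-" when no body was sent
  out
}

# Sample lines reconstructed from the output above, plus one malformed line.
sample_lines <- c(
  '199.72.81.55 - - [01/Jul/1995:00:00:01 -0400] "GET /history/apollo/ HTTP/1.0" 200 6245',
  'burger.letters.com - - [01/Jul/1995:00:00:12 -0400] "GET /shuttle/countdown/video/livevideo.gif HTTP/1.0" 200 0',
  'not a log line'
)
parsed <- parse_log(sample_lines)
print(parsed)
```

Compared with splitting, this keeps the quoted request intact even when the path itself contains spaces or digits next to spaces, at the cost of a stricter assumption about the overall line shape.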