p <- readLines("output.txt")
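After reading the text file, I need to format the data as a data frame with a fixed number of columns. I need to get rid of letters and of any rows that do not have 21 columns. I am doing the following to strip out "-" and any letters: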
p <- gsub("-", "", p)
p <- gsub("[A-Za-z]", "", p)
System configuration: lcpu=96 mem=196608MB ent=16.00
kthr memory page faults cpu time
----------- --------------------- ------------------------------------ ------------------ ----------------------- --------
r b p avm fre fi fo pi po fr sr in sy cs us sy id wa pc ec hr mi se
19 0 0 21337487 7123470 0 201 0 0 0 0 3576 66723 30304 19 4 77 0 5.97 37.3 00:02:30
27 0 0 21337431 7121069 0 123 0 0 0 0 4298 81526 36157 19 4 78 0 5.61 35.1 00:03:00
18 0 0 21333631 7122351 0 195 0 0 0 0 3696 65163 30794 23 4 74 0 6.49 40.6 00:03:30
19 0 0 21333590 7119082 0 194 0 0 0 0 5217 102823 47621 27 5 68 0 7.79 48.7 00:04:00
kthr memory page faults cpu time
----------- --------------------- ------------------------------------ ------------------ ----------------------- --------
r b p avm fre fi fo pi po fr sr in sy cs us sy id wa pc ec hr mi se
20 0 0 21347610 7204383 0 167 0 0 0 0 3645 73642 33333 21 3 75 0 6.21 38.8 00:12:30
16 0 0 21347576 7201448 0 110 0 0 0 0 4882 84287 40503 23 4 73 0 6.77 42.3 00:13:00
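Once I strip out the unwanted characters, I am left with some empty lines. This is not a data frame yet, so how can I get rid of the empty lines at this point?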
Answer 0 (score: 3)
You can do this with readLines and count.fields.
# path is the path to your data file
read.table(text=readLines(path)[count.fields(path, blank.lines.skip=FALSE) == 21])
# V1 V2 V3 V4 V5 V6 V7 V8 V9 V10 V11 V12 V13 V14 V15 V16 V17 V18 V19 V20 V21
# 1 19 0 0 21337487 7123470 0 201 0 0 0 0 3576 66723 30304 19 4 77 0 5.97 37.3 00:02:30
# 2 27 0 0 21337431 7121069 0 123 0 0 0 0 4298 81526 36157 19 4 78 0 5.61 35.1 00:03:00
# 3 18 0 0 21333631 7122351 0 195 0 0 0 0 3696 65163 30794 23 4 74 0 6.49 40.6 00:03:30
# 4 19 0 0 21333590 7119082 0 194 0 0 0 0 5217 102823 47621 27 5 68 0 7.79 48.7 00:04:00
# 5 20 0 0 21347610 7204383 0 167 0 0 0 0 3645 73642 33333 21 3 75 0 6.21 38.8 00:12:30
# 6 16 0 0 21347576 7201448 0 110 0 0 0 0 4882 84287 40503 23 4 73 0 6.77 42.3 00:13:00
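If you want to try the idea without touching your real file, here is a minimal sketch that writes a few lines from the question to a temporary file first. The names tmp, n, keep and df are only illustrative, the dashed separator line is abbreviated, and the column names at the end are my assumption about how to label the 21 data columns (the header row has 23 labels because the time column is split into hr, mi and se).

```r
# A few lines from the question, plus one blank line; the dashed line is abbreviated.
lines <- c(
  "kthr memory page faults cpu time",
  "----------- --------------------- --------",
  "r b p avm fre fi fo pi po fr sr in sy cs us sy id wa pc ec hr mi se",
  "19 0 0 21337487 7123470 0 201 0 0 0 0 3576 66723 30304 19 4 77 0 5.97 37.3 00:02:30",
  "",
  "27 0 0 21337431 7121069 0 123 0 0 0 0 4298 81526 36157 19 4 78 0 5.61 35.1 00:03:00"
)
tmp <- tempfile(fileext = ".txt")
writeLines(lines, tmp)

# One count per physical line; blank.lines.skip = FALSE keeps the counts
# aligned with the vector that readLines() returns.
n <- count.fields(tmp, blank.lines.skip = FALSE)
keep <- which(n == 21)  # which() also discards any NA counts

df <- read.table(text = readLines(tmp)[keep], stringsAsFactors = FALSE)

# Assumed labels: the second "sy" is renamed and hr/mi/se collapse into "time".
names(df) <- c("r", "b", "p", "avm", "fre", "fi", "fo", "pi", "po", "fr", "sr",
               "in", "sy", "cs", "us", "cpu_sy", "id", "wa", "pc", "ec", "time")
str(df)
```

Wrapping the comparison in which() is a small safety net: a plain logical index containing NA would put an NA line into the text passed to read.table.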
Answer 1 (score: 1)
A regular expression can help.
### For each element of your character vector "text", flag lines where...
# we start at the beginning of the line, match a blank (space or tab) repeated
# any number of times (including zero), then reach the end of the line
is_blank <- grepl('^[[:blank:]]*$', text)
### Now that we know which lines contain only blanks, keep all the others
text <- text[!is_blank]
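As a quick check, here is a small sketch on a made-up vector (the sample strings are mine, not from the question). It also shows why a logical filter is a safer habit than negative indexing with the result of grep: when nothing matches, grep returns integer(0), and indexing with -integer(0) drops every element.

```r
# Made-up sample: roughly what lines look like after letters and dashes are removed.
text <- c("   ", "19 0 0 21337487", "", "\t", "27 0 0 21337431")

# Keep only lines that contain at least one non-whitespace character.
text <- text[grepl("[^[:space:]]", text)]
print(text)

# Pitfall with negative indexing when there is nothing to remove:
no_blanks <- c("a", "b")
idx <- grep("^[[:blank:]]*$", no_blanks)  # integer(0), nothing matched
lost <- no_blanks[-idx]                   # character(0): everything is gone
length(lost)
```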
Answer 2 (score: 0)
dat <- readLines(textConnection('
kthr memory page faults cpu time
----------- --------------------- ------------------------------------ ------------------ ----------------------- --------
r b p avm fre fi fo pi po fr sr in sy cs us sy id wa pc ec hr mi se
19 0 0 21337487 7123470 0 201 0 0 0 0 3576 66723 30304 19 4 77 0 5.97 37.3 00:02:30
27 0 0 21337431 7121069 0 123 0 0 0 0 4298 81526 36157 19 4 78 0 5.61 35.1 00:03:00
18 0 0 21333631 7122351 0 195 0 0 0 0 3696 65163 30794 23 4 74 0 6.49 40.6 00:03:30
19 0 0 21333590 7119082 0 194 0 0 0 0 5217 102823 47621 27 5 68 0 7.79 48.7 00:04:00
kthr memory page faults cpu time
----------- --------------------- ------------------------------------ ------------------ ----------------------- --------
r b p avm fre fi fo pi po fr sr in sy cs us sy id wa pc ec hr mi se
20 0 0 21347610 7204383 0 167 0 0 0 0 3645 73642 33333 21 3 75 0 6.21 38.8 00:12:30
16 0 0 21347576 7201448 0 110 0 0 0 0 4882 84287 40503 23 4 73 0 6.77 42.3 00:13:00'))
dat <- trimws(gsub('-', '', dat))             # drop dashes, then leading/trailing whitespace
dat <- strsplit(dat, split = '[[:space:]]+')  # split each line on runs of whitespace
n <- lengths(dat)                             # number of fields on each line
dat[n == 23]                                  # header lines: 23 labels (time is split into hr, mi, se)
col.names <- dat[n == 23][[1]]
dat <- do.call(rbind, dat[n == 21])           # keep only the 21-field data lines
This gives you a character matrix (not yet a data.frame):
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12] [,13] [,14] [,15] [,16] [,17] [,18] [,19] [,20]
[1,] "19" "0" "0" "21337487" "7123470" "0" "201" "0" "0" "0" "0" "3576" "66723" "30304" "19" "4" "77" "0" "5.97" "37.3"
[2,] "27" "0" "0" "21337431" "7121069" "0" "123" "0" "0" "0" "0" "4298" "81526" "36157" "19" "4" "78" "0" "5.61" "35.1"
[3,] "18" "0" "0" "21333631" "7122351" "0" "195" "0" "0" "0" "0" "3696" "65163" "30794" "23" "4" "74" "0" "6.49" "40.6"
[4,] "19" "0" "0" "21333590" "7119082" "0" "194" "0" "0" "0" "0" "5217" "102823" "47621" "27" "5" "68" "0" "7.79" "48.7"
[5,] "20" "0" "0" "21347610" "7204383" "0" "167" "0" "0" "0" "0" "3645" "73642" "33333" "21" "3" "75" "0" "6.21" "38.8"
[6,] "16" "0" "0" "21347576" "7201448" "0" "110" "0" "0" "0" "0" "4882" "84287" "40503" "23" "4" "73" "0" "6.77" "42.3"
[,21]
[1,] "00:02:30"
[2,] "00:03:00"
[3,] "00:03:30"
[4,] "00:04:00"
[5,] "00:12:30"
[6,] "00:13:00"
I think you still need to convert the data to numeric...
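One possible way to do that conversion, as a sketch: the matrix m below is two rows typed in by hand so the example runs on its own, and it assumes the last column is the hh:mm:ss time stamp and every other column is numeric. If your matrix has an empty leading column (this happens when the input lines start with whitespace), drop it first with m[, -1].

```r
# Two rows of the character matrix, entered by hand for illustration.
m <- rbind(
  c("19", "0", "0", "21337487", "7123470", "0", "201", "0", "0", "0", "0",
    "3576", "66723", "30304", "19", "4", "77", "0", "5.97", "37.3", "00:02:30"),
  c("27", "0", "0", "21337431", "7121069", "0", "123", "0", "0", "0", "0",
    "4298", "81526", "36157", "19", "4", "78", "0", "5.61", "35.1", "00:03:00")
)

# Without column names, as.data.frame() labels the columns V1 ... V21.
df <- as.data.frame(m, stringsAsFactors = FALSE)

# Convert every column except the last one (the time stamp) to numeric.
num_cols <- seq_len(ncol(df) - 1)
df[num_cols] <- lapply(df[num_cols], as.numeric)

str(df)
```

The time column is left as character here; how to parse it depends on whether you need a duration or a clock time.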