Reading IP addresses and some missing values

Time: 2013-01-05 00:12:26

Tags: r time ip

I am new to R. I need to read data that contains times, IP addresses, and other fields:

18:00:04.940864 129.63.50.235.53 > 129.63.71.70.1111:  udp 107
18:00:04.957456 129.63.80.240.161 > 129.63.152.10.39518:  udp 151
18:00:04.958432 129.63.152.10.39518 > 129.63.80.240.161:  udp 136 (DF)
18:00:04.963312 217.79.96.182.53 > 129.63.1.1.1564:  udp 48 (DF)
18:00:05.000976 129.63.50.235.1028 > 218.232.110.133.53:  udp 34
18:00:05.207888 129.63.50.235.1028 > 203.50.0.24.53:  udp 32

I started with:

read.table(file='sample.txt',head=F,'%H:%M:%S',sep='')

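But I got stuck at that point because there are several kinds of separators: spaces, '>' and ':'. On top of that, the last field may or may not contain (DF).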

Can anyone give me an idea of how to handle this kind of data? Thanks a lot.

1 answer:

Answer 0: (score: 0)

Here is a brute-force approach.

tt <- read.table(header=FALSE, fill=TRUE, stringsAsFactors=FALSE,
text="18:00:04.940864 129.63.50.235.53 > 129.63.71.70.1111:  udp 107
18:00:04.957456 129.63.80.240.161 > 129.63.152.10.39518:  udp 151
18:00:04.958432 129.63.152.10.39518 > 129.63.80.240.161:  udp 136 (DF)
18:00:04.963312 217.79.96.182.53 > 129.63.1.1.1564:  udp 48 (DF)
18:00:05.000976 129.63.50.235.1028 > 218.232.110.133.53:  udp 34
18:00:05.207888 129.63.50.235.1028 > 203.50.0.24.53:  udp 32")

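# Paste everything after column 4 (protocol, length, optional "(DF)") into one string per row.
# apply() works on a character matrix, so the numeric length column is padded to a common width.
last <- apply(tt[-(1:4)], 1, paste, collapse=' ')
tt[,5] <- last
# Drop the trailing ':' from the destination address.
tt[,4] <- sub(':', '', tt[,4])
# Keep time, source, destination, and the combined last field; column 3 is just '>'.
tt <- tt[c(1,2,4,5)]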

> tt
##               V1                  V2                  V4           V5
## 1 18:00:04.940864    129.63.50.235.53   129.63.71.70.1111     udp 107 
## 2 18:00:04.957456   129.63.80.240.161 129.63.152.10.39518     udp 151 
## 3 18:00:04.958432 129.63.152.10.39518   129.63.80.240.161 udp 136 (DF)
## 4 18:00:04.963312    217.79.96.182.53     129.63.1.1.1564 udp  48 (DF)
## 5 18:00:05.000976  129.63.50.235.1028  218.232.110.133.53     udp  34 
## 6 18:00:05.207888  129.63.50.235.1028      203.50.0.24.53     udp  32
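
If separate columns for IP address, port, and the (DF) flag are needed, a regular expression is one possible alternative to the approach above. The following is only a minimal sketch in base R, not part of the original answer. The column names (src_ip, src_port, dst_ip, dst_port, proto, len, df, secs) are made up for illustration, and it assumes every line follows the same layout as the sample, so that each line matches the pattern.

```r
# Sample lines copied from the question; for a file, readLines("sample.txt") could be used instead.
lines <- c(
  "18:00:04.940864 129.63.50.235.53 > 129.63.71.70.1111:  udp 107",
  "18:00:04.957456 129.63.80.240.161 > 129.63.152.10.39518:  udp 151",
  "18:00:04.958432 129.63.152.10.39518 > 129.63.80.240.161:  udp 136 (DF)",
  "18:00:04.963312 217.79.96.182.53 > 129.63.1.1.1564:  udp 48 (DF)",
  "18:00:05.000976 129.63.50.235.1028 > 218.232.110.133.53:  udp 34",
  "18:00:05.207888 129.63.50.235.1028 > 203.50.0.24.53:  udp 32"
)

# Building blocks: a dotted IPv4 address and a numeric port, each in its own capture group.
ip   <- "([0-9]+\\.[0-9]+\\.[0-9]+\\.[0-9]+)"
port <- "([0-9]+)"

# Groups: time, source IP, source port, destination IP, destination port, protocol, length.
pattern <- paste0("^([^ ]+) +", ip, "\\.", port, " +> +",
                  ip, "\\.", port, ": +([a-z]+) +([0-9]+)")

# regexec() + regmatches() return one character vector per line:
# the full match followed by the seven captured groups.
m     <- regmatches(lines, regexec(pattern, lines))
parts <- do.call(rbind, m)

parsed <- data.frame(
  time     = parts[, 2],
  src_ip   = parts[, 3],
  src_port = as.integer(parts[, 4]),
  dst_ip   = parts[, 5],
  dst_port = as.integer(parts[, 6]),
  proto    = parts[, 7],
  len      = as.integer(parts[, 8]),
  # The optional flag is checked on the raw line (assumes every line matched the pattern).
  df       = grepl("(DF)", lines, fixed = TRUE),
  stringsAsFactors = FALSE
)

# Convert the time of day to seconds since midnight, keeping the fractional part.
hms <- do.call(rbind, strsplit(parsed$time, ":", fixed = TRUE))
parsed$secs <- as.numeric(hms[, 1]) * 3600 +
               as.numeric(hms[, 2]) * 60 +
               as.numeric(hms[, 3])

print(parsed)
```

Lines that do not match the pattern produce an empty result from regmatches(), so real capture files with other packet types would need the pattern extended or the non-matching lines filtered out first.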