I have combined packet captures from both sides of a network connection in WireShark. The captures are exported as CSV files, where each row contains a unique ID and a timestamp. Because I capture on both sides, I get two rows per ID: one with the send timestamp and one with the receive timestamp. What I want to do is calculate the delay by subtracting these values. I have managed to do that, but it takes about 12 seconds to work through a list of 17,000 packets, and I have 15 lists in total, which adds up to roughly 3 minutes of execution time, using the following code:
data <- read.csv("normal-novpn.csv", sep=",", numerals="no.loss", header=TRUE)
ID = data.matrix(data[,7], rownames.force = NA)
time = data.matrix(as.double(as.character(data[,2])), rownames.force = NA)
time = time*1000000 # Time is now in microseconds
len <- nrow(ID)
mat <- matrix(NA, nrow = len, ncol = 2)
for (i in 1:len) {
  # Extract the decimal packet ID from strings like "0xa5f0 (42480)"
  d <- unlist(strsplit(ID[i], " "))
  mat[i, 1] <- as.numeric(gsub('[()]', '', d[2]))
  mat[i, 2] <- time[i]
}
delay = vector(length=len/2)
k <- 1
for (i in 1:len) {
  for (j in i:len) {
    # Pair each row with a later row carrying the same ID and take the timestamp difference
    if (mat[i, 1] == mat[j, 1] && mat[j, 2] > mat[i, 2]) {
      delay[k] <- mat[j, 2] - mat[i, 2]
      k <- k + 1
    }
  }
}
The rows in the CSV file are sorted by time, and a row looks like this:
"32","1505997726.015245358","10.0.10.70","10.0.10.1","UDP","214","0xa5f0 (42480)","50414 > 5201 Len=172"
where the timestamp is "1505997726.015245358" and the ID is "0xa5f0 (42480)".
My question is whether this can be done more efficiently, so that the execution time goes down.
Update: Here is a link to one of my CSV files with 17,000 rows: https://justpaste.it/1bjoy
Below is a small sample with only 10 data rows plus the header. One thing worth mentioning: it is not true for all files that duplicate IDs sit next to each other in the list.
"No.","Time","Source","Destination","Protocol","Length","Identification","Info"
"120","1505984967.366049706","10.0.0.50","10.0.0.35","UDP","214","0x8dab (36267)","46670 > 5201 Len=172"
"123","1505984967.366440","10.0.0.50","10.0.0.35","UDP","214","0x8dab (36267)","46670 > 5201 Len=172"
"124","1505984967.386478504","10.0.0.50","10.0.0.35","UDP","214","0x8dac (36268)","46670 > 5201 Len=172"
"125","1505984967.386606","10.0.0.50","10.0.0.35","UDP","214","0x8dac (36268)","46670 > 5201 Len=172"
"130","1505984967.406353133","10.0.0.50","10.0.0.35","UDP","214","0x8db0 (36272)","46670 > 5201 Len=172"
"131","1505984967.406555","10.0.0.50","10.0.0.35","UDP","214","0x8db0 (36272)","46670 > 5201 Len=172"
"132","1505984967.426372842","10.0.0.50","10.0.0.35","UDP","214","0x8db1 (36273)","46670 > 5201 Len=172"
"133","1505984967.426558","10.0.0.50","10.0.0.35","UDP","214","0x8db1 (36273)","46670 > 5201 Len=172"
"134","1505984967.446282356","10.0.0.50","10.0.0.35","UDP","214","0x8db6 (36278)","46670 > 5201 Len=172"
"135","1505984967.446555","10.0.0.50","10.0.0.35","UDP","214","0x8db6 (36278)","46670 > 5201 Len=172"
Update 2: The order of the rows must be preserved, because I will perform further calculations on the new values. The first column, "No.", is the packet number assigned by WireShark and must increase as the list is traversed.
Answer 0 (score: 0)
Here is a fast solution using data.table. The file so_long.csv is the one from your edit.
library(data.table)
library(microbenchmark)

foo <- function() {
  dt <- fread("so_long.csv")
  # Convert the timestamp to microseconds
  dt[, Time := as.double(as.character(Time)) * 1000000]
  # One row per ID: delay = latest timestamp minus earliest timestamp
  dt[, .(Delay = max(Time) - min(Time)), by = Identification]
}
head(foo())
# Identification Delay
# 1: 0x0003 (3) 1749.75
# 2: 0x0004 (4) 1761.00
# 3: 0x0007 (7) 1887.50
# 4: 0x0009 (9) 1983.75
# 5: 0x000e (14) 1929.75
# 6: 0x0014 (20) 1948.50
microbenchmark(foo())
# Unit: milliseconds
# expr min lq mean median uq max neval
# foo() 38.28835 52.17356 64.48024 60.63322 72.21627 132.8679 100
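If the original row order has to be preserved (Update 2), the same grouping can be written as a grouped := assignment, which attaches each ID's delay to every one of its rows without reordering anything. The following is only a minimal sketch against the same so_long.csv file, assuming the same Time and Identification columns as above:

library(data.table)

dt <- fread("so_long.csv")
dt[, Time := as.numeric(Time) * 1000000]                  # seconds -> microseconds
dt[, Delay := max(Time) - min(Time), by = Identification] # per-ID delay; rows keep their order
head(dt)

IDs that occur only once end up with a Delay of 0, so those rows may need to be filtered out or flagged separately.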