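I have exported Wireshark pcap files to csv. I need to split these csv files by time interval. Each csv file has a "Time" column, and I want to split the files into 1-second intervals: the packets that arrived during the first second are written to one file, the packets that arrived during the next second are written to another file, and so on. If the input file is named AAA.csv, the split files get the same name with a number appended at the end: AAA1.csv, ..... AAA5.csv, etc. I am new to programming, so I am not sure how to proceed from this point. Please help. Thanks. https://fil.email/8wSH9ohq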
import os
startdir='.'
suffix='.csv'
for root, dirs, files in os.walk(startdir):
    for name in files:
        if name.endswith(suffix):
            filename=os.path.join(root,name)
Here is an excerpt from one of the csv files, containing rows from two consecutive seconds:
"No.","Time","Time delta from previous displayed frame","Length","Source","Destination","Protocol","Info"
"100","23:39:52.634388","0.000502000","28","HuaweiTe_3a:d0:1a (8c:15:c7:3a:d0:1a) (TA)","Htc_9b:92:24 (ac:37:43:9b:92:24) (RA)","802.11","802.11 Block Ack, Flags=........"
"101","23:39:52.634393","0.000005000","102","Htc_9b:92:24","HuaweiTe_3a:d0:16","802.11","QoS Data, SN=45, FN=0, Flags=.p.....T"
"102","23:39:52.695277","0.060884000","28","Microsof_d2:8b:4f (30:59:b7:d2:8b:4f) (TA)","Sagemcom_28:38:64 (d0:6e:de:28:38:64) (RA)","802.11","802.11 Block Ack, Flags=........"
"103","23:39:52.695278","0.000001000","10","","Sagemcom_28:38:64 (d0:6e:de:28:38:64) (RA)","802.11","Clear-to-send, Flags=........"
"104","23:39:52.717845","0.022567000","16","HuaweiTe_3a:d0:1a (8c:15:c7:3a:d0:1a) (TA)","Htc_9b:92:24 (ac:37:43:9b:92:24) (RA)","802.11","Request-to-send, Flags=........"
"105","23:39:52.717845","0.000000000","406","HuaweiTe_3a:d0:16","Htc_9b:92:24","802.11","QoS Data, SN=3446, FN=0, Flags=.p....F."
"106","23:39:52.717852","0.000007000","28","Htc_9b:92:24 (ac:37:43:9b:92:24) (TA)","HuaweiTe_3a:d0:1a (8c:15:c7:3a:d0:1a) (RA)","802.11","802.11 Block Ack, Flags=........"
"107","23:39:52.717853","0.000001000","10","","HuaweiTe_3a:d0:1a (8c:15:c7:3a:d0:1a) (RA)","802.11","Clear-to-send, Flags=........"
"108","23:39:52.719380","0.001527000","28","HuaweiTe_3a:d0:1a (8c:15:c7:3a:d0:1a) (TA)","Htc_9b:92:24 (ac:37:43:9b:92:24) (RA)","802.11","802.11 Block Ack, Flags=........"
"109","23:39:52.719384","0.000004000","102","Htc_9b:92:24","HuaweiTe_3a:d0:16","802.11","QoS Data, SN=46, FN=0, Flags=.p.....T"
"110","23:39:52.719389","0.000005000","10","","Htc_9b:92:24 (ac:37:43:9b:92:24) (RA)","802.11","Clear-to-send, Flags=........"
"111","23:39:53.109091","0.389702000","24","Htc_9b:92:24","HuaweiTe_3a:d0:1a","802.11","Null function (No data), SN=4069, FN=0, Flags=...P...T"
"112","23:39:53.109586","0.000495000","10","","Htc_9b:92:24 (ac:37:43:9b:92:24) (RA)","802.11","Acknowledgement, Flags=........"
"113","23:39:53.149481","0.039895000","28","Sagemcom_28:38:64 (d0:6e:de:28:38:64) (TA)","Microsof_a0:a4:2c (58:82:a8:a0:a4:2c) (RA)","802.11","802.11 Block Ack, Flags=........"
"114","23:39:53.157218","0.007737000","24","Htc_9b:92:24","HuaweiTe_3a:d0:1a","802.11","Null function (No data), SN=4070, FN=0, Flags=.......T"
"115","23:39:53.159251","0.002033000","10","","Htc_9b:92:24 (ac:37:43:9b:92:24) (RA)","802.11","Acknowledgement, Flags=........"
"116","23:39:53.159252","0.000001000","16","HuaweiTe_3a:d0:1a (8c:15:c7:3a:d0:1a) (TA)","Htc_9b:92:24 (ac:37:43:9b:92:24) (RA)","802.11","Request-to-send, Flags=........"
"117","23:39:53.159267","0.000015000","10","","HuaweiTe_3a:d0:1a (8c:15:c7:3a:d0:1a) (RA)","802.11","Clear-to-send, Flags=........"
"118","23:39:53.160276","0.001009000","16","HuaweiTe_3a:d0:1a (8c:15:c7:3a:d0:1a) (TA)","Htc_9b:92:24 (ac:37:43:9b:92:24) (RA)","802.11","Request-to-send, Flags=........"
"119","23:39:53.160277","0.000001000","1500","HuaweiTe_3a:d0:16","Htc_9b:92:24","802.11","QoS Data, SN=3447, FN=0, Flags=.p....F."
"120","23:39:53.160290","0.000013000","28","Htc_9b:92:24 (ac:37:43:9b:92:24) (TA)","HuaweiTe_3a:d0:1a (8c:15:c7:3a:d0:1a) (RA)","802.11","802.11 Block Ack, Flags=........"
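Answer 0 (score: 1)
The csv module is enough here. You only need to read each file one row at a time. If the first 8 characters of the "Time" field (the time down to the second) are the same as in the previous row, copy the row to the same output file; otherwise create a new output file.
It could be coded as: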
import os
import csv
startdir='.'
suffix='.csv'
for root, dirs, files in os.walk(startdir):
    for name in files:
        if name.endswith(suffix):
            filename=os.path.join(root,name)
            with open(filename, newline='') as fd:    # open the csv file
                rd = csv.reader(fd)                   # as a csv input file
                old = None                            # no previous line
                fdout = None                          # no output file open yet
                i = 0                                 # we will start numbering output files with 1
                header = next(rd)                     # store the header line
                for row in rd:
                    if row[1][:8] != old:             # we have a different second (or the first one...)
                        if fdout is not None:         # close the previous output file, if any
                            fdout.close()
                        old = row[1][:8]              # store current time for next rows
                        i += 1                        # increase output file number
                        fdout = open(filename[:-4] + str(i) + filename[-4:],
                                     'w', newline='') # open a new output file
                        wr = csv.writer(fdout, quoting=csv.QUOTE_ALL)  # with expected csv params
                        wr.writerow(header)           # write the header
                    wr.writerow(row)                  # copy the row to the current output file
                if fdout is not None:                 # close the last output file, if any
                    fdout.close()
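The code above relies on the fact that the second can be read directly from the Time string without parsing it. If you need a variable duration, possibly shorter than one second, you have to parse the time string, convert it to a decimal (more precisely, floating point) number of seconds, and divide that by the chosen duration in seconds: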
import datetime
...
sec_duration=0.5   # for half a second
...
                for row in rd:
                    # convert the Time field to a total number of seconds in day
                    # as a float
                    cur = datetime.datetime.strptime(row[1], "%H:%M:%S.%f")
                    cur -= cur.replace(hour=0, minute=0, second=0, microsecond=0)
                    # make it a number of periods of sec_duration
                    cur = int(cur.total_seconds() / sec_duration)
                    if cur != old:             # we have a different period (or the first one...)
                        if old is not None:    # close the previous output file, if any
                            fdout.close()
                        old = cur              # store current period for next rows
                        i += 1                 # increase output file number
                        ...
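Since the fragment above is elided, here is a minimal self-contained sketch of just the period computation, working on rows already held in memory. The function names `period_index` and `split_by_period` are hypothetical helpers introduced for illustration, not part of the answer's code, and the sketch assumes the Time field always looks like `HH:MM:SS.ffffff` with at most six fractional digits, as in the sample rows.

```python
import datetime


def period_index(time_str, sec_duration=1.0):
    """Return which sec_duration-long period (counted from midnight)
    the given "HH:MM:SS.ffffff" time string falls in."""
    t = datetime.datetime.strptime(time_str, "%H:%M:%S.%f")
    midnight = t.replace(hour=0, minute=0, second=0, microsecond=0)
    return int((t - midnight).total_seconds() / sec_duration)


def split_by_period(rows, sec_duration=1.0, time_col=1):
    """Group consecutive rows into chunks, starting a new chunk each time
    the period index changes. Assumes rows are in capture order."""
    chunks = []
    old = None                      # period of the previous row
    for row in rows:
        cur = period_index(row[time_col], sec_duration)
        if cur != old:              # different period (or the first row)
            chunks.append([])
            old = cur
        chunks[-1].append(row)
    return chunks
```

Each chunk returned by `split_by_period` corresponds to one output file; writing the header plus the chunk with `csv.writer` would reproduce the file-per-period behaviour described above. Like the answer's code, this compares only the time of day, so a capture that runs for more than 24 hours would need the date as well.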
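Answer 1 (score: 0)
This should get you started. It splits your sample csv into 11 different files. I suggest creating a test directory and trying the code below there to check whether it does what you expect.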
import os
# pandas to read / write csv and process the data
import pandas as pd
startdir='.'
suffix='.csv'
for root, dirs, files in os.walk(startdir):
    for name in files:
        if name.endswith(suffix):
            filename=os.path.join(root,name)
            #print(filename)
            df = pd.read_csv(filename)
            # Extract the time for grouping
            col_time = pd.to_datetime(df['Time'], format='%H:%M:%S.%f')
            # Group the rows by the second they fall in (time truncated to whole seconds)
            df2 = df.groupby(col_time.dt.floor('s'))
            # now split the data frame according to group and put them in a list
            list_of_df = [group for _, group in df2]
            # get the data frames from the list and write them, numbering from 1
            for i, part in enumerate(list_of_df, start=1):
                part.to_csv(filename[:-4]+str(i)+".csv", index=False)
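The output names above are built by slicing the last four characters off the path, which only works while the suffix is exactly `.csv`. As a small sketch of an alternative, `os.path.splitext` can separate the extension whatever its length. The helper name `numbered_name` is hypothetical, introduced here only for illustration.

```python
import os


def numbered_name(filename, i):
    """Insert the number i between the base name and the extension,
    e.g. AAA.csv with i=1 becomes AAA1.csv."""
    root, ext = os.path.splitext(filename)
    return f"{root}{i}{ext}"
```

With this, `numbered_name(filename, i)` could stand in for the slicing expression in either answer. Note that the numbered output files also end in `.csv`, so running either script a second time over the same directory would split them again; writing the output to a separate directory avoids that.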