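I'm new to both programming and regular expressions, so consider that my disclaimer.
I'm trying to parse a wireshark log that I dumped to a txt file with tshark.
The point of my program is to start at the top of the txt file and match all the text between packet headers.
Every packet starts with Frame\s+\d; I want everything up to, but not including, the next packet header, placed into a String.
I instantiate an object (Packets) for each packet and add it to an ArrayList for later processing.
In other words, I need to collect all the text from packet header 1 through the end of packet 1 (that is, the start of packet header 2), without including packet header 2.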
Frame 1 (186 bytes on wire, 186 bytes captured)
Arrival Time: Sep 19, 2013 13:25:19.937150000
[Time delta from previous captured frame: 0.000000000 seconds]
[Time delta from previous displayed frame: 0.000000000 seconds]
[Time since reference or first frame: 0.000000000 seconds]
Frame Number: 1
Frame Length: 186 bytes
Capture Length: 186 bytes
[Frame is marked: False]
[Protocols in frame
............................A bunch of more packet data...............
Encrypted Packet: 88FE0AFA38B3E1994B907F778FC42CD4FBD967F3D9101679...
Frame 2 (60 bytes on wire, 60 bytes captured)
Arrival Time: Sep 19, 2013 13:25:19.938495000
[Time delta from previous captured frame: 0.001345000 seconds]
[Time delta from previous displayed frame: 0.001345000 seconds]
I tried:
(Frame\s\d)*.?Frame\s\d
But no dice.
I've been experimenting on rubular.com to see if I could get this to click, but I can't seem to get what I need.
Thoughts?
Answer 0 (score: 0)
Consider a file packets.txt in /your/path, containing the sample you posted...
Here is a solution.
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PacketParser {
    public static void main(String[] args) {
        // trivial file operations
        String path = "/your/path/packets.txt";
        StringBuilder contents = new StringBuilder();
        // try-with-resources closes the reader even if reading fails
        try (BufferedReader br = new BufferedReader(new FileReader(path))) {
            String line;
            while ((line = br.readLine()) != null) {
                // readLine() strips the line terminator, so the whole log
                // ends up on one long line that '.' can span
                contents.append(line);
            }
        } catch (IOException e) {
            // also covers FileNotFoundException
            e.printStackTrace();
            return;
        }
        // the Pattern
        Pattern p = Pattern.compile("Frame\\s\\d\\s(.+?(?=Frame|$))", Pattern.MULTILINE);
        // If you actually need the "Frame etc." header matched as well, here's
        // an alternate Pattern:
        // Pattern p = Pattern.compile("(Frame\\s\\d\\s.+?(?=Frame|$))", Pattern.MULTILINE);
        // matching...
        Matcher m = p.matcher(contents);
        // iterating over matches and printing out group 1
        while (m.find()) {
            System.out.println("Found: " + m.group(1));
        }
    }
}
Output:
Found: (186 bytes on wire, 186 bytes captured) Arrival Time: Sep 19, 2013 13:25:19.937150000 [Time delta from previous captured frame: 0.000000000 seconds] [Time delta from previous displayed frame: 0.000000000 seconds] [Time since reference or first frame: 0.000000000 seconds]
Found: (60 bytes on wire, 60 bytes captured) Arrival Time: Sep 19, 2013 13:25:19.938495000 [Time delta from previous captured frame: 0.001345000 seconds] [Time delta from previous displayed frame: 0.001345000 seconds]
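Explanation of the Pattern:
Frame\s\d\s matches the literal text "Frame", one whitespace character, a single digit, and one more whitespace character. As written, only frame numbers 0 to 9 match.
(.+?(?=Frame|$)) is group 1. The reluctant .+? captures as few characters as possible, and the lookahead (?=Frame|$) stops it just before the next "Frame" or at the end of the input without consuming anything.
Because the lookahead stops at any "Frame", it also stops at "Frame Number:", which is why the first match in the output above ends before that field.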
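The Pattern above accepts only single-digit frame numbers, and its lookahead stops at any occurrence of "Frame", including the "Frame Number:" field inside a packet. As a hedged sketch rather than a definitive fix, the variant below anchors on a header at the start of a line. It assumes every real packet header looks like "Frame <number> (" and that the log keeps its line breaks; the class name PacketSplitter and the method split are hypothetical names chosen for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class PacketSplitter {
    // Assumption: a packet header is "Frame", whitespace, digits, whitespace, "("
    // at the very start of a line, e.g. "Frame 12 (186 bytes on wire, ...".
    private static final String HEADER = "^Frame\\s+\\d+\\s+\\(";

    // MULTILINE lets ^ match at the start of every line;
    // DOTALL lets . cross line breaks so one match can span a whole packet.
    private static final Pattern PACKET = Pattern.compile(
            HEADER + ".*?(?=" + HEADER + "|\\z)",
            Pattern.MULTILINE | Pattern.DOTALL);

    // Returns each packet (header included) as one String.
    static List<String> split(String log) {
        List<String> packets = new ArrayList<>();
        Matcher m = PACKET.matcher(log);
        while (m.find()) {
            packets.add(m.group().trim());
        }
        return packets;
    }

    public static void main(String[] args) {
        String log = "Frame 1 (186 bytes on wire, 186 bytes captured)\n"
                + "Frame Number: 1\n"
                + "Frame Length: 186 bytes\n"
                + "Frame 2 (60 bytes on wire, 60 bytes captured)\n"
                + "Frame Number: 2\n";
        for (String packet : split(log)) {
            System.out.println("--- packet ---");
            System.out.println(packet);
        }
    }
}
```

Unlike the solution above, this sketch needs the line breaks to be preserved when the file is read (for example by appending '\n' after each line), because the header is recognised by its position at the start of a line.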
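Edit: hints on performance and memory optimization
A small but obvious step: declare the Pattern as a constant, so it is compiled only once.
Instead of populating an ArrayList that grows with every match, write each match to its own file in some folder. This will run slowly, but if implemented properly it should allow the String from each iteration of the while (m.find()) loop to be garbage-collected. Once the iteration is over, you will have to iterate again to process each small file.
If that is not enough, or simply does not work for the size of your data, you may want to implement your own custom parser or pre-chunk the data somehow. That is far out of scope here, given that your original question is about the Pattern itself, not performance.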
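To make the "constant Pattern" and "custom parser" hints a little more concrete, here is a minimal sketch of a line-by-line reader that never holds more than one packet in memory. It is an illustration under assumptions, not the method from the answer above: it assumes headers look like "Frame <number> (" at the start of a line, and StreamingPacketReader, readPackets and the sink parameter are hypothetical names.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.function.Consumer;
import java.util.regex.Pattern;

final class StreamingPacketReader {
    // Compiled once and reused for every line.
    // Assumption: headers look like "Frame 12 (186 bytes on wire, ...".
    private static final Pattern HEADER = Pattern.compile("Frame\\s+\\d+\\s+\\(");

    // Reads line by line and hands each complete packet to the sink,
    // so only the packet currently being assembled is kept in memory.
    static void readPackets(BufferedReader reader, Consumer<String> sink) throws IOException {
        StringBuilder current = new StringBuilder();
        boolean inPacket = false;
        String line;
        while ((line = reader.readLine()) != null) {
            // lookingAt() only succeeds if the header starts at the beginning of the line
            if (HEADER.matcher(line).lookingAt()) {
                if (inPacket) {
                    sink.accept(current.toString());
                }
                current.setLength(0);
                inPacket = true;
            }
            // anything before the first header is ignored
            if (inPacket) {
                current.append(line).append('\n');
            }
        }
        if (inPacket) {
            sink.accept(current.toString()); // flush the last packet
        }
    }

    public static void main(String[] args) throws IOException {
        String log = "Frame 1 (186 bytes on wire, 186 bytes captured)\n"
                + "Frame Number: 1\n"
                + "Frame 2 (60 bytes on wire, 60 bytes captured)\n"
                + "Frame Number: 2\n";
        try (BufferedReader reader = new BufferedReader(new StringReader(log))) {
            readPackets(reader, packet -> System.out.println("Packet:\n" + packet));
        }
    }
}
```

For a real log, the StringReader would be replaced by a reader over the file, and the sink could build a Packets object, add to a list, or write each packet out, depending on how much memory is available.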