Packet counting in Hadoop (using MapReduce)

Posted: 2015-03-30 09:02:24

Tags: hadoop mapreduce packet-capture snort hping

What I have done so far:


Installed Hadoop following this guide:

http://www.cloudera.com/content/cloudera/en/documentation/cdh4/v4-2-0/CDH4-Installation-Guide/cdh4ig_topic_4_4.html


Installed hping3 and used it to generate a flood of requests with:

sudo hping3 -c 10000 -d 120 -S -w 64 -p 8000 --flood --rand-source 192.168.1.12

Installed Snort and logged those requests with the following command:

sudo snort -ved -h 192.168.1.0/24 -l .

This produces the log file snort.log.1427021231.

I can read it with
sudo snort -r snort.log.1427021231

which gives output of the following form:

=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+

03/22-16:17:14.259633 192.168.1.12:8000 -> 117.247.194.105:46639
TCP TTL:64 TOS:0x0 ID:0 IpLen:20 DgmLen:44 DF
***A**S* Seq:0x6EEE4A6B Ack:0x6DF6015B Win:0x7210 TcpLen:24
TCP Options (1) => MSS: 1460
=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+
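To make the fields I want concrete, here is a minimal Python sketch that pulls them out of one record in the text format above. It assumes the input is the text printed by `snort -r`, not the binary log file itself; the function name `parse_record` and both regexes are hypothetical and were written only against this one sample, so other protocols or Snort output options may need different patterns.

```python
import re
from typing import Dict

# Hypothetical regexes, written against the single sample record above.
# Header line, e.g. "03/22-16:17:14.259633 192.168.1.12:8000 -> 117.247.194.105:46639".
# Ports are optional because records without ports (e.g. ICMP) omit the ":port" suffix.
HEADER_RE = re.compile(
    r"(?P<timestamp>\d{2}/\d{2}-\d{2}:\d{2}:\d{2}\.\d+)\s+"
    r"(?P<src_ip>\d{1,3}(?:\.\d{1,3}){3})(?::(?P<src_port>\d+))?\s+->\s+"
    r"(?P<dst_ip>\d{1,3}(?:\.\d{1,3}){3})(?::(?P<dst_port>\d+))?"
)
# First protocol name after the header, e.g. "TCP" in "TCP TTL:64 ...".
PROTO_RE = re.compile(r"\b(TCP|UDP|ICMP)\b")


def parse_record(record: str) -> Dict[str, str]:
    """Extract timestamp, IPs, ports and protocol from one `snort -r` text record."""
    header = HEADER_RE.search(record)
    if header is None:
        return {}  # not a packet record (banner, statistics, separator line, ...)
    # Keep only the groups that actually matched (ports may be absent).
    fields = {name: value for name, value in header.groupdict().items() if value is not None}
    # Look for the protocol only after the header, so IPs/ports cannot be mistaken for it.
    proto = PROTO_RE.search(record, header.end())
    if proto is not None:
        fields["protocol"] = proto.group(1)
    return fields
```

Applied to the sample record, `parse_record` returns the timestamp, both IPs, both ports and `TCP` as strings; counting those across the whole file is the aggregation part of the question.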


I used

hdfs dfs -put <localsrc> ... <dst>

to copy this log file to HDFS.

Now, the things I need help with:

How can I count the total number of source IP addresses, destination IP addresses, ports, protocols, and timestamps in the log file?

(Do I have to write my own MapReduce program, or is there a library for this?)


I also found

https://github.com/ssallys/p3

but I could not get it to run. I looked at the contents of the JAR file, but still could not run it.

ratan@lenovo:~/Desktop$ hadoop jar ./p3lite.jar p3.pcap.examples.PacketCount

Exception in thread "main" java.lang.ClassNotFoundException: nflow.runner.Runner
	at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
	at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
	at java.security.AccessController.doPrivileged(Native Method)
	at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
	at java.lang.Class.forName0(Native Method)
	at java.lang.Class.forName(Class.java:274)
	at org.apache.hadoop.util.RunJar.main(RunJar.java:201)
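The trace names `nflow.runner.Runner`, not the `p3.pcap.examples.PacketCount` class given on the command line. If I read Hadoop's `RunJar` correctly, when a jar's manifest declares a `Main-Class`, `hadoop jar` runs that class and passes the remaining arguments to it as ordinary program arguments, so the manifest of `p3lite.jar` probably points at a class the jar does not contain. A minimal Python sketch to check that, assuming the jar is a normal zip archive with a `META-INF/MANIFEST.MF` entry; `read_main_class` and `jar_contains_class` are hypothetical helper names:

```python
import zipfile
from typing import Optional

MANIFEST = "META-INF/MANIFEST.MF"


def read_main_class(jar_path: str) -> Optional[str]:
    """Return the Main-Class attribute of the jar's manifest, or None if there is none."""
    with zipfile.ZipFile(jar_path) as jar:
        if MANIFEST not in jar.namelist():
            return None
        text = jar.read(MANIFEST).decode("utf-8", errors="replace")
    # Manifest lines longer than 72 bytes continue on the next line, which
    # starts with a single space; join continuations before looking at attributes.
    text = text.replace("\r\n", "\n").replace("\n ", "")
    for line in text.split("\n"):
        if line.startswith("Main-Class:"):
            return line[len("Main-Class:"):].strip()
    return None


def jar_contains_class(jar_path: str, class_name: str) -> bool:
    """True if e.g. 'p3.pcap.examples.PacketCount' is packaged as a .class entry."""
    entry = class_name.replace(".", "/") + ".class"
    with zipfile.ZipFile(jar_path) as jar:
        return entry in jar.namelist()
```

For example, `read_main_class("./p3lite.jar")` shows which entry point `hadoop jar` will pick, and `jar_contains_class("./p3lite.jar", "nflow.runner.Runner")` shows whether that class is actually inside the jar.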

Thanks.

1 answer:

Answer 0 (score: 1)

After a quick search, it looks like you will need to write a custom MapReduce job.

The algorithm would look something like the following pseudocode:

Parse the file line by line (or every n lines, if each log record spans more than one line).

In the mapper, use a regex to determine whether a piece of text is a source IP, destination IP, etc.

Emit each match as a key-value pair <type, count>
    type is the kind of text that was matched (e.g. source IP)
    count is the number of times it was matched in the record

Have the reducer sum all of the values from the mappers to get a global total for each type of information you want.

Write the results to a file in the desired format.
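The pseudocode maps naturally onto Hadoop Streaming, where the mapper and reducer are ordinary scripts that read stdin and write `key<TAB>value` lines. Below is a minimal Python sketch of that idea, not a tested production job. Assumptions: the input is the text printed by `snort -r` (the `snort.log.*` file written by `snort -l` is a binary tcpdump-format capture, so it would have to be converted to text before `hdfs dfs -put`); the file name `snort_count.py`, the `map`/`reduce` arguments and the regexes are hypothetical. The header line and the protocol line are matched independently rather than as whole records, so a record that straddles two input splits is still counted.

```python
import re
import sys
from itertools import groupby
from typing import Iterable, Iterator, List, Tuple

# Hypothetical patterns for `snort -r` text output.
# Header line: "03/22-16:17:14.259633 192.168.1.12:8000 -> 117.247.194.105:46639"
HEADER_RE = re.compile(
    r"(\d{2}/\d{2}-\d{2}:\d{2}:\d{2}\.\d+)\s+"
    r"(\d{1,3}(?:\.\d{1,3}){3})(?::(\d+))?\s+->\s+"
    r"(\d{1,3}(?:\.\d{1,3}){3})(?::(\d+))?"
)
# Protocol line: "TCP TTL:64 ..." (does not match "TCP Options ...").
PROTO_RE = re.compile(r"^(TCP|UDP|ICMP)\s+TTL:")
HEADER_TYPES = ("timestamp", "src_ip", "src_port", "dst_ip", "dst_port")


def mapper(lines: Iterable[str]) -> Iterator[Tuple[str, int]]:
    """Emit (type, 1) for every field recognised on a line of Snort text output."""
    for line in lines:
        header = HEADER_RE.search(line)
        if header:
            for field_type, value in zip(HEADER_TYPES, header.groups()):
                if value is not None:  # records without ports skip the port types
                    yield field_type, 1
        elif PROTO_RE.match(line):
            yield "protocol", 1


def reducer(pairs: Iterable[Tuple[str, int]]) -> Iterator[Tuple[str, int]]:
    """Sum counts per type; input must be sorted by key, as Hadoop's shuffle guarantees."""
    for field_type, group in groupby(pairs, key=lambda kv: kv[0]):
        yield field_type, sum(count for _, count in group)


def parse_streaming(lines: Iterable[str]) -> Iterator[Tuple[str, int]]:
    """Turn the mapper's 'key<TAB>count' lines back into (key, count) tuples."""
    for line in lines:
        if not line.strip():
            continue
        key, count = line.rstrip("\n").split("\t", 1)
        yield key, int(count)


def main(argv: List[str]) -> None:
    """Entry point for Hadoop Streaming: 'map' or 'reduce' selects the role."""
    if argv[1:] == ["map"]:
        pairs = mapper(sys.stdin)
    elif argv[1:] == ["reduce"]:
        pairs = reducer(parse_streaming(sys.stdin))
    else:
        print("usage: snort_count.py map|reduce", file=sys.stderr)
        return
    for key, count in pairs:
        print(f"{key}\t{count}")


if __name__ == "__main__":
    main(sys.argv)
```

A job could then be launched along the lines of `hadoop jar <path-to>/hadoop-streaming.jar -input <hdfs-input> -output <hdfs-output> -mapper "python snort_count.py map" -reducer "python snort_count.py reduce" -file snort_count.py`; where the streaming jar lives depends on the installation. To count each distinct value (for example packets per source IP) instead of a total per type, the mapper could use a key such as `src_ip:<address>` instead of `src_ip`.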