Below is the data I am working with:
Source_IP Tag_Name Time
0.0.0.0 Smurf_Attack Fri Sep 06 00:21:22 PDT 2013
0.0.0.0 Smurf_Attack Fri Sep 06 00:21:23 PDT 2013
0.0.0.0 Smurf_Attack Fri Sep 06 00:23:00 PDT 2013
0.0.0.0 Smurf_Attack Fri Sep 06 00:23:00 PDT 2013
0.0.0.0 Smurf_Attack Fri Sep 06 00:44:49 PDT 2013
109.104.76.38 HTTP_AuthResponse_Possible_CSRF Tue Sep 03 00:50:31 PDT 2013
109.104.76.38 HTTP_AuthResponse_Possible_CSRF Tue Sep 03 06:38:58 PDT 2013
109.109.59.237 SMTP_Sendmail_XHeader_Overflow Fri Sep 06 06:38:33 PDT 2013
109.126.228.242 Conficker_P2P_Detected Fri Sep 06 14:12:02 PDT 2013
109.126.228.242 Conficker_P2P_Data_Transfer Fri Sep 06 14:12:02 PDT 2013
109.126.228.242 Conficker_P2P_Protection Fri Sep 06 14:12:02 PDT 2013
109.148.240.237 HTTPS_Apache_ClearText_DoS Fri Sep 06 03:33:48 PDT 2013
109.185.22.245 Smurf_Attack Fri Sep 06 11:49:21 PDT 2013
109.201.23.98 SMTP_Sendmail_XHeader_Overflow Fri Sep 06 08:57:26 PDT 2013
109.201.23.98 SMTP_Sendmail_XHeader_Overflow Fri Sep 06 08:57:29 PDT 2013
109.230.128.210 Conficker_P2P_Detected Wed Sep 04 04:24:51 PDT 2013
109.230.128.210 Conficker_P2P_Data_Transfer Wed Sep 04 04:24:51 PDT 2013
109.230.128.210 Conficker_P2P_Protection Wed Sep 04 04:24:51 PDT 2013
109.232.172.122 HTTP_AuthResponse_Possible_CSRF Thu Sep 05 02:45:22 PDT 2013
109.238.231.74 SMTP_Sendmail_XHeader_Overflow Fri Sep 06 06:49:12 PDT 2013
109.64.10.102 SMTP_Sendmail_XHeader_Overflow Fri Sep 06 06:37:25 PDT 2013
109.64.10.102 SMTP_Sendmail_XHeader_Overflow Fri Sep 06 06:37:48 PDT 2013
109.64.10.102 SMTP_Sendmail_XHeader_Overflow Fri Sep 06 06:47:28 PDT 2013
I am interested in finding sequences (association rules). The data contains duplicates, but I did the following to get rid of them (is this correct?):
trans = read.transactions(file = "Cyber_Security.csv", rm.duplicates = TRUE,
                          format = "single", sep = ",", cols = c(1, 2))
inspect(trans)
items transactionID
1 {Smurf_Attack} 0.0.0.0
2 {HTTP_AuthResponse_Possible_CSRF} 109.104.76.38
3 {SMTP_Sendmail_XHeader_Overflow} 109.109.59.237
4 {Conficker_P2P_Data_Transfer,
Conficker_P2P_Detected,
Conficker_P2P_Protection} 109.126.228.242
5 {HTTPS_Apache_ClearText_DoS} 109.148.240.237
6 {Smurf_Attack} 109.185.22.245
7 {SMTP_Sendmail_XHeader_Overflow} 109.201.23.98
8 {Conficker_P2P_Data_Transfer,
Conficker_P2P_Detected,
Conficker_P2P_Protection} 109.230.128.210
9 {HTTP_AuthResponse_Possible_CSRF} 109.232.172.122
10 {SMTP_Sendmail_XHeader_Overflow} 109.238.231.74
11 {SMTP_Sendmail_XHeader_Overflow} 109.64.10.102
12 {Tag_Name} Source_IP
I only use two columns, which is why my code uses format = "single" together with cols.
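One thing the inspect(trans) output above shows is transaction 12, {Tag_Name} with transactionID Source_IP: the header row was read as data. A minimal sketch of one way to avoid that, assuming the file is comma-separated with a header row, is to read it with base R's read.csv, drop duplicate (Source_IP, Tag_Name) pairs, and split the tags by IP. The inline `csv_text` below is a small excerpt of the data above standing in for Cyber_Security.csv, and the coercion to transactions only runs if arules is installed.

```r
# Assumption: comma-separated input with a header row; csv_text is an excerpt
# of the data shown earlier and stands in for Cyber_Security.csv.
csv_text <- "Source_IP,Tag_Name,Time
0.0.0.0,Smurf_Attack,Fri Sep 06 00:21:22 PDT 2013
0.0.0.0,Smurf_Attack,Fri Sep 06 00:21:23 PDT 2013
109.126.228.242,Conficker_P2P_Detected,Fri Sep 06 14:12:02 PDT 2013
109.126.228.242,Conficker_P2P_Data_Transfer,Fri Sep 06 14:12:02 PDT 2013
109.126.228.242,Conficker_P2P_Protection,Fri Sep 06 14:12:02 PDT 2013
109.64.10.102,SMTP_Sendmail_XHeader_Overflow,Fri Sep 06 06:37:25 PDT 2013
109.64.10.102,SMTP_Sendmail_XHeader_Overflow,Fri Sep 06 06:37:48 PDT 2013"

# read.csv consumes the header line, so it never becomes an item.
df <- read.csv(text = csv_text, stringsAsFactors = FALSE)

# Keep each (IP, tag) pair once, then collect the tags per IP.
pairs <- unique(df[, c("Source_IP", "Tag_Name")])
baskets <- split(pairs$Tag_Name, pairs$Source_IP)
print(baskets)

# Optional: turn the list into an arules transactions object.
if (suppressWarnings(require("arules", quietly = TRUE, character.only = TRUE))) {
  trans <- as(baskets, "transactions")
  inspect(trans)
}
```

With the full file, this should give 11 transactions instead of 12, since the header no longer counts as one.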
rules = apriori(trans, parameter = list(support = 0.3, confidence = 0.3, target = "rules"))
parameter specification:
confidence minval smax arem aval originalSupport support minlen maxlen target ext
0.3 0.1 1 none FALSE TRUE 0.3 1 10 rules FALSE
algorithmic control:
filter tree heap memopt load sort verbose
0.1 TRUE TRUE FALSE TRUE 2 TRUE
apriori - find association rules with the apriori algorithm
version 4.21 (2004.05.09) (c) 1996-2004 Christian Borgelt
set item appearances ...[0 item(s)] done [0.00s].
set transactions ...[8 item(s), 12 transaction(s)] done [0.00s].
sorting and recoding items ... [1 item(s)] done [0.00s].
creating transaction tree ... done [0.00s].
checking subsets of size 1 done [0.00s].
writing ... [1 rule(s)] done [0.00s].
creating S4 object ... done [0.00s].
> summary(rules)
set of 1 rules
rule length distribution (lhs + rhs):sizes
1
1
Min. 1st Qu. Median Mean 3rd Qu. Max.
1 1 1 1 1 1
summary of quality measures:
support confidence lift
Min. :0.3333 Min. :0.3333 Min. :1
1st Qu.:0.3333 1st Qu.:0.3333 1st Qu.:1
Median :0.3333 Median :0.3333 Median :1
Mean :0.3333 Mean :0.3333 Mean :1
3rd Qu.:0.3333 3rd Qu.:0.3333 3rd Qu.:1
Max. :0.3333 Max. :0.3333 Max. :1
mining info:
data ntransactions support confidence
trans 12 0.3 0.3
> inspect(rules)
lhs rhs support confidence lift
1 {} => {SMTP_Sendmail_XHeader_Overflow} 0.3333333 0.3333333 1
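The numbers in that single rule can be reproduced by hand, which may help explain why nothing else shows up. The sketch below copies the 12 transactions from the inspect(trans) output (including the stray header transaction) into a plain list and computes item support in base R; the variable names are only illustrative.

```r
# The 12 transactions exactly as printed by inspect(trans) above.
conficker <- c("Conficker_P2P_Data_Transfer", "Conficker_P2P_Detected",
               "Conficker_P2P_Protection")
transactions <- list(
  "0.0.0.0"         = "Smurf_Attack",
  "109.104.76.38"   = "HTTP_AuthResponse_Possible_CSRF",
  "109.109.59.237"  = "SMTP_Sendmail_XHeader_Overflow",
  "109.126.228.242" = conficker,
  "109.148.240.237" = "HTTPS_Apache_ClearText_DoS",
  "109.185.22.245"  = "Smurf_Attack",
  "109.201.23.98"   = "SMTP_Sendmail_XHeader_Overflow",
  "109.230.128.210" = conficker,
  "109.232.172.122" = "HTTP_AuthResponse_Possible_CSRF",
  "109.238.231.74"  = "SMTP_Sendmail_XHeader_Overflow",
  "109.64.10.102"   = "SMTP_Sendmail_XHeader_Overflow",
  "Source_IP"       = "Tag_Name"   # header row read as a transaction
)

n <- length(transactions)

# Support of an item = share of transactions that contain it.
item_counts <- table(unlist(transactions))
support <- setNames(as.numeric(item_counts), names(item_counts)) / n
print(round(sort(support, decreasing = TRUE), 4))

# Only items with support >= 0.3 survive the minimum support threshold.
frequent <- support[support >= 0.3]
print(frequent)

# For a rule {} => {X}: confidence = support(X), so lift = confidence / support(X).
lift <- frequent / frequent
print(lift)
```

Only SMTP_Sendmail_XHeader_Overflow reaches the threshold (4 of 12 transactions, 0.3333), while each Conficker tag appears in 2 of 12 (0.1667). A rule with an empty left-hand side always has lift 1, so a lower support threshold together with minlen = 2 in the parameter list would be needed before rules between items can appear at all.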
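I am new to association rules, and the results are not very helpful. Am I doing this the right way? How can I use the "basket" format and try to build more meaningful rules (by using the Time field)? Is there a better approach? Please guide me so that I can learn association rule mining!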
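For reference, here is a rough sketch of what a Time-based basket could look like, under the assumption that one basket means "all distinct tags seen from one Source_IP on one calendar day". That grouping rule, the `Day` and `Basket` columns, and the output file name baskets.csv are hypothetical choices for illustration, not something the data dictates. The rows are an excerpt of the data above.

```r
# Excerpt of the data above; the Time strings are kept exactly as logged.
events <- data.frame(
  Source_IP = c("0.0.0.0", "0.0.0.0",
                "109.104.76.38", "109.104.76.38",
                "109.126.228.242", "109.126.228.242", "109.126.228.242"),
  Tag_Name = c("Smurf_Attack", "Smurf_Attack",
               "HTTP_AuthResponse_Possible_CSRF", "HTTP_AuthResponse_Possible_CSRF",
               "Conficker_P2P_Detected", "Conficker_P2P_Data_Transfer",
               "Conficker_P2P_Protection"),
  Time = c("Fri Sep 06 00:21:22 PDT 2013", "Fri Sep 06 00:21:23 PDT 2013",
           "Tue Sep 03 00:50:31 PDT 2013", "Tue Sep 03 06:38:58 PDT 2013",
           "Fri Sep 06 14:12:02 PDT 2013", "Fri Sep 06 14:12:02 PDT 2013",
           "Fri Sep 06 14:12:02 PDT 2013"),
  stringsAsFactors = FALSE
)

# Reduce "Fri Sep 06 00:21:22 PDT 2013" to the day key "Sep 06 2013".
events$Day <- sub("^\\S+ (\\S+ \\S+) \\S+ \\S+ (\\S+)$", "\\1 \\2", events$Time)

# Hypothetical basket key: one basket per IP per day.
events$Basket <- paste(events$Source_IP, events$Day, sep = " | ")

# Distinct tags per basket.
pairs <- unique(events[, c("Basket", "Tag_Name")])
baskets <- split(pairs$Tag_Name, pairs$Basket)

# "basket" format is one line per transaction with items separated by sep.
basket_lines <- vapply(baskets, paste, character(1), collapse = ",")
print(basket_lines)

# To mine it with arules (not run here):
# writeLines(basket_lines, "baskets.csv")
# trans <- read.transactions("baskets.csv", format = "basket", sep = ",")
```

In this data most IPs raise only one kind of tag, so even day-level baskets are likely to stay sparse; a different key (for example a time window across all IPs) may be worth trying, but that is a modelling choice rather than something this sketch settles.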