Association rules in R

Time: 2014-01-27 05:39:11

Tags: r arules

Below is the data I am working with:

Source_IP   Tag_Name    Time
0.0.0.0 Smurf_Attack    Fri Sep 06 00:21:22 PDT 2013
0.0.0.0 Smurf_Attack    Fri Sep 06 00:21:23 PDT 2013
0.0.0.0 Smurf_Attack    Fri Sep 06 00:23:00 PDT 2013
0.0.0.0 Smurf_Attack    Fri Sep 06 00:23:00 PDT 2013
0.0.0.0 Smurf_Attack    Fri Sep 06 00:44:49 PDT 2013
109.104.76.38   HTTP_AuthResponse_Possible_CSRF Tue Sep 03 00:50:31 PDT 2013
109.104.76.38   HTTP_AuthResponse_Possible_CSRF Tue Sep 03 06:38:58 PDT 2013
109.109.59.237  SMTP_Sendmail_XHeader_Overflow  Fri Sep 06 06:38:33 PDT 2013
109.126.228.242 Conficker_P2P_Detected  Fri Sep 06 14:12:02 PDT 2013
109.126.228.242 Conficker_P2P_Data_Transfer Fri Sep 06 14:12:02 PDT 2013
109.126.228.242 Conficker_P2P_Protection    Fri Sep 06 14:12:02 PDT 2013
109.148.240.237 HTTPS_Apache_ClearText_DoS  Fri Sep 06 03:33:48 PDT 2013
109.185.22.245  Smurf_Attack    Fri Sep 06 11:49:21 PDT 2013
109.201.23.98   SMTP_Sendmail_XHeader_Overflow  Fri Sep 06 08:57:26 PDT 2013
109.201.23.98   SMTP_Sendmail_XHeader_Overflow  Fri Sep 06 08:57:29 PDT 2013
109.230.128.210 Conficker_P2P_Detected  Wed Sep 04 04:24:51 PDT 2013
109.230.128.210 Conficker_P2P_Data_Transfer Wed Sep 04 04:24:51 PDT 2013
109.230.128.210 Conficker_P2P_Protection    Wed Sep 04 04:24:51 PDT 2013
109.232.172.122 HTTP_AuthResponse_Possible_CSRF Thu Sep 05 02:45:22 PDT 2013
109.238.231.74  SMTP_Sendmail_XHeader_Overflow  Fri Sep 06 06:49:12 PDT 2013
109.64.10.102   SMTP_Sendmail_XHeader_Overflow  Fri Sep 06 06:37:25 PDT 2013
109.64.10.102   SMTP_Sendmail_XHeader_Overflow  Fri Sep 06 06:37:48 PDT 2013
109.64.10.102   SMTP_Sendmail_XHeader_Overflow  Fri Sep 06 06:47:28 PDT 2013

I am interested in finding sequences (association rules). The data contains duplicates, but I did the following to get rid of them (is this correct?):

trans = read.transactions(file="Cyber_Security.csv", rm.duplicates=TRUE,
                          format="single", sep=",", cols=c(1,2))

inspect(trans)

items                               transactionID
1  {Smurf_Attack}                    0.0.0.0        
2  {HTTP_AuthResponse_Possible_CSRF} 109.104.76.38  
3  {SMTP_Sendmail_XHeader_Overflow}  109.109.59.237 
4  {Conficker_P2P_Data_Transfer,                    
    Conficker_P2P_Detected,                         
    Conficker_P2P_Protection}        109.126.228.242
5  {HTTPS_Apache_ClearText_DoS}      109.148.240.237
6  {Smurf_Attack}                    109.185.22.245 
7  {SMTP_Sendmail_XHeader_Overflow}  109.201.23.98  
8  {Conficker_P2P_Data_Transfer,                    
    Conficker_P2P_Detected,                         
    Conficker_P2P_Protection}        109.230.128.210
9  {HTTP_AuthResponse_Possible_CSRF} 109.232.172.122
10 {SMTP_Sendmail_XHeader_Overflow}  109.238.231.74 
11 {SMTP_Sendmail_XHeader_Overflow}  109.64.10.102  
12 {Tag_Name}                        Source_IP  
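
One thing the output above shows is that transaction 12 is {Tag_Name} with transactionID Source_IP, so the CSV header line was read as a data row. The sketch below shows one way to keep the header out, using base R to read the file and build one basket per IP. The inline `csv_text` is a hypothetical stand-in for Cyber_Security.csv, holding a few rows copied from the question. Recent arules versions also document `skip` and `header` arguments for `read.transactions`; check `?read.transactions` for the installed version before relying on them.

```r
# Hypothetical stand-in for the first two columns of Cyber_Security.csv
# (rows copied from the question, including one duplicate).
csv_text <- "Source_IP,Tag_Name
0.0.0.0,Smurf_Attack
0.0.0.0,Smurf_Attack
109.126.228.242,Conficker_P2P_Detected
109.126.228.242,Conficker_P2P_Data_Transfer
109.64.10.102,SMTP_Sendmail_XHeader_Overflow"

# header = TRUE treats the first line as column names, not as data.
alerts <- read.csv(text = csv_text, header = TRUE, stringsAsFactors = FALSE)

# Drop repeated (IP, tag) pairs, then collect the tags seen for each IP.
alerts <- unique(alerts)
baskets <- split(alerts$Tag_Name, alerts$Source_IP)

print(length(baskets))  # number of distinct IPs
print(names(baskets))

# If arules is installed, the list of baskets can be coerced to transactions.
if (requireNamespace("arules", quietly = TRUE)) {
  library(arules)
  trans <- as(baskets, "transactions")
  inspect(trans)
}
```

Reading with `read.csv` first also makes it easy to inspect or filter the rows before any mining step.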

I only have two columns, which is why my code uses format="single" together with cols.

rules = apriori(trans,parameter = list(sup = 0.3, conf = 0.3,target="rules"))

parameter specification:
 confidence minval smax arem  aval originalSupport support minlen maxlen target   ext
        0.3    0.1    1 none FALSE            TRUE     0.3      1     10  rules FALSE

algorithmic control:
 filter tree heap memopt load sort verbose
    0.1 TRUE TRUE  FALSE TRUE    2    TRUE

apriori - find association rules with the apriori algorithm
version 4.21 (2004.05.09)        (c) 1996-2004   Christian Borgelt
set item appearances ...[0 item(s)] done [0.00s].
set transactions ...[8 item(s), 12 transaction(s)] done [0.00s].
sorting and recoding items ... [1 item(s)] done [0.00s].
creating transaction tree ... done [0.00s].
checking subsets of size 1 done [0.00s].
writing ... [1 rule(s)] done [0.00s].
creating S4 object  ... done [0.00s].
> summary(rules)
set of 1 rules

rule length distribution (lhs + rhs):sizes
1 
1 

   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
      1       1       1       1       1       1 

summary of quality measures:
    support         confidence          lift  
 Min.   :0.3333   Min.   :0.3333   Min.   :1  
 1st Qu.:0.3333   1st Qu.:0.3333   1st Qu.:1  
 Median :0.3333   Median :0.3333   Median :1  
 Mean   :0.3333   Mean   :0.3333   Mean   :1  
 3rd Qu.:0.3333   3rd Qu.:0.3333   3rd Qu.:1  
 Max.   :0.3333   Max.   :0.3333   Max.   :1  

mining info:
  data ntransactions support confidence
 trans            12     0.3        0.3
> inspect(rules)
  lhs    rhs                                support confidence lift
1 {}  => {SMTP_Sendmail_XHeader_Overflow} 0.3333333  0.3333333    1
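
The single rule can be checked by hand. The sketch below rebuilds the 12 baskets exactly as printed by inspect(trans), including the stray header row, and computes each item's support in base R. Variable names such as `item_support` are illustrative only.

```r
# Baskets as printed by inspect(trans), stray header row included.
conficker <- c("Conficker_P2P_Data_Transfer", "Conficker_P2P_Detected",
               "Conficker_P2P_Protection")
baskets <- list(
  "0.0.0.0"         = "Smurf_Attack",
  "109.104.76.38"   = "HTTP_AuthResponse_Possible_CSRF",
  "109.109.59.237"  = "SMTP_Sendmail_XHeader_Overflow",
  "109.126.228.242" = conficker,
  "109.148.240.237" = "HTTPS_Apache_ClearText_DoS",
  "109.185.22.245"  = "Smurf_Attack",
  "109.201.23.98"   = "SMTP_Sendmail_XHeader_Overflow",
  "109.230.128.210" = conficker,
  "109.232.172.122" = "HTTP_AuthResponse_Possible_CSRF",
  "109.238.231.74"  = "SMTP_Sendmail_XHeader_Overflow",
  "109.64.10.102"   = "SMTP_Sendmail_XHeader_Overflow",
  "Source_IP"       = "Tag_Name"
)

# Support of an item = share of baskets that contain it.
counts <- table(unlist(baskets, use.names = FALSE))
item_support <- setNames(as.numeric(counts) / length(baskets), names(counts))
print(round(sort(item_support, decreasing = TRUE), 4))

# Items that survive the minimum support of 0.3 used in the apriori call.
frequent <- names(item_support)[item_support >= 0.3]
print(frequent)

# For a rule with an empty left-hand side, confidence equals the support
# of the right-hand side, so lift (confidence / support) is 1.
smtp_support <- item_support[["SMTP_Sendmail_XHeader_Overflow"]]
confidence <- smtp_support / 1
lift <- confidence / smtp_support
```

Only SMTP_Sendmail_XHeader_Overflow reaches support 0.3 (4 of 12 baskets), which matches the "[1 item(s)]" line in the apriori log. The three Conficker tags each appear in 2 of 12 baskets, so the support threshold would need to be at or below 2/12 before they can appear in any rule.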

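I am new to association rules, and the result is not very helpful. Am I doing this the right way? How can I use the 'basket' format and build more meaningful rules (by using the Time field)? Is there a better way? Please guide me so that I can learn association rules!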
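
One possible way to bring the Time field in, sketched under assumptions: treat each (Source_IP, hour) pair as one basket, so that tags raised by the same IP within the same hour end up together. The hourly window, the `window` column and the thresholds are hypothetical choices for illustration, not something given in the question. The rows are copied from the data above. For ordered patterns over time rather than co-occurrence, the separate arulesSequences package (cSPADE) may be a better fit, but it is not shown here.

```r
# A few rows copied from the question's data.
alerts <- data.frame(
  Source_IP = c(rep("109.64.10.102", 3), rep("109.126.228.242", 3),
                rep("109.104.76.38", 2)),
  Tag_Name = c(rep("SMTP_Sendmail_XHeader_Overflow", 3),
               "Conficker_P2P_Detected", "Conficker_P2P_Data_Transfer",
               "Conficker_P2P_Protection",
               rep("HTTP_AuthResponse_Possible_CSRF", 2)),
  Time = c("Fri Sep 06 06:37:25 PDT 2013", "Fri Sep 06 06:37:48 PDT 2013",
           "Fri Sep 06 06:47:28 PDT 2013", "Fri Sep 06 14:12:02 PDT 2013",
           "Fri Sep 06 14:12:02 PDT 2013", "Fri Sep 06 14:12:02 PDT 2013",
           "Tue Sep 03 00:50:31 PDT 2013", "Tue Sep 03 06:38:58 PDT 2013"),
  stringsAsFactors = FALSE
)

# Hypothetical windowing: keep "month day hour" (e.g. "Sep 06 06") from the
# timestamp text. A regex avoids locale-dependent date parsing.
alerts$window <- sub("^\\w+ (\\w+ \\d+ \\d+):.*$", "\\1", alerts$Time,
                     perl = TRUE)

# One basket per (IP, hour); unique() removes repeated tags inside a basket.
basket_id <- paste(alerts$Source_IP, alerts$window, sep = " | ")
baskets <- lapply(split(alerts$Tag_Name, basket_id), unique)
print(baskets)

# If arules is installed, mine rules with at least two items per rule.
# The thresholds below are illustrative assumptions.
if (requireNamespace("arules", quietly = TRUE)) {
  library(arules)
  trans <- as(baskets, "transactions")
  rules <- apriori(trans,
                   parameter = list(support = 0.2, confidence = 0.8,
                                    minlen = 2))
  inspect(rules)
}
```

Setting `minlen = 2` excludes rules with an empty left-hand side, such as the one returned above. A narrower or wider window changes which tags share a basket, so the window size is worth experimenting with.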

0 answers:

No answers