Create a 24-hour vector with 60-minute and 1-minute time intervals in R

Time: 2018-05-06 15:30:47

Tags: r

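I have a firewall log file that contains date, hour, src_address, dest_address and Date.time. I want to create a vector covering 24 hours at 60-minute and at 1-minute intervals (for example, from 2018/01/01 to 2018/05/06). Within those intervals I want to count the appearances of each pair of src_address and dest_address. Finally, for each pair of src_address and dest_address, I want the maximum of those counts. Here is my file: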

                date     hour    src_address  dest_address           Date.Time

1996  2018-04-14 08:24:01    1.11.201.19 172.16.16.100 2018-04-14 08:24:01
3702  2018-04-15 12:10:27    1.119.43.90 172.16.16.100 2018-04-15 12:10:27
1154  2018-04-14 00:59:27    1.119.43.90 172.16.16.153 2018-04-14 00:59:27
2414  2018-04-14 12:33:29    1.119.43.90 192.168.1.112 2018-04-14 12:33:29
18013 2018-04-28 18:49:05   1.171.43.133   172.16.16.5 2018-04-28 18:49:05
18015 2018-04-28 18:49:05   1.171.43.133   172.16.16.5 2018-04-28 18:49:05
6903  2018-04-25 21:31:52   1.179.191.82   172.16.16.5 2018-04-25 21:31:52
11741 2018-04-27 01:08:43   1.179.191.82 192.168.1.111 2018-04-27 01:08:43
11933 2018-04-27 02:00:10   1.179.191.82 192.168.1.111 2018-04-27 02:00:10
11023 2018-04-26 21:39:39   1.179.191.82 192.168.1.112 2018-04-26 21:39:39
11175 2018-04-26 22:31:01   1.179.191.82 192.168.1.112 2018-04-26 22:31:01
13073 2018-04-27 08:24:58   1.180.72.186 172.16.16.153 2018-04-27 08:24:58
13735 2018-04-27 12:07:34   1.180.72.186 172.16.16.153 2018-04-27 12:07:34
2752  2018-04-14 19:34:53   1.202.165.40 172.16.16.153 2018-04-14 19:34:53
4046  2018-04-15 18:16:40    1.203.84.52   172.16.16.5 2018-04-15 18:16:40
4048  2018-04-15 18:18:43    1.203.84.52 192.168.1.112 2018-04-15 18:18:43
3020  2018-04-15 01:35:40    1.209.171.4 192.168.1.111 2018-04-15 01:35:40
4870  2018-04-16 05:33:42   1.214.34.114 172.16.16.100 2018-04-16 05:33:42
7025  2018-04-25 22:28:06   1.214.34.114 172.16.16.100 2018-04-25 22:28:06
4262  2018-04-15 23:31:56   1.214.34.114 172.16.16.153 2018-04-15 23:31:56
9369  2018-04-26 10:32:50   1.214.34.114 172.16.16.153 2018-04-26 10:32:50
2716  2018-04-14 18:49:30   1.214.34.114   172.16.16.5 2018-04-14 18:49:30
9563  2018-04-26 12:34:58   1.214.34.114   172.16.16.5 2018-04-26 12:34:58
1110  2018-04-14 00:27:02   1.214.34.114 192.168.1.111 2018-04-14 00:27:02
4470  2018-04-16 01:27:32   1.214.34.114 192.168.1.112 2018-04-16 01:27:32
9581  2018-04-26 12:55:39    1.55.249.92 172.16.16.153 2018-04-26 12:55:39
2970  2018-04-15 00:01:18    1.55.249.92   172.16.16.5 2018-04-15 00:01:18
15329 2018-04-27 21:53:16    1.55.249.92   172.16.16.5 2018-04-27 21:53:16
15537 2018-04-28 00:02:30    1.55.249.92   172.16.16.5 2018-04-28 00:02:30
19249 2018-04-29 06:28:04   1.71.188.254 172.16.16.100 2018-04-29 06:28:04
19243 2018-04-29 06:28:04   1.71.188.254 172.16.16.153 2018-04-29 06:28:04
19241 2018-04-29 06:28:04   1.71.188.254 172.16.16.159 2018-04-29 06:28:04
19239 2018-04-29 06:28:04   1.71.188.254   172.16.16.5 2018-04-29 06:28:04
19247 2018-04-29 06:28:04   1.71.188.254 192.168.1.111 2018-04-29 06:28:04
19245 2018-04-29 06:28:04   1.71.188.254 192.168.1.112 2018-04-29 06:28:04
6315  2018-04-25 18:56:08     1.85.18.88 172.16.16.153 2018-04-25 18:56:08
14623 2018-04-27 16:41:00     1.85.18.88 172.16.16.153 2018-04-27 16:41:00

This is what I expect:

   src_address  dest_address max(per hour) max(per minute)
2  1.11.201.19 172.16.16.100             1               1
3  1.119.43.90 172.16.16.100             1               1
4  1.119.43.90 172.16.16.153             1               1
5  1.119.43.90 192.168.1.112             1               1
6 1.171.43.133   172.16.16.5             2               2

1 answer:

Answer 0 (score: 1)

A few things have to be done to get the summarised data. The dplyr, tidyr and lubridate packages can be used to transform the data.

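Approach:

  1. Create a DateTime column by merging date and hour and converting the result with ymd_hms
  2. Group on src_addres, dest_address and Year-Month-Day Hour to count the appearances per hour
  3. Group on src_addres, dest_address and Year-Month-Day Hour:Min to count the appearances per minute
  4. Group on src_addres and dest_address, then summarise to get the maximum count per hour and per minute

Note that the sample data below spells the source column src_addres, so the code uses that name.
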
library(dplyr)
library(tidyr)
library(lubridate)

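# Merge date and hour into one string column and parse it as a date-time
df %>% unite("DateTime", c("date","hour"), sep=" ") %>% 
  mutate(DateTime = ymd_hms(DateTime)) %>%
  # Count appearances of each address pair within each clock hour
  group_by(src_addres, dest_address, YMD_H = format(DateTime, "%Y-%m-%d %H")) %>%
  mutate(HourlyAppearance = n()) %>%
  # Count appearances of each address pair within each clock minute
  group_by(src_addres, dest_address, YMD_HM = format(DateTime, "%Y-%m-%d %H:%M")) %>%
  mutate(PerMinAppearance = n()) %>%
  # Keep the largest hourly and per-minute count for each address pair
  group_by(src_addres, dest_address) %>%
  summarise('max(per hour)' = max(HourlyAppearance), 
            'max(per min)' = max(PerMinAppearance)) %>%
  as.data.frame()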

#      src_addres  dest_address max(per hour) max(per min)
# 1   1.11.201.19 172.16.16.100             1            1
# 2   1.119.43.90 172.16.16.100             1            1
# 3   1.119.43.90 172.16.16.153             1            1
# 4   1.119.43.90 192.168.1.112             1            1
# 5  1.171.43.133   172.16.16.5             2            2
# 6  1.179.191.82   172.16.16.5             1            1
# 7  1.179.191.82 192.168.1.111             1            1
# 8  1.179.191.82 192.168.1.112             1            1
# 9  1.180.72.186 172.16.16.153             1            1
# 10 1.202.165.40 172.16.16.153             1            1
# 11  1.203.84.52   172.16.16.5             1            1
# 12  1.203.84.52 192.168.1.112             1            1
# 13  1.209.171.4 192.168.1.111             1            1
# 14 1.214.34.114 172.16.16.100             1            1
# 15 1.214.34.114 172.16.16.153             1            1
# 16 1.214.34.114   172.16.16.5             1            1
# 17 1.214.34.114 192.168.1.111             1            1
# 18 1.214.34.114 192.168.1.112             1            1
# 19  1.55.249.92 172.16.16.153             1            1
# 20  1.55.249.92   172.16.16.5             1            1
# 21 1.71.188.254 172.16.16.100             1            1
# 22 1.71.188.254 172.16.16.153             1            1
# 23 1.71.188.254 172.16.16.159             1            1
# 24 1.71.188.254   172.16.16.5             1            1
# 25 1.71.188.254 192.168.1.111             1            1
# 26 1.71.188.254 192.168.1.112             1            1
# 27   1.85.18.88 172.16.16.153             1            1
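
The pipeline above buckets timestamps by formatting them as text, so it never builds the interval vector that the question title asks for. If an explicit vector of interval boundaries is wanted, one possible sketch in base R uses seq() to generate the boundaries and cut() to assign each timestamp to an interval. The data frame `log`, its column names and the labels "A", "B", "X", "Y" are hypothetical stand-ins rather than values from the firewall log, and the time zone is assumed to be UTC.

```r
# Hypothetical toy log: pair A-X appears three times within one hour,
# twice of those within the same minute; pair B-Y appears once.
log <- data.frame(
  src  = c("A", "A", "A", "B"),
  dest = c("X", "X", "X", "Y"),
  ts   = as.POSIXct(c("2018-04-28 18:49:05", "2018-04-28 18:49:40",
                      "2018-04-28 18:50:10", "2018-04-28 06:28:04"),
                    tz = "UTC"),
  stringsAsFactors = FALSE
)

# Explicit interval boundaries for one day (assumed to start at midnight UTC):
# 25 hourly boundaries and 1441 minute boundaries
day_start   <- as.POSIXct("2018-04-28 00:00:00", tz = "UTC")
hour_breaks <- seq(day_start, by = "1 hour", length.out = 25)
min_breaks  <- seq(day_start, by = "1 min",  length.out = 24 * 60 + 1)

# Assign each timestamp to its interval; right = FALSE means [start, end)
hour_bin <- cut(log$ts, breaks = hour_breaks, right = FALSE)
min_bin  <- cut(log$ts, breaks = min_breaks,  right = FALSE)

# Cross-tabulate pair by interval, then take the row-wise maximum
pair         <- paste(log$src, log$dest)
max_per_hour <- apply(table(pair, hour_bin), 1, max)
max_per_min  <- apply(table(pair, min_bin),  1, max)
```

For a range of several months, such as the one in the question, the same idea applies with a longer seq(), although the cross-tabulation then holds one column per interval and can become large; the grouped counts in the pipeline above avoid materialising the empty intervals.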

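Data:

OP did not provide the data in an easy-to-use format, and the separate date and time columns made it harder to read in. Perhaps that is why this question took a while to get a response. I preferred to read the date and time parts as separate columns and then unite them to form the Date/Time.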

strtext <- "Sl  date hour  src_addres  dest_address  Date_t   Time_t
1996  2018-04-14 08:24:01    1.11.201.19 172.16.16.100 2018-04-14 08:24:01
3702  2018-04-15 12:10:27    1.119.43.90 172.16.16.100 2018-04-15 12:10:27
1154  2018-04-14 00:59:27    1.119.43.90 172.16.16.153 2018-04-14 00:59:27
2414  2018-04-14 12:33:29    1.119.43.90 192.168.1.112 2018-04-14 12:33:29
18013 2018-04-28 18:49:05   1.171.43.133   172.16.16.5 2018-04-28 18:49:05
18015 2018-04-28 18:49:05   1.171.43.133   172.16.16.5 2018-04-28 18:49:05
6903  2018-04-25 21:31:52   1.179.191.82   172.16.16.5 2018-04-25 21:31:52
11741 2018-04-27 01:08:43   1.179.191.82 192.168.1.111 2018-04-27 01:08:43
11933 2018-04-27 02:00:10   1.179.191.82 192.168.1.111 2018-04-27 02:00:10
11023 2018-04-26 21:39:39   1.179.191.82 192.168.1.112 2018-04-26 21:39:39
11175 2018-04-26 22:31:01   1.179.191.82 192.168.1.112 2018-04-26 22:31:01
13073 2018-04-27 08:24:58   1.180.72.186 172.16.16.153 2018-04-27 08:24:58
13735 2018-04-27 12:07:34   1.180.72.186 172.16.16.153 2018-04-27 12:07:34
2752  2018-04-14 19:34:53   1.202.165.40 172.16.16.153 2018-04-14 19:34:53
4046  2018-04-15 18:16:40    1.203.84.52   172.16.16.5 2018-04-15 18:16:40
4048  2018-04-15 18:18:43    1.203.84.52 192.168.1.112 2018-04-15 18:18:43
3020  2018-04-15 01:35:40    1.209.171.4 192.168.1.111 2018-04-15 01:35:40
4870  2018-04-16 05:33:42   1.214.34.114 172.16.16.100 2018-04-16 05:33:42
7025  2018-04-25 22:28:06   1.214.34.114 172.16.16.100 2018-04-25 22:28:06
4262  2018-04-15 23:31:56   1.214.34.114 172.16.16.153 2018-04-15 23:31:56
9369  2018-04-26 10:32:50   1.214.34.114 172.16.16.153 2018-04-26 10:32:50
2716  2018-04-14 18:49:30   1.214.34.114   172.16.16.5 2018-04-14 18:49:30
9563  2018-04-26 12:34:58   1.214.34.114   172.16.16.5 2018-04-26 12:34:58
1110  2018-04-14 00:27:02   1.214.34.114 192.168.1.111 2018-04-14 00:27:02
4470  2018-04-16 01:27:32   1.214.34.114 192.168.1.112 2018-04-16 01:27:32
9581  2018-04-26 12:55:39    1.55.249.92 172.16.16.153 2018-04-26 12:55:39
2970  2018-04-15 00:01:18    1.55.249.92   172.16.16.5 2018-04-15 00:01:18
15329 2018-04-27 21:53:16    1.55.249.92   172.16.16.5 2018-04-27 21:53:16
15537 2018-04-28 00:02:30    1.55.249.92   172.16.16.5 2018-04-28 00:02:30
19249 2018-04-29 06:28:04   1.71.188.254 172.16.16.100 2018-04-29 06:28:04
19243 2018-04-29 06:28:04   1.71.188.254 172.16.16.153 2018-04-29 06:28:04
19241 2018-04-29 06:28:04   1.71.188.254 172.16.16.159 2018-04-29 06:28:04
19239 2018-04-29 06:28:04   1.71.188.254   172.16.16.5 2018-04-29 06:28:04
19247 2018-04-29 06:28:04   1.71.188.254 192.168.1.111 2018-04-29 06:28:04
19245 2018-04-29 06:28:04   1.71.188.254 192.168.1.112 2018-04-29 06:28:04
6315  2018-04-25 18:56:08     1.85.18.88 172.16.16.153 2018-04-25 18:56:08
14623 2018-04-27 16:41:00     1.85.18.88 172.16.16.153 2018-04-27 16:41:00"

df <- read.table(text = strtext,header = TRUE, stringsAsFactors = FALSE)
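
Because read.table splits on whitespace, the trailing timestamp arrives as the two text columns Date_t and Time_t. A minimal sketch of uniting and parsing them follows, run on a hypothetical two-row data frame `toy` rather than on df itself. The column name hour_start is an illustrative choice, and floor_date() is shown only as a possible alternative to the format() bucketing used in the answer.

```r
library(tidyr)
library(lubridate)

# Hypothetical two-row stand-in with the same column names as the sample data
toy <- data.frame(
  Date_t = c("2018-04-28", "2018-04-29"),
  Time_t = c("18:49:05", "06:28:04"),
  stringsAsFactors = FALSE
)

# Merge the two text columns into one, dropping the originals
toy <- unite(toy, "Date.Time", c("Date_t", "Time_t"), sep = " ")

# Parse the merged text as a date-time (ymd_hms uses UTC unless told otherwise)
toy$Date.Time <- ymd_hms(toy$Date.Time)

# Truncate each timestamp to the start of its hour
toy$hour_start <- floor_date(toy$Date.Time, unit = "hour")
```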