Question

I have a large data table with many more columns, but I am only showing a few example rows here.

 data

                 Source   | Protocol
                  10.0.0.6  SSDP    
                  10.0.0.6  TCP
                  10.0.0.6  HTTP
                  10.0.0.6  BROWSER
                  10.0.0.6  LLMNR
                  10.0.0.6  NBNS
                  10.0.0.10 MDNS
                  10.0.0.10 ICMPv6 
                  10.0.0.10 IGMPv3
                  10.0.0.10 HTTP/XML

So I created a table called port.

Protocol
SSDP
ARP
TCP
HTTP
BROWSER
LLMNR
NBNS
DHCPv6
MDNS
ICMPv6
IGMPv3
HTTP/XML

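Is there a way to make the port table look like the one below without hard-coding it? Would I use a loop? PS: forgive me, I am just learning R.

The port table should create the new columns by itself, one for each source IP found in data, and fill in 0/1 depending on whether that IP is using the protocol.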

Protocol 10.0.0.6 10.0.0.10
SSDP        1         0
ARP         0         0
TCP         1         0
HTTP        1         0
BROWSER     1         0
LLMNR       1         0
NBNS        1         0
DHCPv6      0         0
MDNS        0         1
ICMPv6      0         1
IGMPv3      0         1
HTTP/XML    0         1

Answer 1

At the simplest level, it sounds like you want table:

with(mydf, table(Protocol, Source))

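Since table tabulates counts (and so can give values greater than 1 when several rows match the same combination), you may need some further processing. Also, since your original dataset does not contain every protocol listed in port (ARP and DHCPv6 never occur in the sample), you would need to use factor with explicit levels too: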

port <- c("SSDP", "ARP", "TCP", "HTTP", "BROWSER", "LLMNR", 
          "NBNS", "DHCPv6", "MDNS", "ICMPv6", "IGMPv3", "HTTP/XML")

(with(mydf, table(factor(Protocol, port), Source)) > 0) * 1
#           Source
#            10.0.0.10 10.0.0.6
#   SSDP             0        1
#   ARP              0        0
#   TCP              0        1
#   HTTP             0        1
#   BROWSER          0        1
#   LLMNR            0        1
#   NBNS             0        1
#   DHCPv6           0        0
#   MDNS             1        0
#   ICMPv6           1        0
#   IGMPv3           1        0
#   HTTP/XML         1        0
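
If the result is needed as a data frame shaped like the one in the question (a Protocol column followed by one column per IP, in the order the IPs first appear), the matrix above can be converted. The following is only a sketch: mydf is rebuilt here from the sample rows in the question, and the names counts, flags and port_table are made up for illustration.

```r
# Sample rows copied from the question; the name mydf follows the answer above.
mydf <- data.frame(
  Source = rep(c("10.0.0.6", "10.0.0.10"), times = c(6, 4)),
  Protocol = c("SSDP", "TCP", "HTTP", "BROWSER", "LLMNR", "NBNS",
               "MDNS", "ICMPv6", "IGMPv3", "HTTP/XML"),
  stringsAsFactors = FALSE
)

port <- c("SSDP", "ARP", "TCP", "HTTP", "BROWSER", "LLMNR",
          "NBNS", "DHCPv6", "MDNS", "ICMPv6", "IGMPv3", "HTTP/XML")

# Fix the row order with `port` and the column order with unique(Source),
# so the IP columns appear in the order they first occur in the data.
counts <- with(mydf, table(factor(Protocol, levels = port),
                           factor(Source, levels = unique(Source))))

# Turn counts into 0/1 flags.
flags <- (counts > 0) * 1

# Convert to a data frame and move the protocol names into a real column
# (hypothetical name: port_table).
port_table <- as.data.frame.matrix(flags)
port_table <- cbind(Protocol = rownames(port_table), port_table)
rownames(port_table) <- NULL
port_table
```

No loop is needed: the IP columns come from whatever values occur in Source, so new IPs in the data show up as new columns automatically.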

Update

Some further examples to demonstrate why I use (... > 0) * 1, and what effect droplevels has when some values are removed:

## Imagine this is the source data.frame
## We don't want "A" values from "Source"
## We do want all relevant levels in "Protocol"
##   which, for this example, we can assume to
##   be 1, 2, and 3
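## stringsAsFactors = TRUE keeps "Source" a factor (the default before
##   R 4.0.0); without it, droplevels() has nothing to drop below
mydf <- data.frame(Source = c("A", "B", "C", "B", "C", "C"),
                   Protocol = c(1, 1, 2, 2, 1, 1),
                   stringsAsFactors = TRUE)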

## Now, compare the following

with(mydf, table(factor(Protocol, 1:3), Source))

with(mydf[!mydf$Source %in% "A", ], 
     table(factor(Protocol, 1:3), Source))

with(droplevels(mydf[!mydf$Source %in% "A", ]), 
     table(factor(Protocol, 1:3), Source))
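
To make the differences between those three calls easier to see, here is a small self-contained sketch that pulls out the specific cells and column names that change. It assumes Source is stored as a factor (hence the explicit stringsAsFactors = TRUE, since R 4.0.0 and later default to character columns); the names counts, flags, sub, kept and dropped are only illustrative.

```r
mydf <- data.frame(Source = c("A", "B", "C", "B", "C", "C"),
                   Protocol = c(1, 1, 2, 2, 1, 1),
                   stringsAsFactors = TRUE)  # assumption: Source is a factor

# Raw counts: Source "C" has Protocol 1 twice, so that cell is 2.
counts <- with(mydf, table(factor(Protocol, 1:3), Source))
counts["1", "C"]

# (counts > 0) * 1 collapses every non-zero count to 1.
flags <- (counts > 0) * 1
flags["1", "C"]

# Protocol 3 never occurs, but factor(Protocol, 1:3) keeps its all-zero row.
flags["3", ]

# Subsetting rows alone keeps "A" as an unused level, hence an empty column.
sub <- mydf[!mydf$Source %in% "A", ]
kept <- colnames(with(sub, table(factor(Protocol, 1:3), Source)))
kept

# droplevels() removes the unused level, so the "A" column disappears.
dropped <- colnames(with(droplevels(sub),
                         table(factor(Protocol, 1:3), Source)))
dropped
```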
