test.data <- data.frame(summary = c("Execute commands as root via buffer overflow in Tooltalk database server (rpc.ttdbserverd)."
,"Information from SSL-encrypted sessions via PKCS #1."
,"ip_input.c in BSD-derived TCP/IP implementations allows remote attackers to cause a denial of service (crash or hang) via crafted packets."),
wascname=c(NA, NA, "Improper Input Handling"),stringsAsFactors = FALSE)
wascNames <- data.frame(wascname=c("Abuse of Functionality","Brute Force","Buffer Overflow","Content Spoofing"
,"Credential/Session Prediction","Cross-Site Scripting","Cross-Site Request Forgery","Denial of Service"
,"Fingerprinting","Format String","HTTP Response Smuggling","HTTP Response Splitting"
,"HTTP Request Smuggling","HTTP Request Splitting","Integer Overflows","LDAP Injection"
,"Mail Command Injection","Null Byte Injection","OS Commanding","Path Traversal"
,"Predictable Resource Location","Remote File Inclusion (RFI)","Routing Detour","Session Fixation"
,"SOAP Array Abuse","SSI Injection","SQL Injection","URL Redirector Abuse"
,"XPath Injection","XML Attribute Blowup","XML External Entities","XML Entity Expansion"
,"XML Injection","XQuery Injection","Cross-site Scripting","Directory Indexing"
,"Improper Filesystem Permissions","Improper Input Handling","Improper Output Handling","Information Leakage"
,"Insecure Indexing","Insufficient Anti-Automation","Insufficient Authentication","Insufficient Authorization"
,"Insufficient Password Recovery","Insufficient Process Validation","Insufficient Session Expiration","Insufficient Transport Layer Protection"
,"Remote File Inclusion","URl Redirector Abuse"),stringsAsFactors = FALSE)
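Below is the code I have been trying to fix. If test.data$summary contains a string from wascNames$wascname, I want to write that string into test.data$wascname, but only where test.data$wascname is NA (is.na):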
test.data$wascname<-sapply(test.data$summary, function(x)
ifelse(identical(wascNames$wascname[str_detect(x,regex(wascNames$wascname, ignore_case = T))&
is.na(test.data$wascname)==TRUE], character(0)),test.data$wascname,
wascNames$wascname[str_detect(x,regex(wascNames$wascname, ignore_case = T))==TRUE]))
I would like the following output:
Thanks in advance. I considered using a for loop, but it is too slow for 200000 observations.
Answer 0 (score: 1)
I believe this should work:
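library(stringr)  # provides str_detect() and regex()
# For each row: keep an existing wascname; otherwise take the first name found in the summary (NA if none).
test.data$wascname2 <- sapply(seq_len(nrow(test.data)), function(x) ifelse(is.na(test.data$wascname[x]),
wascNames$wascname[str_detect(test.data$summary[x], regex(wascNames$wascname, ignore_case = TRUE))],
test.data$wascname[x]))
test.data$wascname2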
#[1] "Buffer Overflow" NA "Improper Input Handling"
It still loops with sapply, but I think that is unavoidable given the structure of your data (that is, each string has to be looked up in the wascNames$wascname table).
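If the row-wise sapply turns out to be too slow for 200000 observations, one possible alternative is to loop over the short table of names instead of over the rows, so that each pass is a single vectorised grepl call. The following is only a sketch and not part of the original answer: fill_wascname is a hypothetical helper, the demo data are a shortened copy of the question's data, and it assumes the names should be matched as literal, case-insensitive substrings (the answer above treats them as regular expressions).

```r
# Hypothetical helper: fill NA entries of `current` with the first name
# from `patterns` that occurs in the corresponding `summary` string.
fill_wascname <- function(summary, current, patterns) {
  out <- current
  summary_lc <- tolower(summary)
  for (p in patterns) {
    idx <- which(is.na(out))              # rows that still have no name
    if (length(idx) == 0L) break
    # Assumption: literal, case-insensitive substring match.
    hit <- grepl(tolower(p), summary_lc[idx], fixed = TRUE)
    out[idx[hit]] <- p                    # names earlier in the table win
  }
  out
}

# Shortened copy of the question's data, for illustration only.
demo <- data.frame(
  summary = c(
    "Execute commands as root via buffer overflow in Tooltalk database server (rpc.ttdbserverd).",
    "Information from SSL-encrypted sessions via PKCS #1.",
    "ip_input.c in BSD-derived TCP/IP implementations allows remote attackers to cause a denial of service (crash or hang) via crafted packets."
  ),
  wascname = c(NA, NA, "Improper Input Handling"),
  stringsAsFactors = FALSE
)
demo_names <- c("Brute Force", "Buffer Overflow", "Denial of Service",
                "Improper Input Handling")

result <- fill_wascname(demo$summary, demo$wascname, demo_names)
result
#[1] "Buffer Overflow"         NA                        "Improper Input Handling"
```

This still loops, but only once per name (about 50 iterations) rather than once per row. Whether it is actually faster on the full data set would need to be measured.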