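I have a main data frame containing many websites that are in use, and another data frame containing a list of bad websites to match against, so I can identify whether any bad websites appear in my main data frame. Since I am very new to this, I am not sure how to replace the bad websites with "www.badwebsite.com". Thanks.
Here is an example of the data frames: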
site_list <- data.frame("host" = c("www.companya.com", "www.companyb.com", "www.malwaresite.com",
"www.companyc.com", "www.companyd.com", "www.virussite.com",
"www.companye.com", "www.companyf.com", "www.phishingsite.com"),
"URL" = c("www.companya.com/home", "www.companyb.com/home", "www.malwaresite.com/home",
"www.companyc.com/home", "www.companyd.com/home", "www.virussite.com/home",
"www.companye.com/home", "www.companyf.com/home", "www.phishingsite.com/home"))
bad_site_list <- data.frame("host" = c("www.malwaresite.com", "www.virussite.com", "www.phishingsite.com"))
I would like to achieve this result:
host URL
www.companya.com www.companya.com/home
www.companyb.com www.companyb.com/home
www.badwebsite.com www.badwebsite.com/home
www.companyc.com www.companyc.com/home
www.companyd.com www.companyd.com/home
www.badwebsite.com www.badwebsite.com/home
www.companye.com www.companye.com/home
www.companyf.com www.companyf.com/home
www.badwebsite.com www.badwebsite.com/home
Answer 0: (score: 1)
For your simple example I would do it the following way, though it may not be the best choice for more complex tables:
apply(site_list, 2, function(x)gsub(paste(bad_site_list$host, collapse="|"), "www.badwebsite.com", x))
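In apply, "2" means the function is applied to each column ("1" would apply it to each row).
The function looks for every host in bad_site_list and replaces it with www.badwebsite.com (using gsub).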
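One detail worth knowing is that apply returns a character matrix rather than a data frame. The following is a minimal sketch, assuming you want a data frame back; the sample data is shortened to three rows, and the names replaced and cleaned are only illustrative.

```r
# Shortened version of the sample data from the question
site_list <- data.frame(
  host = c("www.companya.com", "www.malwaresite.com", "www.virussite.com"),
  URL  = c("www.companya.com/home", "www.malwaresite.com/home", "www.virussite.com/home"),
  stringsAsFactors = FALSE
)
bad_site_list <- data.frame(
  host = c("www.malwaresite.com", "www.virussite.com", "www.phishingsite.com"),
  stringsAsFactors = FALSE
)

# Build one alternation pattern from all bad hosts
bad_site_pattern <- paste(bad_site_list$host, collapse = "|")

# apply() over columns (MARGIN = 2) returns a character matrix
replaced <- apply(site_list, 2, function(x) gsub(bad_site_pattern, "www.badwebsite.com", x))

# Convert back to a data frame; the column names host and URL are kept
cleaned <- as.data.frame(replaced, stringsAsFactors = FALSE)
print(cleaned)
```

If you assign the apply result straight back to site_list, later code that uses site_list$host will fail, because the $ operator does not work on a matrix.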
Answer 1: (score: 1)
Without regular expressions, you could do this:
# Converting factor columns to character
site_list[] <- lapply(site_list, as.character)
bad_site_list[] <- lapply(bad_site_list, as.character)
# If you want to replace all the bad sites with "www.badwebsite.com" you can:
site_list$URL[site_list$host %in% bad_site_list$host] <- "www.badwebsite.com/home"
site_list$host[site_list$host %in% bad_site_list$host] <- "www.badwebsite.com"
site_list
host URL
1 www.companya.com www.companya.com/home
2 www.companyb.com www.companyb.com/home
3 www.badwebsite.com www.badwebsite.com/home
4 www.companyc.com www.companyc.com/home
5 www.companyd.com www.companyd.com/home
6 www.badwebsite.com www.badwebsite.com/home
7 www.companye.com www.companye.com/home
8 www.companyf.com www.companyf.com/home
9 www.badwebsite.com www.badwebsite.com/home
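The assignment above hard-codes "/home" for the replacement URL, which matches the sample data. If the real URLs carry different paths, one possible variation is sketched below. It assumes every URL starts with its own host, which is not stated in the question, and the names bad_hosts, is_bad and path are illustrative.

```r
# Sample data with differing paths (made up for illustration)
site_list <- data.frame(
  host = c("www.companya.com", "www.malwaresite.com", "www.virussite.com"),
  URL  = c("www.companya.com/home", "www.malwaresite.com/login", "www.virussite.com/a/b"),
  stringsAsFactors = FALSE
)
bad_hosts <- c("www.malwaresite.com", "www.virussite.com", "www.phishingsite.com")

# Logical mask: TRUE for rows whose host is on the bad list
is_bad <- site_list$host %in% bad_hosts

# Keep whatever follows the host in the URL
# (assumption: each URL begins with its host)
path <- substring(site_list$URL[is_bad], nchar(site_list$host[is_bad]) + 1)

# Replace the URL first, then the host, so the mask stays valid
site_list$URL[is_bad]  <- paste0("www.badwebsite.com", path)
site_list$host[is_bad] <- "www.badwebsite.com"
print(site_list)
```

Computing the mask once and reusing it also avoids depending on the order of the two assignments.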
With regular expressions, you could do this:
# Using regex you could create a pattern
bad_site_pattern <- paste(bad_site_list$host, collapse = "|")
# Then replace all instances in the dataframe using lapply
site_list[] <- lapply(site_list, gsub, pattern = bad_site_pattern, replacement = "www.badwebsite.com")
site_list
host URL
1 www.companya.com www.companya.com/home
2 www.companyb.com www.companyb.com/home
3 www.badwebsite.com www.badwebsite.com/home
4 www.companyc.com www.companyc.com/home
5 www.companyd.com www.companyd.com/home
6 www.badwebsite.com www.badwebsite.com/home
7 www.companye.com www.companye.com/home
8 www.companyf.com www.companyf.com/home
9 www.badwebsite.com www.badwebsite.com/home
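Note that in a regular expression an unescaped "." matches any character, so the pattern built above is slightly looser than a literal host match. This makes no difference for the sample data. A minimal sketch of a stricter pattern, assuming you want the dots treated literally (the vector urls and the name escaped are made up for illustration):

```r
bad_hosts <- c("www.malwaresite.com", "www.virussite.com")

# Wrap each dot in a character class so it only matches a literal dot
escaped <- gsub(".", "[.]", bad_hosts, fixed = TRUE)
bad_site_pattern <- paste(escaped, collapse = "|")

urls <- c("www.malwaresite.com/home",   # real bad host
          "wwwXmalwaresite.com/home",   # would match the unescaped pattern
          "www.companya.com/home")      # good host

result <- gsub(bad_site_pattern, "www.badwebsite.com", urls)
print(result)
```

With the escaped pattern only the first element is rewritten; the second is left alone because "X" is not a dot.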
Answer 2: (score: 0)
library(stringr)
str_detect(dataframe_name, "string_your_searching_for")
str_replace(dataframe_name, "old string", "new string")
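The stringr functions operate on character vectors, so in practice they are applied to a column (or to each column in turn) rather than to a whole data frame. Below is a rough sketch of how this might look on the question's data, assuming the stringr package is installed; the sample is shortened, and str_replace_all is used instead of str_replace so that every match in a string is replaced.

```r
library(stringr)

site_list <- data.frame(
  host = c("www.companya.com", "www.malwaresite.com", "www.virussite.com"),
  URL  = c("www.companya.com/home", "www.malwaresite.com/home", "www.virussite.com/home"),
  stringsAsFactors = FALSE
)
bad_hosts <- c("www.malwaresite.com", "www.virussite.com", "www.phishingsite.com")

# One regular expression that matches any of the bad hosts
bad_site_pattern <- paste(bad_hosts, collapse = "|")

# str_detect() flags which hosts match the pattern
flagged <- str_detect(site_list$host, bad_site_pattern)

# Apply str_replace_all() to every column, keeping the data frame structure
site_list[] <- lapply(site_list, str_replace_all,
                      pattern = bad_site_pattern,
                      replacement = "www.badwebsite.com")
print(site_list)
```

As with gsub, the dots in the pattern are treated as regex wildcards here, which is harmless for this sample.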