CompanyName            Desired Output
Abbey Company.Com      abbey company
Manisd Company .com    manisd company
Idely.com              idely
How can I remove ".com" while making sure the "com" in "Company" is not affected? I tried the following code:
stopwords = c("limited"," l.c.", " llc","corporation"," &"," ltd.","llp ",
"l.l.c","incorporated","association","s.p.a"," l.p.","l.l.l.p","p.a ","p.c ",
"chtd ","chtd. ","r.l.l.l.p ","rlllp ", "the "," lmft", " inc.", ".com")
file_new1$CompanyName<-gsub(paste0(stopwords,collapse = "|"),"", file_new1$CompanyName)
I have already consulted this link.
Answer 0 (score: 3)
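You can use gsub("\\.com", "", dt$CompanyName, ignore.case = TRUE), assuming your data.table is called dt. Escaping the dot makes it match a literal "." rather than any character, so the "Com" in "Company" is left alone, and ignore.case = TRUE covers both ".Com" and ".com".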
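As a minimal sketch of the same idea on a plain character vector, the pattern below is additionally anchored to the end of the string and swallows any whitespace before the dot. The anchoring and the final tolower() call are assumptions added here to reach the desired output from the question; they are not part of the original answer.

```r
# Sample data taken from the question
CompanyName <- c("Abbey Company.Com", "Manisd Company .com", "Idely.com")

# "\\s*"  : optional whitespace before the dot
# "\\."   : a literal dot (an unescaped "." would match any character)
# "com$"  : "com" at the very end of the string
cleaned <- sub("\\s*\\.com$", "", CompanyName, ignore.case = TRUE)

# Lower-case the result to match the desired output
cleaned <- tolower(cleaned)
print(cleaned)
```

Because the pattern requires a literal dot and the end of the string, a name such as "Abbey Company" with no suffix passes through unchanged.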
Update
Another option is to keep only the part before the dot ("."). For example:
CompanyName <- data.table(V1=c("Abbey Company.Com", "Manisd Company .com", "Idely.com"))
> CompanyName
V1
1: Abbey Company.Com
2: Manisd Company .com
3: Idely.com
CompanyName$V1 <- sel_strsplit(CompanyName$V1,"\\.",1)
> CompanyName
V1
1: Abbey Company
2: Manisd Company
3: Idely
This way it does not matter whether the suffix is ".com", ".Com", ".co.uk", and so on.
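Note that sel_strsplit is not a base R function and its definition is not shown in the answer. A rough base R equivalent, assuming it simply returns the first piece of each string after splitting on the dot, might look like the sketch below. The trimws() call is an extra step added here to drop the space left in front of " .com".

```r
# Sample data taken from the question
CompanyName <- c("Abbey Company.Com", "Manisd Company .com", "Idely.com")

# Split each name on a literal dot; fixed = TRUE disables regex
# interpretation, so "." needs no escaping here
pieces <- strsplit(CompanyName, ".", fixed = TRUE)

# Keep only the first piece of every name
before_dot <- vapply(pieces, function(p) p[1], character(1))

# Remove leading/trailing whitespace such as the space before " .com"
before_dot <- trimws(before_dot)
print(before_dot)
```

One caveat with this approach: any name that legitimately contains a dot (for example "St. Mary Hospital") would be cut at that dot as well.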
Answer 1 (score: 3)
If you have:
CompanyName <- c("Abbey Company.Com", "Manisd Company .com", "Idely.com")
you can try:
gsub(paste0(gsub("\\.","\\\\.",stopwords),collapse = "|"),"",
tolower(CompanyName))
#[1] "abbey company" "manisd company " "idely"
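The second element above still carries a trailing space ("manisd company "). A possible follow-up, sketched here with the stopwords vector from the question, is to wrap the call in trimws(). The variable names escaped, pattern and cleaned are illustrative only.

```r
# Stopword list and sample data, both copied from the question
stopwords <- c("limited", " l.c.", " llc", "corporation", " &", " ltd.", "llp ",
               "l.l.c", "incorporated", "association", "s.p.a", " l.p.", "l.l.l.p",
               "p.a ", "p.c ", "chtd ", "chtd. ", "r.l.l.l.p ", "rlllp ", "the ",
               " lmft", " inc.", ".com")
CompanyName <- c("Abbey Company.Com", "Manisd Company .com", "Idely.com")

# Escape every dot so it matches a literal "." instead of any character
escaped <- gsub("\\.", "\\\\.", stopwords)

# Join the escaped stopwords into a single alternation pattern
pattern <- paste0(escaped, collapse = "|")

# Lower-case first, strip the stopwords, then trim leftover whitespace
cleaned <- trimws(gsub(pattern, "", tolower(CompanyName)))
print(cleaned)
```

Lower-casing before matching is what lets the lower-case ".com" stopword also remove ".Com".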