I have a data frame (DF) like this:
word
1 vet clinic New York
2 super haircut Alabama
3 best deal on dog drugs
4 doggy medicine Texas
5 cat healthcare
6 lizards that don't lie
I'm trying to get the following data frame, with only the geographic names removed:
word
1 vet clinic
2 super haircut
3 best deal on dog drugs
4 doggy medicine
5 cat healthcare
6 lizards that don't lie
The following does not keep the words that remain once the geographic name is removed.
vec <- # vector of geo names
DF <- DF[!grepl(vec, DF$word), ]
Answer 0 (score: 2)
Using @Ari's variables and data frame, a vectorized approach can use Reduce:
vec = c("New York", "Texas", "Alabama")
word = c("vet clinic New York", "super haircut Alabama", "best deal on dog drugs", "doggy medicine Texas", "cat healthcare", "lizards that don't lie")
df = data.frame(word=word)
df$word = as.character(df$word)
Reduce(function(a, b) gsub(b, "", a, fixed = TRUE), vec, df$word)
[1] "vet clinic " "super haircut " "best deal on dog drugs" "doggy medicine "
[5] "cat healthcare" "lizards that don't lie"
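Note that the output above still carries a trailing space wherever a name was removed from the end of a string. A minimal sketch of one way to clean that up, assuming base R's trimws() is available (R 3.2.0 or later) and that the names should be matched literally:

```r
vec <- c("New York", "Texas", "Alabama")
word <- c("vet clinic New York", "super haircut Alabama", "best deal on dog drugs",
          "doggy medicine Texas", "cat healthcare", "lizards that don't lie")
df <- data.frame(word = word, stringsAsFactors = FALSE)

# Same Reduce() call as above: strip each name in turn, matched literally (fixed = TRUE).
stripped <- Reduce(function(a, b) gsub(b, "", a, fixed = TRUE), vec, df$word)

# trimws() removes the leading/trailing spaces left behind by the removal.
df$word <- trimws(stripped)
df$word
```

Assigning the result back to df$word matters here, because Reduce() only returns the modified vector and leaves the data frame untouched.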
Answer 1 (score: 1)
As Henrik said, it would have helped if you had included a reproducible example in your post. I'll do that here:
vec = c("New York", "Texas", "Alabama")
word = c("vet clinic New York", "super haircut Alabama", "best deal on dog drugs", "doggy medicine Texas", "cat healthcare", "lizards that don't lie")
df = data.frame(word=word)
df$word = as.character(df$word)
df
word
1 vet clinic New York
2 super haircut Alabama
3 best deal on dog drugs
4 doggy medicine Texas
5 cat healthcare
6 lizards that don't lie
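Generally speaking, R gurus prefer vectorization over for loops. In this case, though, I found nested for loops and the stringr package to be the simplest way to solve the problem.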
library(stringr)
for (i in seq_len(nrow(df))) {
  # Remove every geographic name from row i, one name at a time
  for (j in seq_along(vec)) {
    df[i, "word"] <- str_replace_all(df[i, "word"], vec[j], "")
  }
}
df
word
1 vet clinic
2 super haircut
3 best deal on dog drugs
4 doggy medicine
5 cat healthcare
6 lizards that don't lie
I believe this code gives you the result you want.
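As a side note, str_replace_all() is vectorized over its first argument, so the inner row loop is not strictly required. A sketch of the same idea with a single loop over the names, assuming the names should be treated as literal text (hence fixed()) and that trimming the leftover whitespace is wanted:

```r
library(stringr)

vec <- c("New York", "Texas", "Alabama")
word <- c("vet clinic New York", "super haircut Alabama", "best deal on dog drugs",
          "doggy medicine Texas", "cat healthcare", "lizards that don't lie")
df <- data.frame(word = word, stringsAsFactors = FALSE)

# One pass per name; each call processes the whole column at once.
for (geo in vec) {
  # fixed() makes stringr match the name literally instead of as a regular expression.
  df$word <- str_replace_all(df$word, fixed(geo), "")
}

# str_trim() drops the space left where a trailing name was removed.
df$word <- str_trim(df$word)
df$word
```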
Answer 2 (score: 1)
Using @Ari's example:
library(stringr)
df$word <- str_trim(gsub(paste(vec,collapse="|"),"", df$word))
df$word
#[1] "vet clinic" "super haircut" "best deal on dog drugs"
#[4] "doggy medicine" "cat healthcare" "lizards that don't lie"
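The alternation pattern above can misfire when one name is a prefix of another or appears inside a longer word. The following is only a sketch of a more defensive variant in base R; remove_geo is a hypothetical helper name, and it assumes the names contain no regular-expression metacharacters:

```r
# Hypothetical helper; the name remove_geo does not come from the answers above.
remove_geo <- function(x, geo) {
  # Put longer names first so that e.g. "New York City" is tried before "New York".
  geo <- geo[order(nchar(geo), decreasing = TRUE)]
  # \\b restricts matches to whole words; assumes geo has no regex metacharacters.
  pattern <- paste0("\\b(", paste(geo, collapse = "|"), ")\\b")
  out <- gsub(pattern, "", x, perl = TRUE)
  # Collapse runs of whitespace left behind, then trim both ends.
  trimws(gsub("\\s+", " ", out))
}

vec <- c("New York", "Texas", "Alabama")
word <- c("vet clinic New York", "super haircut Alabama", "best deal on dog drugs",
          "doggy medicine Texas", "cat healthcare", "lizards that don't lie")
df <- data.frame(word = word, stringsAsFactors = FALSE)

df$word <- remove_geo(df$word, vec)
df$word
```

Collapsing inner whitespace also handles names that sit in the middle of a string, which trimming alone would leave as a double space.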