Remove rows from a data frame if a condition is met

Time: 2018-01-19 07:12:35

Tags: r regex dataframe

I have a data frame of URLs, and I am trying to remove the rows that contain any image URL. I tried the following, but it did not work:

url_pattern <- "http[s]?://(?:[a-zA-Z]|[0-9]|[$-_@.&+]|[!*\\(\\),]|(?:%[0-9a-fA-F][0-9a-fA-F]))+"
image_pattern <- "http[s]?://.*\\.(?:png|jpg)"

links <- str_extract(data[[2]], url_pattern)
images <- str_extract(data[[2]], image_pattern)

links <- links[!is.na(links)]
links <- data.frame(url = links)
links <- links[!url %in% images]

1 answer:

Answer 0 (score: 1)

If I understand what you are trying to do, you seem to be overcomplicating it. Suppose you have the following data.frame:

df <- data.frame(url = c('https://i.stack.imgur.com/rkCC0.png?s=48&g=1',
                         'https://www.google.com',
                         'https://www.this.is.an.image.jpg'),
                 id = c(1, 2, 3))

We can remove all rows that have an image URL in column `url` as follows:

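# Match any http(s) URL that contains ".png" or ".jpg"
image_pattern <- "http[s]?://.*\\.(?:png|jpg)"
# grepl() flags the image rows; negate it to keep everything else
df[!grepl(image_pattern, df$url), ]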

Result:

                     url id
2 https://www.google.com  2
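
If it helps to see the filtering step by step, here is a minimal sketch in base R that stores the logical mask in a variable before subsetting. It also tries a somewhat stricter pattern that only accepts the image extension at the end of the URL path (optionally followed by a query string or fragment) and ignores case. The fourth URL, the extra `gif`/`jpeg` extensions, and the names `is_image` and `non_images` are assumptions added for illustration; they are not part of the original question.

```r
# Hypothetical example data, mirroring the data.frame used above
# plus one upper-case URL to exercise ignore.case.
df <- data.frame(
  url = c("https://i.stack.imgur.com/rkCC0.png?s=48&g=1",
          "https://www.google.com",
          "https://www.this.is.an.image.jpg",
          "https://example.com/PHOTO.JPEG"),
  id = c(1, 2, 3, 4),
  stringsAsFactors = FALSE
)

# Assumed pattern: the URL path must end in an image extension,
# optionally followed by a query string (?...) or fragment (#...).
image_pattern <- "^https?://[^?#]*\\.(png|jpe?g|gif)([?#].*)?$"

# grepl() returns one TRUE/FALSE per row; TRUE marks an image URL.
is_image <- grepl(image_pattern, df$url, ignore.case = TRUE)

# Keep the rows that are not images. drop = FALSE guarantees the
# result stays a data.frame even when only one column is left.
non_images <- df[!is_image, , drop = FALSE]
print(non_images)
```

Whether the stricter pattern is preferable depends on the data: the looser pattern flags any URL that contains `.png` or `.jpg` anywhere, which may or may not be what you want.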

Hope this helps!