I have a data frame of URLs, and I am trying to remove the rows that contain any image URL. I tried the following, but it did not work:
url_pattern <- "http[s]?://(?:[a-zA-Z]|[0-9]|[$-_@.&+]|[!*\\(\\),]|(?:%[0-9a-fA-F][0-9a-fA-F]))+"
image_pattern <- "http[s]?://.*\\.(?:png|jpg)"
links <- str_extract(data[[2]], url_pattern)
images <- str_extract(data[[2]], image_pattern)
links <- links[!is.na(links)]
links <- data.frame(url = links)
links <- links[!url %in% images]
Answer 0 (score: 1)
If I understand what you are trying to do, you are overcomplicating it. Suppose you have the following data.frame:
# Example data: rows 1 and 3 are image URLs, row 2 is not
df <- data.frame(url = c('https://i.stack.imgur.com/rkCC0.png?s=48&g=1',
                         'https://www.google.com',
                         'https://www.this.is.an.image.jpg'),
                 id = c(1, 2, 3))
We can remove all rows that have an image URL in column `url` as follows:
image_pattern <- "http[s]?://.*\\.(?:png|jpg)"
# grepl() is TRUE for rows whose url matches the image pattern; negate it to keep the rest
df[!grepl(image_pattern, df$url), ]
Result:
                     url id
2 https://www.google.com  2
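If your real data has upper-case extensions, or URLs where `.png` or `.jpg` appears in the middle of the path rather than at the end, a slightly stricter variant of the same idea may work better. The following is only a sketch: the extension list (`png`, `jpg`/`jpeg`, `gif`), the extra `example.com` URLs, and the names `is_image` and `kept` are my assumptions, not something from your data.

```r
# Hypothetical sample data; the last two URLs are made up for illustration
df <- data.frame(
  url = c("https://i.stack.imgur.com/rkCC0.png?s=48&g=1",
          "https://www.google.com",
          "https://www.this.is.an.image.jpg",
          "https://example.com/PHOTO.JPG",
          "https://example.com/gallery.png.html"),
  id = 1:5,
  stringsAsFactors = FALSE
)

# The extension must be followed by the end of the string,
# or by a query string / fragment starting with ? or #
image_pattern <- "^https?://.*\\.(png|jpe?g|gif)([?#].*)?$"

# ignore.case = TRUE also catches .JPG, .PNG, and so on
is_image <- grepl(image_pattern, df$url, ignore.case = TRUE)

# drop = FALSE keeps the result a data.frame even with one column left
kept <- df[!is_image, , drop = FALSE]
print(kept)
```

With this pattern, the `.png.html` page is kept because the image extension is not at the end of the path, while the upper-case `.JPG` row is dropped.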
Hope this helps!