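You can download the sample spreadsheet here: https://1drv.ms/x/s!Ag44bY-ZJIWUoUxq3mtI192IYHIt
I would like to use R.
I want to get a list of the GIF file names that have a return code of 200, so the final output should look like this: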
sts-73-patch-small.gif
livevideo.gif
count.gif
NASA-logosmall.gif
KSC-logosmall.gif
launch-logo.gif
I think I need to use the gsub function, but I'm not sure.
Can you show me the R code that retrieves the list above?
Answer 0 (score: 0)
Please include a reproducible sample next time. Below is an example input based on the data at the given link.
Input:
url <- c("/history/apollo/", "/shuttle/countdown/", "/shuttle/missions/sts-73/mission-sts-73.html",
"/shuttle/countdown/liftoff.html", "/shuttle/missions/sts-73/sts-73-patch-small.gif",
"/images/NASA-logosmall.gif", "/shuttle/countdown/video/livevideo.gif",
"/shuttle/countdown/countdown.html", "/shuttle/countdown/", "/", "/shuttle/countdown/count.gif",
"/images/NASA-logosmall.gif", "/images/KSC-logosmall.gif", "/shuttle/countdown/count.gif",
"/images/NASA-logosmall.gif", "/images/KSC-logosmall.gif", "/images/ksclogo-medium.gif",
"/images/launch-logo.gif", "/facts/about_ksc.html", "/shuttle/missions/sts-71/images/KSC-95EC-0916.jpg")
return_code <- c(200, 200, 200, 304, 200, 304, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 304, 200, 200, 200)
library(tibble)  # provides tibble(); also attached by library(tidyverse)
df <- tibble(url, return_code)
df
# A tibble: 20 x 2
url return_code
<chr> <dbl>
1 /history/apollo/ 200.
2 /shuttle/countdown/ 200.
3 /shuttle/missions/sts-73/mission-sts-73.html 200.
4 /shuttle/countdown/liftoff.html 304.
5 /shuttle/missions/sts-73/sts-73-patch-small.gif 200.
6 /images/NASA-logosmall.gif 304.
7 /shuttle/countdown/video/livevideo.gif 200.
8 /shuttle/countdown/countdown.html 200.
9 /shuttle/countdown/ 200.
10 / 200.
11 /shuttle/countdown/count.gif 200.
12 /images/NASA-logosmall.gif 200.
13 /images/KSC-logosmall.gif 200.
14 /shuttle/countdown/count.gif 200.
15 /images/NASA-logosmall.gif 200.
16 /images/KSC-logosmall.gif 200.
17 /images/ksclogo-medium.gif 304.
18 /images/launch-logo.gif 200.
19 /facts/about_ksc.html 200.
20 /shuttle/missions/sts-71/images/KSC-95EC-0916.jpg 200.
Method:
library(tidyverse)
# Custom function: take the last element of each vector in the str_split() output
find_gif <- function(myDF) {
  map_chr(myDF, ~ last(.x))
}
df2 <- df %>%
  # Escape the dot so it matches a literal "." before "gif"
  filter(grepl("\\.gif$", url), return_code == 200) %>%
  mutate(url2 = find_gif(str_split(url, "[/]"))) %>%
  select(url2, return_code)
Output:
df2
# A tibble: 9 x 2
url2 return_code
<chr> <dbl>
1 sts-73-patch-small.gif 200.
2 livevideo.gif 200.
3 count.gif 200.
4 NASA-logosmall.gif 200.
5 KSC-logosmall.gif 200.
6 count.gif 200.
7 NASA-logosmall.gif 200.
8 KSC-logosmall.gif 200.
9 launch-logo.gif 200.
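To make the find_gif() step concrete, here is a minimal sketch of what happens to a single URL. It uses base R's strsplit() and vapply() as stand-ins for str_split() and map_chr(); that substitution is an assumption made only to keep the example dependency-free, and the names parts and file_name are illustrative.

```r
# One URL from the sample input
one_url <- "/shuttle/countdown/count.gif"

# Splitting on "/" returns a list holding one character vector:
# "" "shuttle" "countdown" "count.gif"
parts <- strsplit(one_url, "/")

# Keep the last element of each vector, which is the file name
file_name <- vapply(parts, function(p) p[length(p)], character(1))
file_name
```

The leading empty string appears because the URL starts with "/"; it does not matter here since only the last piece is kept.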
If you only want the unique values, use the unique() function.
df2 %>% select(url2) %>% unique()
# A tibble: 6 x 1
url2
<chr>
1 sts-73-patch-small.gif
2 livevideo.gif
3 count.gif
4 NASA-logosmall.gif
5 KSC-logosmall.gif
6 launch-logo.gif
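Since the question asks about gsub(), a base R variant may also be of interest. The sketch below is not part of the method above. It assumes the URLs and return codes are held in plain vectors (only a subset of the sample input is used here), and it uses sub() rather than gsub() because a single substitution per string is enough. The names keep and gif_names are illustrative.

```r
# Subset of the sample input, as plain vectors
url <- c("/shuttle/countdown/",
         "/shuttle/countdown/liftoff.html",
         "/shuttle/missions/sts-73/sts-73-patch-small.gif",
         "/images/NASA-logosmall.gif",
         "/shuttle/countdown/count.gif",
         "/images/NASA-logosmall.gif",
         "/shuttle/countdown/count.gif",
         "/images/ksclogo-medium.gif")
return_code <- c(200, 304, 200, 304, 200, 200, 200, 304)

# Rows whose URL ends in a literal ".gif" and whose return code is 200
keep <- grepl("\\.gif$", url) & return_code == 200

# ".*/" is greedy, so it removes everything up to the last slash
gif_names <- unique(sub(".*/", "", url[keep]))
gif_names
```

basename(url[keep]) would likely serve the same purpose as the sub() call; the regular expression is shown because the question mentions gsub.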