How do I select certain characters?

Time: 2018-03-15 07:15:14

Tags: r string gsub

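You can download the sample table here: https://1drv.ms/x/s!Ag44bY-ZJIWUoUxq3mtI192IYHIt

I would like to use R to get a list of the gif names whose return code is 200.

So the final output should look like this: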

sts-73-patch-small.gif

livevideo.gif

count.gif

NASA-logosmall.gif

KSC-logosmall.gif

launch-logo.gif

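I think I need to use the gsub function, but I'm not sure.

Could you show me the R code to retrieve the list above?

1 answer:

Answer 0: (score: 0)

Please add a reproducible sample next time. Below is example input based on the data at the given link.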

Input:

url <- c("/history/apollo/",    "/shuttle/countdown/",  "/shuttle/missions/sts-73/mission-sts-73.html", 
         "/shuttle/countdown/liftoff.html", "/shuttle/missions/sts-73/sts-73-patch-small.gif",  
         "/images/NASA-logosmall.gif",  "/shuttle/countdown/video/livevideo.gif",   
         "/shuttle/countdown/countdown.html",   "/shuttle/countdown/",  "/",    "/shuttle/countdown/count.gif", 
         "/images/NASA-logosmall.gif",  "/images/KSC-logosmall.gif",    "/shuttle/countdown/count.gif", 
         "/images/NASA-logosmall.gif",  "/images/KSC-logosmall.gif",    "/images/ksclogo-medium.gif",   
         "/images/launch-logo.gif", "/facts/about_ksc.html",    "/shuttle/missions/sts-71/images/KSC-95EC-0916.jpg")

return_code <- c(200, 200, 200, 304, 200, 304, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 304, 200, 200, 200)

library(tibble)  # provides tibble(); also attached by library(tidyverse)

df <- tibble(url, return_code)
df

# A tibble: 20 x 2
   url                                               return_code
   <chr>                                                   <dbl>
 1 /history/apollo/                                         200.
 2 /shuttle/countdown/                                      200.
 3 /shuttle/missions/sts-73/mission-sts-73.html             200.
 4 /shuttle/countdown/liftoff.html                          304.
 5 /shuttle/missions/sts-73/sts-73-patch-small.gif          200.
 6 /images/NASA-logosmall.gif                               304.
 7 /shuttle/countdown/video/livevideo.gif                   200.
 8 /shuttle/countdown/countdown.html                        200.
 9 /shuttle/countdown/                                      200.
10 /                                                        200.
11 /shuttle/countdown/count.gif                             200.
12 /images/NASA-logosmall.gif                               200.
13 /images/KSC-logosmall.gif                                200.
14 /shuttle/countdown/count.gif                             200.
15 /images/NASA-logosmall.gif                               200.
16 /images/KSC-logosmall.gif                                200.
17 /images/ksclogo-medium.gif                               304.
18 /images/launch-logo.gif                                  200.
19 /facts/about_ksc.html                                    200.
20 /shuttle/missions/sts-71/images/KSC-95EC-0916.jpg        200.

Approach:

library(tidyverse)

# Custom function: take the last element of each vector returned by str_split()
find_gif <- function(myDF) {
  map_chr(myDF, ~(.x %>% last()))
}

df2 <- df %>% 
  filter(grepl("\\.gif$", url), return_code == 200) %>% 
  mutate(url2 = find_gif(str_split(url, "[/]"))) %>% 
  select(url2, return_code)
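
One detail about the filter pattern is worth spelling out: in a regular expression a bare `.` matches any character, so only an escaped dot (`\\.` in an R string) guarantees a literal ".gif" suffix. The sketch below compares the two on one path from the sample input and one made-up path (`/images/logogif` is hypothetical and not in the data); `endsWith()` is shown as a regex-free alternative.

```r
# Hypothetical path with no dot before "gif" (NOT part of the sample data)
no_dot <- "/images/logogif"
# Real path taken from the sample input
with_dot <- "/images/launch-logo.gif"

paths <- c(no_dot, with_dot)

unescaped <- grepl(".gif$", paths)     # "." matches any character
escaped   <- grepl("\\.gif$", paths)   # "\\." matches a literal dot only
literal   <- endsWith(paths, ".gif")   # plain suffix test, no regex involved

unescaped
escaped
literal
```

With the sample data both patterns select the same rows, so the difference only matters if a path happens to end in "gif" without a dot in front of it.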

Output:

df2
# A tibble: 9 x 2
  url2                    return_code
  <chr>                         <dbl>
1 sts-73-patch-small.gif         200.
2 livevideo.gif                  200.
3 count.gif                      200.
4 NASA-logosmall.gif             200.
5 KSC-logosmall.gif              200.
6 count.gif                      200.
7 NASA-logosmall.gif             200.
8 KSC-logosmall.gif              200.
9 launch-logo.gif                200.
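
Since the question mentions gsub, here is a minimal base R sketch of the same idea without any packages. It assumes a small subset of the sample data (the names `sample_url`, `sample_code`, `keep`, and `gif_names` are made up for illustration) and uses `sub()` to strip everything up to the last "/"; `basename()` should give the same file names for paths like these.

```r
# Small illustrative subset of the sample input (not the full table)
sample_url <- c("/shuttle/countdown/",
                "/shuttle/missions/sts-73/sts-73-patch-small.gif",
                "/images/NASA-logosmall.gif",
                "/shuttle/countdown/count.gif",
                "/images/ksclogo-medium.gif")
sample_code <- c(200, 200, 304, 200, 304)

# Keep URLs that end in a literal ".gif" and have return code 200
keep <- grepl("\\.gif$", sample_url) & sample_code == 200

# Greedy "^.*/" removes everything up to and including the last slash
gif_names <- sub("^.*/", "", sample_url[keep])
gif_names

# basename() is a ready-made alternative for extracting the file name
identical(gif_names, basename(sample_url[keep]))
```

Wrapping the result in `unique()` would drop repeated names in the same way as the tidyverse version.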

If you only want the unique values, use the unique() function.

df2 %>% select(url2) %>% unique()
# A tibble: 6 x 1
  url2                  
  <chr>                 
1 sts-73-patch-small.gif
2 livevideo.gif         
3 count.gif             
4 NASA-logosmall.gif    
5 KSC-logosmall.gif     
6 launch-logo.gif