Extracting part of a URL in R

Time: 2018-12-07 18:19:10

Tags: r data-manipulation

dput(mydf)
structure(list(urls = c("/players/a/abdulma02.html", 
"/players/a/abdulta01.html", 
"/players/a/abdursh01.html", "/players/a/alexaco01.html", "/players/a/alexaco02.html"
), names = c("Mahmoud Abdul-Rauf", "Tariq Abdul-Wahad", "Shareef Abdur-Rahim", 
"Cory Alexander", "Courtney Alexander")), row.names = c(NA, 5L
), class = "data.frame")

head(mydf)
                       urls               names
1 /players/a/abdulma02.html  Mahmoud Abdul-Rauf
2 /players/a/abdulta01.html   Tariq Abdul-Wahad
3 /players/a/abdursh01.html Shareef Abdur-Rahim
4 /players/a/alexaco01.html      Cory Alexander
5 /players/a/alexaco02.html  Courtney Alexander

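My question is simple: I want to extract the part of each URL that comes just before .html (abdulma02, abdulta01, and so on). The data is formatted so that every URL ends in .html and begins with /players/{single letter}/, i.e. the pattern is always /players/{single letter}/{what i want}.html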

I tried to solve this with the new urltools library (using its urltools::suffix_extract() function). Thanks for any help.

1 answer:

Answer 0: (score: 1)

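We can use basename() to drop the directory part of each path and tools::file_path_sans_ext() to strip the .html extension: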

tools::file_path_sans_ext(basename(mydf$urls))
#[1] "abdulma02" "abdulta01" "abdursh01" "alexaco01" "alexaco02"
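
As a cross-check, here is a minimal sketch of the same extraction done with a base R regular expression. It is not part of the original answer. It assumes every URL follows the /players/{single lowercase letter}/{id}.html pattern described in the question, and the names urls, ids_path and ids_regex are made up for illustration.

```r
# Sample URLs copied from the question's data frame (mydf$urls).
urls <- c("/players/a/abdulma02.html", "/players/a/abdulta01.html",
          "/players/a/abdursh01.html", "/players/a/alexaco01.html",
          "/players/a/alexaco02.html")

# Approach from the answer: take the last path component, then drop the extension.
ids_path <- tools::file_path_sans_ext(basename(urls))

# Alternative (assumption: the fixed /players/{letter}/{id}.html layout holds):
# capture everything between the single-letter directory and ".html".
ids_regex <- sub("^/players/[a-z]/(.*)\\.html$", "\\1", urls)

# Both approaches should agree on this data.
identical(ids_path, ids_regex)
```

The basename() version does not depend on the directory layout, so it is probably the safer choice if the prefix ever changes. Note that sub() returns a string unchanged when the pattern does not match, so with the regex version a malformed URL would pass through silently.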