Question

Below are a few URLs. I want to extract a specific number from each of them.

https://www.sec.gov/Archives/edgar/data/1002638/000100263816000080/exhibit211subsidiarylisting.htm
http://www.sec.gov/Archives/edgar/data/1013871/000101387113000003/exhibit21110k2012.htm
http://www.sec.gov/Archives/edgar/data/1420800/000142080014000006/exhibit211subsidiariesofth.htm
http://www.sec.gov/Archives/edgar/data/1305014/000130501415000119/a9302015exhibit21.htm

I would like to get output like the following:

Could you help me solve this?

Answer 1

I would do it like this:

myurl <- c("https://www.sec.gov/Archives/edgar/data/1002638/000100263816000080/exhibit211subsidiarylisting.htm",
           "http://www.sec.gov/Archives/edgar/data/1013871/000101387113000003/exhibit21110k2012.htm",
           "http://www.sec.gov/Archives/edgar/data/1420800/000142080014000006/exhibit211subsidiariesofth.htm",
           "http://www.sec.gov/Archives/edgar/data/1305014/000130501415000119/a9302015exhibit21.htm")

# split each string into substrings, with the forward slashes as separators
# (the empty piece after "http:" counts, so the ID is the seventh element)
unlist(lapply(myurl, function(u) strsplit(u, "/")[[1]][7]))

# [1] "1002638" "1013871" "1420800" "1305014"
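
If you would rather not depend on the number sitting in position seven, a pattern match is one alternative. This is only a sketch: it assumes every URL contains `/edgar/data/` followed directly by the numeric ID, and the helper name `extract_id` is made up for illustration.

```r
# Hypothetical helper (not part of the answer above): return the run of digits
# that follows "/edgar/data/". Assumes that path is present; otherwise gives NA.
extract_id <- function(urls) {
  # regexec() records the full match plus the capture group for each URL
  m <- regmatches(urls, regexec("/edgar/data/([0-9]+)/", urls))
  # element 2 is the captured ID; a URL with no match yields character(0)
  vapply(m, function(x) if (length(x) == 2) x[2] else NA_character_, character(1))
}

myurl <- c("https://www.sec.gov/Archives/edgar/data/1002638/000100263816000080/exhibit211subsidiarylisting.htm",
           "http://www.sec.gov/Archives/edgar/data/1013871/000101387113000003/exhibit21110k2012.htm",
           "http://www.sec.gov/Archives/edgar/data/1420800/000142080014000006/exhibit211subsidiariesofth.htm",
           "http://www.sec.gov/Archives/edgar/data/1305014/000130501415000119/a9302015exhibit21.htm")

extract_id(myurl)
# [1] "1002638" "1013871" "1420800" "1305014"
```

Unlike the positional split, this returns NA for a URL that does not follow the expected layout instead of silently returning some other path segment.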

Answer 2

Read the text with sep = "/", then take the relevant column:

df1 <- read.table(text = "
https://www.sec.gov/Archives/edgar/data/1002638/000100263816000080/exhibit211subsidiarylisting.htm
http://www.sec.gov/Archives/edgar/data/1013871/000101387113000003/exhibit21110k2012.htm
http://www.sec.gov/Archives/edgar/data/1420800/000142080014000006/exhibit211subsidiariesofth.htm
http://www.sec.gov/Archives/edgar/data/1305014/000130501415000119/a9302015exhibit21.htm
                  ", sep = "/")


df1$V7
# [1] 1002638 1013871 1420800 1305014
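
Note that read.table guesses column types, so V7 comes back as integers here. If the IDs should stay as strings (for example, to keep any leading zeros), passing colClasses = "character" is one option. A minimal sketch, assuming the URLs are already in a character vector; the names `urls` and `df2` are only illustrative:

```r
# Illustrative input: two of the URLs from the question
urls <- c("https://www.sec.gov/Archives/edgar/data/1002638/000100263816000080/exhibit211subsidiarylisting.htm",
          "http://www.sec.gov/Archives/edgar/data/1013871/000101387113000003/exhibit21110k2012.htm")

# colClasses = "character" stops read.table from converting fields to numbers
df2 <- read.table(text = urls, sep = "/", colClasses = "character")

df2$V7
# [1] "1002638" "1013871"
```

The same setting also keeps the next path segment (V8, e.g. "000100263816000080") intact as text.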
