Here are a few URLs. I want to extract a specific number from each of them.
https://www.sec.gov/Archives/edgar/data/1002638/000100263816000080/exhibit211subsidiarylisting.htm
http://www.sec.gov/Archives/edgar/data/1013871/000101387113000003/exhibit21110k2012.htm
http://www.sec.gov/Archives/edgar/data/1420800/000142080014000006/exhibit211subsidiariesofth.htm
http://www.sec.gov/Archives/edgar/data/1305014/000130501415000119/a9302015exhibit21.htm
I would like to get the following output:
1002638
1013871
1420800
1305014
Could you help me solve this?
Answer 0 (score: 1)
I would do it like this:
myurl <- c("https://www.sec.gov/Archives/edgar/data/1002638/000100263816000080/exhibit211subsidiarylisting.htm",
"http://www.sec.gov/Archives/edgar/data/1013871/000101387113000003/exhibit21110k2012.htm",
"http://www.sec.gov/Archives/edgar/data/1420800/000142080014000006/exhibit211subsidiariesofth.htm",
"http://www.sec.gov/Archives/edgar/data/1305014/000130501415000119/a9302015exhibit21.htm")
# split each URL into substrings, using the forward slashes as separators,
# then take the seventh element of each result
unlist(lapply(myurl, function(u) strsplit(u, "/")[[1]][7]))
# [1] "1002638" "1013871" "1420800" "1305014"
Answer 1 (score: 0)
Read the text with sep = "/", then take the relevant column:
df1 <- read.table(text = "
https://www.sec.gov/Archives/edgar/data/1002638/000100263816000080/exhibit211subsidiarylisting.htm
http://www.sec.gov/Archives/edgar/data/1013871/000101387113000003/exhibit21110k2012.htm
http://www.sec.gov/Archives/edgar/data/1420800/000142080014000006/exhibit211subsidiariesofth.htm
http://www.sec.gov/Archives/edgar/data/1305014/000130501415000119/a9302015exhibit21.htm
", sep = "/")
df1$V7
# [1] 1002638 1013871 1420800 1305014
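One thing to be aware of with this approach: `read.table` converts the column to integer, so a number with leading zeros would lose them. If the values should stay as strings, a possible variation is to pass `colClasses = "character"`. This is a sketch only; the names `urls_text` and `df2` are made up for illustration:

```r
# Hypothetical variable holding two of the URLs from the question.
urls_text <- "
https://www.sec.gov/Archives/edgar/data/1002638/000100263816000080/exhibit211subsidiarylisting.htm
http://www.sec.gov/Archives/edgar/data/1013871/000101387113000003/exhibit21110k2012.htm
"

# Read every "/"-separated field as character instead of guessing its type.
df2 <- read.table(text = urls_text, sep = "/", colClasses = "character")
df2$V7
# [1] "1002638" "1013871"
```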