I converted an object of type XMLDocument to character using the following:
do.call(paste, as.list(capture.output(list_links)))
I want to use strsplit to extract specific strings from the resulting character object. The output for list_links looks like this.
[1] "[[1]] <a href=\"/Archive/CrossNational.asp\">Cross-National Data</a> [[2]] <a href=\"/Archive/MultiNation.asp\">Multiple Nation Surveys</a> [[3]] <a href=\"/Archive/IntSurveys.asp\">Single Nation Surveys</a> [[4]] <a href=\"/Archive/ChCounty.asp\">County-Level Data</a> [[5]] <a href=\"/Archive/ChState.asp\">State-Level Data</a> [[6]] <a href=\"/Archive/NatBaylor.asp\">Baylor Religion Surveys</a> [[7]] <a href=\"/Archive/GSS.asp\">General Social Surveys</a> [[8]] <a href=\"/Archive/Polls.asp\">News Polls</a> [[9]] <a href=\"/Archive/NES.asp\">National Election Studies</a> [[10]] <a href=\"/Archive/NatFamily.asp\">National Survey of Family Growth</a> [[11]] <a href=\"/Archive/NSYR.asp\">National Studies of Youth and Religion (NSYR)</a> [[12]] <a href=\"/Archive/PewResearch.asp\">Pew Research Center</a> [[13]] <a href=\"/Archive/PALS.asp\">Portraits of American Life Study (PALS)</a> [[14]] <a href=\"/Archive/PRRI.asp\">Public Religion Research Institute (PRRI)</a> [[15]] <a href=\"/Archive/NatOther.asp\">Other National Surveys</a> [[16]] <a href=\"/Archive/State1stAmnd.asp\">State of the First Amendment Surveys</a> [[17]] <a href=\"/Archive/Middletown.asp\">Middletown Data</a> [[18]] <a href=\"/Archive/Sfocus.asp\">Southern Focus Polls</a> [[19]] <a href=\"/Archive/RegOther.asp\">Other Local/Regional Surveys</a> [[20]] <a href=\"/Archive/FCT.asp\">Faith Communities Today</a> [[21]] <a href=\"/Archive/NCS.asp\">National Congregations Study</a> [[22]] <a href=\"/Archive/USCLS.asp\">U.S. 
Congregational Life Survey</a> [[23]] <a href=\"/Archive/CongOther.asp\">Other Surveys</a> [[24]] <a href=\"/Archive/Adventist.asp\">Adventist</a> [[25]] <a href=\"/Archive/Baptist.asp\">Baptist</a> [[26]] <a href=\"/Archive/Catholic.asp\">Catholic</a> [[27]] <a href=\"/Archive/Jewish.asp\">Jewish</a> [[28]] <a href=\"/Archive/Lutheran.asp\">Lutheran</a> [[29]] <a href=\"/Archive/Methodist.asp\">Methodist</a> [[30]] <a href=\"/Archive/Mormon.asp\">Mormon</a> [[31]] <a href=\"/Archive/Nazarene.asp\">Nazarene</a> [[32]] <a href=\"/Archive/Presbyterian.asp\">Presbyterian</a> [[33]] <a href=\"/Archive/Unitarian.asp\">Unitarian-Universalist</a> [[34]] <a href=\"/Archive/GrpOther.asp\">Other Groups</a> [[35]] <a href=\"/Archive/InstructData.asp\">Instructional Data Files</a> [[36]] <a href=\"/Archive/Other.asp\">Other Data</a> "
I want to extract a list of the URLs in each a tag, i.e. after using strsplit the first object in my list should be "/Archive/CrossNational.asp"
Answer 0 (score: 0)
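This does it with strsplit on the txt object (the character string shown in the question), although strsplit is not the function everyone would choose for this. The code splits the string on the href preamble and on the "> that closes the opening tag, then collects the even-numbered items. The "split" argument is an OR-ed combination of those two parts. See ?regex for more details on R regular expressions: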
strsplit(txt, "\\]\\] <a href\\=\\\"|\\\">")[[1]][c(FALSE,TRUE)]
#--- result ----
[1] "/Archive/CrossNational.asp" "/Archive/MultiNation.asp"
[3] "/Archive/IntSurveys.asp" "/Archive/ChCounty.asp"
[5] "/Archive/ChState.asp" "/Archive/NatBaylor.asp"
[7] "/Archive/GSS.asp" "/Archive/Polls.asp"
[9] "/Archive/NES.asp" "/Archive/NatFamily.asp"
[11] "/Archive/NSYR.asp" "/Archive/PewResearch.asp"
[13] "/Archive/PALS.asp" "/Archive/PRRI.asp"
[15] "/Archive/NatOther.asp" "/Archive/State1stAmnd.asp"
[17] "/Archive/Middletown.asp" "/Archive/Sfocus.asp"
[19] "/Archive/RegOther.asp" "/Archive/FCT.asp"
[21] "/Archive/NCS.asp" "/Archive/USCLS.asp"
[23] "/Archive/CongOther.asp" "/Archive/Adventist.asp"
[25] "/Archive/Baptist.asp" "/Archive/Catholic.asp"
[27] "/Archive/Jewish.asp" "/Archive/Lutheran.asp"
[29] "/Archive/Methodist.asp" "/Archive/Mormon.asp"
[31] "/Archive/Nazarene.asp" "/Archive/Presbyterian.asp"
[33] "/Archive/Unitarian.asp" "/Archive/GrpOther.asp"
[35] "/Archive/InstructData.asp" "/Archive/Other.asp"
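
To see why taking every second item works, here is a minimal sketch on a shortened input. The two-link string and the names pieces, urls and urls2 are made up for illustration, and the pattern drops the backslashes before = and ", which should not be needed since neither is a regex metacharacter. The last lines show one possible alternative (not the method used above) that matches the href values directly with gregexpr and regmatches.

```r
# Toy input with two links, mimicking the shape of the real txt (hypothetical data).
txt <- "[[1]] <a href=\"/Archive/A.asp\">Alpha</a> [[2]] <a href=\"/Archive/B.asp\">Beta</a> "

# Split on either delimiter: the text just before each URL, or the "> just after it.
pieces <- strsplit(txt, "\\]\\] <a href=\"|\">")[[1]]
print(pieces)  # 5 pieces; the URLs sit at positions 2 and 4

# c(FALSE, TRUE) is recycled along the vector, so it keeps items 2, 4, ...
urls <- pieces[c(FALSE, TRUE)]
print(urls)

# Alternative sketch: match everything after href=" up to the next quote.
# The lookbehind requires perl = TRUE.
m <- gregexpr("(?<=href=\")[^\"]+", txt, perl = TRUE)
urls2 <- regmatches(txt, m)[[1]]
print(urls2)
```

The strsplit approach depends on the URLs always landing in the even positions, so it can break if the markup varies (for example, extra attributes before href). The regmatches variant only assumes that each URL is enclosed in double quotes right after href=.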