Question

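I want to extract one specific piece of data from the source code of a list of URLs. Let's take one URL as an example.
In the source code, I want to extract the value that follows pfDataConfig.page.section, which in this case is hotels.geo.city.US.united-states.14652.los-angeles, as shown in the image.

I tried several things with the rvest package, but none of them produced a result. Do you have any suggestions on how to solve this?

Thank you very much.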

Answer 1

Just read the source in line by line and grep for the target.

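# Read the page source line by line (readLines() also accepts a URL)
mylines <- readLines("/path/to/file")

# Find the target line(s); fixed = TRUE matches the dots literally
mytargetline <- mylines[grepl("pfDataConfig.page.section", mylines, fixed = TRUE)]

# Split on "=" and take the second element
mytarget <- unlist(strsplit(mytargetline, "="))[2]

# Strip surrounding whitespace, quotes, and a trailing semicolon, if present
mytarget <- gsub("^[[:space:]\"']+|[[:space:]\"';]+$", "", mytarget)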
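
Since the question asks about a list of URLs, the same idea can be wrapped in a small function and applied to each page. The following is only a sketch. It assumes the assignment appears in the source as pfDataConfig.page.section = "value"; (the exact form is not shown in the question), and the names extract_section, sample_lines, and urls are hypothetical.

```r
# Hypothetical helper: returns the quoted value assigned to
# pfDataConfig.page.section, or NA if no line matches.
# ASSUMPTION: the source contains  pfDataConfig.page.section = "value";
extract_section <- function(lines) {
  pattern <- "pfDataConfig\\.page\\.section[[:space:]]*=[[:space:]]*[\"']([^\"']*)[\"']"
  hits <- regmatches(lines, regexec(pattern, lines))
  # A matching line yields c(full_match, captured_value); others yield character(0)
  values <- vapply(
    hits,
    function(m) if (length(m) == 2) m[2] else NA_character_,
    character(1)
  )
  values <- values[!is.na(values)]
  if (length(values) == 0) NA_character_ else values[1]
}

# Made-up lines standing in for a downloaded page
sample_lines <- c(
  "<script>",
  "pfDataConfig.page.section = \"hotels.geo.city.US.united-states.14652.los-angeles\";",
  "</script>"
)
extract_section(sample_lines)

# For a character vector of URLs (hypothetical `urls`), one value per page:
# sections <- vapply(
#   urls,
#   function(u) extract_section(readLines(u, warn = FALSE)),
#   character(1)
# )
```

Capturing the quoted value with a regular expression avoids the leftover quotes and semicolon that splitting on "=" would leave behind, and returning NA for pages without a match keeps the result aligned with the input list.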
