Question

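I am trying to extract financial data from Yahoo Finance. When I run the code below, I get this error:

"Error in tokenize(css): Unexpected character '$' found at position 19"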

urlYCashflow <- "https://au.finance.yahoo.com/quote/MSFT/cash-flow?p=MSFT"
webpageYCashflow <- read_html(urlYCashflow)
node1 <- webpageYCashflow %>%
      html_nodes('D(tbr).fi-row.Bgc($hoverBgColor):h') %>%
      html_text()

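Is there any way to avoid the $, for example by replacing it in the XML document, or does anyone have another suggestion? I also tried an XPath expression, but the result is character(0) every time.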

    node1 <- webpageYCashflow %>%
      html_nodes(xpath = '//*[@id="Col1-1-Financials-Proxy"]/section/div[3]/div[1]/div/div[2]/div[7]/div[2]/div[3]/div[1]/div[2]/span') %>%
      html_text()

Answer 1

Which value in particular are you after? You can get all of the values with the approach in this answer: https://stackoverflow.com/a/58337027/6241235

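At present, your CSS selector is syntactically invalid, chiefly because of the unescaped $ and :h. When the selector is compiled, the $ is read as an operator and the :h as an implied pseudo-selector. You are also missing the leading class selector (the dot before D(tbr)). You can simply replace the multi-valued class with the single class name .fi-row to get the rows.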
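
As a rough offline illustration of the point above, the sketch below parses a small hand-written HTML fragment instead of the live page. The markup is an assumption that only imitates the class names from the question; the real Yahoo Finance page may be structured differently.

```r
library(rvest)

# Hypothetical markup imitating the class names from the question.
html <- '<div>
  <div class="D(tbr) fi-row Bgc($hoverBgColor):h"><span>Net income</span><span>100</span></div>
  <div class="D(tbr) fi-row Bgc($hoverBgColor):h"><span>Free cash flow</span><span>200</span></div>
</div>'
page <- read_html(html)

# The selector from the question cannot be tokenized, so html_nodes()
# raises an error; tryCatch() captures its message instead of stopping.
bad <- tryCatch(
  html_nodes(page, 'D(tbr).fi-row.Bgc($hoverBgColor):h'),
  error = function(e) conditionMessage(e)
)

# A single plain class name contains no special characters and matches every row.
rows <- html_nodes(page, '.fi-row')
n_rows <- length(rows)
```

Here `bad` holds the error message as a character string, while `rows` contains both row nodes, because `.fi-row` is only one of the several class tokens on each element.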

To match your XPath, you just select the last row and then the second column:

library(rvest)
library(magrittr)

page <- read_html('https://au.finance.yahoo.com/quote/MSFT/cash-flow?p=MSFT')
# Take the last row, then the text of its second <span> (the second column).
free_cash_flow <- page %>%
  html_nodes('.fi-row') %>%
  tail(1) %>%
  html_nodes('span') %>%
  `[[`(2) %>%
  html_text()
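
If you would rather stay with XPath, a class-based expression is usually less brittle than a long positional path. The following is a minimal sketch under the same assumption as before: the HTML fragment is hand-written and hypothetical, not the real page. It also shows that a $ is harmless inside an XPath string literal, and it checks the node counts before indexing so that a markup change yields NA rather than a subscript error.

```r
library(rvest)

# Hypothetical markup imitating the class names from the question.
html <- '<div>
  <div class="D(tbr) fi-row Bgc($hoverBgColor):h"><span>Net income</span><span>100</span></div>
  <div class="D(tbr) fi-row Bgc($hoverBgColor):h"><span>Free cash flow</span><span>200</span></div>
</div>'
page <- read_html(html)

# Match rows by a class token instead of by absolute position in the tree.
rows <- html_nodes(
  page,
  xpath = '//div[contains(concat(" ", normalize-space(@class), " "), " fi-row ")]'
)

# The awkward class name can be matched as well: inside an XPath string
# literal, "$", "(" and ":" are ordinary characters.
hover_rows <- html_nodes(
  page,
  xpath = '//div[contains(concat(" ", normalize-space(@class), " "), " Bgc($hoverBgColor):h ")]'
)

# Guard against empty results before indexing into the node sets.
value <- NA_character_
if (length(rows) > 0) {
  spans <- html_nodes(rows[[length(rows)]], 'span')
  if (length(spans) >= 2) value <- html_text(spans[[2]])
}
```

With this fragment, `value` is the text of the second span in the last row. Against the live page, an NA here would indicate that the expected rows were not present in the HTML that read_html() received.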

R html_nodes() function gives error: unexpected character '$'
