For loop with readHTMLTable from the XML package

Time: 2017-05-02 04:12:12

Tags: r xml for-loop

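I am trying to use a for loop to extract data from several URLs. The problem is that the data I need sits in different tables depending on the page. My original question is here. This is the initial data I have: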

         Code Issuer         ISIN           Type                                          URL
1 NTK007_1915   NBRK KZW1KD079153 discount notes http://www.kase.kz/en/gsecs/show/NTK007_1915
2 NTK007_1917   NBRK KZW1KD079179 discount notes http://www.kase.kz/en/gsecs/show/NTK007_1917
3 NTK007_1918   NBRK KZW1KD079187 discount notes http://www.kase.kz/en/gsecs/show/NTK007_1918
4 NTK028_1896   NBRK KZW1KD288960 discount notes http://www.kase.kz/en/gsecs/show/NTK028_1896
5 NTK028_1903   NBRK KZW1KD289034 discount notes http://www.kase.kz/en/gsecs/show/NTK028_1903
6 NTK028_1909   NBRK KZW1KD289091 discount notes http://www.kase.kz/en/gsecs/show/NTK028_1909

I have been trying this code:

library(XML)

# map the row labels used on the page to the short names I want
wanted <- c("Nominal value in issue's currency" = "Nominal Value",
            "Number of bonds outstanding" = "# of Bonds Issue")

# function returns a data frame of wanted columns for given URL
getValues1 <- function (name, url) {
  # get the table and rename columns
  sp = readHTMLTable(url, stringsAsFactors = FALSE)
  df <- sp[[4]]
  names(df) <- c("full_name", "value")

  # filter and remap wanted columns
  result <- df[df$full_name %in% names(wanted),]
  result$column_name <- sapply(result$full_name, function(x) {wanted[[x]]})

  # add the identifier to every row
  result$name <- name
  return (result[,c("name", "column_name", "value")])
}

getValues2 <- function (name, url) {
  # get the table and rename columns
  sp = readHTMLTable(url, stringsAsFactors = FALSE)
  df <- sp[[7]]
  names(df) <- c("full_name", "value")

  # filter and remap wanted columns
  result <- df[df$full_name %in% names(wanted),]
  result$column_name <- sapply(result$full_name, function(x) {wanted[[x]]})

  # add the identifier to every row
  result$name <- name
  return (result[,c("name", "column_name", "value")])
}

# invoke function for each name/URL pair - returns list of data frames
for (i in 1:length(newd$URL)) {
    sp = readHTMLTable(newd$URL[[i]])
    if (dim(sp[[4]])[[2]] == 2) {
        columns = getValues1(x[["name"]], x[["URL"]])
    } else {
        columns = getValues2(x[["name"]], x[["URL"]])
    }
    print(columns)
}

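So basically I look at the number of columns in the table, and if it is not equal to 2, I take the data from another table. So far, R throws the following error: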

Error in (function (classes, fdef, mtable)  : 
  unable to find an inherited method for function ‘readHTMLTable’ for signature ‘"NULL"’

Please help.
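
A possible reading of the error: inside the loop the functions are called with `x[["name"]]` and `x[["URL"]]`, but the loop variable is `i` and the data frame has a `Code` column rather than `name`. If `x[["URL"]]` evaluates to `NULL`, `readHTMLTable` receives `NULL` and fails with exactly this kind of "signature "NULL"" message. The following is only a sketch of how the loop could be restructured, not a tested fix for these pages. The helper names `pick_table`, `extract_values` and `scrape_all` are hypothetical, and the assumption that the wanted table is either the 4th or the 7th one comes from the code above.

```r
# Labels on the page mapped to the short names used in the result.
wanted <- c("Nominal value in issue's currency" = "Nominal Value",
            "Number of bonds outstanding" = "# of Bonds Issue")

# Return the first candidate table that exists and has exactly two columns.
# The candidate positions 4 and 7 are taken from the question, not verified.
pick_table <- function(tables, candidates = c(4, 7)) {
  for (idx in candidates) {
    if (idx <= length(tables)) {
      tbl <- tables[[idx]]
      if (!is.null(tbl) && ncol(tbl) == 2) return(tbl)
    }
  }
  NULL
}

# Build the (name, column_name, value) rows for one security.
extract_values <- function(code, tables) {
  df <- pick_table(tables)
  if (is.null(df)) return(NULL)
  names(df) <- c("full_name", "value")
  df$full_name <- as.character(df$full_name)

  result <- df[df$full_name %in% names(wanted), ]
  result$column_name <- unname(wanted[result$full_name])
  result$name <- rep(code, nrow(result))
  result[, c("name", "column_name", "value")]
}

# Loop over the rows of the data frame, downloading each page only once.
scrape_all <- function(newd) {
  results <- vector("list", nrow(newd))
  for (i in seq_len(nrow(newd))) {
    url <- as.character(newd$URL[[i]])
    tables <- XML::readHTMLTable(url, stringsAsFactors = FALSE)
    res <- extract_values(as.character(newd$Code[[i]]), tables)
    if (!is.null(res)) results[[i]] <- res
  }
  do.call(rbind, results)
}
```

With the data frame from the question this would be called as `all_values <- scrape_all(newd)`. Indexing by `i` and using the `Code` column avoids passing `NULL` to `readHTMLTable`, and each page is fetched once instead of twice.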

0 answers:

No answers yet.