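Using dates in a URL inside a for loop

Date: 2018-04-09 16:11:09

Tags: r for-loop

My URL looks like this: http://nationalbank.kz/?docid=105&cmomdate=2018-01-03&switch=english

I want to loop over all dates starting from 2015 and store the data in a data frame. If I run the following, I get an error: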

StartDate <- "2017-07-01"
EndDate <- "2017-07-10"
dates <- seq(as.Date(StartDate, format="%Y-%m-%d"),
             as.Date(EndDate, format="%Y-%m-%d"), by='days')

ML = list()

for (date in dates) {
  url = paste0("http://nationalbank.kz/?docid=105&cmomdate=",
               as.Date(date, format="%Y-%m-%d", origin = "1960-10-01"),
               "&switch=english")
  p <- url %>%
    read_html() %>%
    html_nodes(xpath='//table[1]') %>%
    html_table(fill = T)
  dt = p[[11]]
  tdt = as.data.frame(dt)

  ML[[date]] = tdt
}

all = do.call(rbind, ML)
all

The error message is: Error in rbind(deparse.level, ...) : numbers of columns of arguments do not match

But when I run it for just one date, it seems to work fine:
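For reference, the same message can be reproduced offline, without the website. This is a minimal sketch with two made-up data frames (`a` and `b` are illustrative, not data from the page): rbind() on data frames refuses inputs whose column counts differ.

```r
# Two made-up data frames with different numbers of columns
a <- data.frame(x = 1:2, y = 3:4)
b <- data.frame(x = 1:2, y = 3:4, z = 5:6)

# Capture the error message instead of stopping the script
msg <- tryCatch(
  {
    do.call(rbind, list(a, b))
    "no error"
  },
  error = function(e) conditionMessage(e)
)
print(msg)
```

So if even one element of the list has a different shape from the rest, the final do.call(rbind, ...) fails for the whole list.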

url <- "http://nationalbank.kz/?docid=105&cmomdate=20187-07-01&switch=english"

p <- url %>%
  read_html() %>%
  html_nodes(xpath='//table[1]') %>%
  html_table(fill = T)
dt = p[[11]]
tdt = t(dt)
tdt

ML = list()

for (i in 1:3) {
  ML[[i]] = tdt
}

all = do.call(rbind, ML)
all

The output is:

   [,1]               [,2]           [,3]       [,4]               
X1 "Type of security" "NIN"          "Maturity" "Type of placement"
X2 "Notes NBK"        "KZW1KD072398" "7 day"    "Auction"          
X1 "Type of security" "NIN"          "Maturity" "Type of placement"
X2 "Notes NBK"        "KZW1KD072398" "7 day"    "Auction"          
X1 "Type of security" "NIN"          "Maturity" "Type of placement"
X2 "Notes NBK"        "KZW1KD072398" "7 day"    "Auction"          
   [,5]                [,6]              [,7]             
X1 "Date of placement" "Settlement date" "Redemption date"
X2 "09.04.2018"        "09.04.2018"      "16.04.2018"     
X1 "Date of placement" "Settlement date" "Redemption date"
X2 "09.04.2018"        "09.04.2018"      "16.04.2018"     
X1 "Date of placement" "Settlement date" "Redemption date"
X2 "09.04.2018"        "09.04.2018"      "16.04.2018"     
   [,8]                         [,9]                      
X1 "Actual amount of placement" ""                        
X2 "339 999 999 929.33 tenge"   "3 405 587 268 (quantity)"
X1 "Actual amount of placement" ""                        
X2 "339 999 999 929.33 tenge"   "3 405 587 268 (quantity)"
X1 "Actual amount of placement" ""                        
X2 "339 999 999 929.33 tenge"   "3 405 587 268 (quantity)"
   [,10]                      [,11]                     
X1 "Demand"                   ""                        
X2 "366 198 211 200.00 tenge" "3 668 000 000 (quantity)"
X1 "Demand"                   ""                        
X2 "366 198 211 200.00 tenge" "3 668 000 000 (quantity)"
X1 "Demand"                   ""                        
X2 "366 198 211 200.00 tenge" "3 668 000 000 (quantity)"
   [,12]                     [,13]           [,14]           
X1 "Weighted-averaged price" "Cut price"     "Yield (coupon)"
X2 "99.8359 tenge"           "99.8359 tenge" "8.5707 %"      
X1 "Weighted-averaged price" "Cut price"     "Yield (coupon)"
X2 "99.8359 tenge"           "99.8359 tenge" "8.5707 %"      
X1 "Weighted-averaged price" "Cut price"     "Yield (coupon)"
X2 "99.8359 tenge"           "99.8359 tenge" "8.5707 %" 

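What is wrong with my earlier code?

2 answers:

Answer 0 (score: 1)

It looks like the problem is that the site returns inconsistently formatted pages, so p[[11]] does not always return the same kind of table, which in turn hands rbind data frames of mismatched size and triggers the error. The code below highlights this with an inserted print() that shows the date and the varying length of the list assigned to 'p'. The date that throws is '2008-04-04'. The fix below simply checks whether the list length is 14 and, if so, adds the table to ML; the do.call to rbind then concatenates them as expected.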

library(rvest)
StartDate <- "2017-07-01"
EndDate <- "2017-07-10"
dates <- seq(as.Date(StartDate, format="%Y-%m-%d"),
             as.Date(EndDate, format="%Y-%m-%d"), by='days')

ML = list()

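# Note: looping over a Date vector yields plain numbers, and the origin used
# below (kept from the question) is not 1970-01-01, so the requested dates
# fall in 2008 rather than 2017.
for (date in dates) {
  url = paste0("http://nationalbank.kz/?docid=105&cmomdate=",
               as.Date(date, format="%Y-%m-%d", origin = "1960-10-01"),
               "&switch=english")
  p <- url %>%
    read_html() %>%
    html_nodes(xpath='//table[1]') %>%
    html_table(fill = T)

  # Show the requested date and how many tables the page returned
  print(paste(as.Date(date, format="%Y-%m-%d", origin = "1960-10-01"), length(p)))

  # Only keep pages that returned the expected 14 tables
  if (length(p) == 14) {
    dt = p[[11]]
    tdt = as.data.frame(dt)

    ML[[date]] = tdt
  }
}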

all = do.call(rbind, ML)
all
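The same filter-then-bind idea can be tried without hitting the site. The sketch below is only an illustration under stated assumptions: `fake_tables` and `expected_cols` are hypothetical stand-ins for the scraped tables and their expected shape, and the list is keyed by date string rather than by a numeric index.

```r
# Stand-ins for scraped tables; `fake_tables` is hypothetical, not real page data
fake_tables <- list(
  "2017-07-01" = data.frame(X1 = letters[1:3], X2 = 1:3),
  "2017-07-02" = data.frame(X1 = letters[1:3]),          # malformed: one column
  "2017-07-03" = data.frame(X1 = letters[4:6], X2 = 4:6)
)

expected_cols <- 2  # assumed shape of a well-formed table

# Keep only the tables that have the expected number of columns
ok <- Filter(function(df) ncol(df) == expected_cols, fake_tables)

# Report which dates were skipped so the gaps are visible
skipped <- setdiff(names(fake_tables), names(ok))
print(skipped)

# Now every remaining element has the same shape, so rbind succeeds
all_rows <- do.call(rbind, ok)
print(dim(all_rows))
```

Keying the list by date string keeps it short and records which date each table came from, whereas a large numeric index leaves thousands of empty slots in the list.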

Answer 1 (score: 0)

Apparently, using the dates directly in for (date in dates) is not a good idea. So I made the following change:

for (date in seq_along(dates)) {
  di = dates[date]
  url = paste0("http://nationalbank.kz/?docid=105&cmomdate=",
               as.Date(di, format="%Y-%m-%d"),
               "&switch=english")
  ...
}

Also, @Soren suggested checking whether length(p) == 14. That does help. But the length of p is not what matters, because the page may not contain the table at all. I decided to check nrow(dt) == 14 instead of checking length(p). If the table has exactly 14 rows, the data is stored in the list ML.
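The point about not looping over the dates directly can be checked offline. A minimal sketch (the variable names `d`, `shifted` and `urls` are illustrative): a for loop over a Date vector hands back bare numbers, while indexing keeps the Date class, so format() can build the URL text without any origin argument.

```r
dates <- seq(as.Date("2017-07-01"), as.Date("2017-07-03"), by = "days")

# Looping over the vector directly drops the Date class
for (d in dates) {
  print(class(d))  # "numeric": days since 1970-01-01
}

# Converting such a number back with the wrong origin shifts the date:
# this gives a date in 2008, not 2017
shifted <- as.Date(as.numeric(dates[1]), origin = "1960-10-01")
print(shifted)

# Indexing keeps the Date class, so format() yields the text the URL needs
urls <- character(length(dates))
for (i in seq_along(dates)) {
  di <- dates[i]
  urls[i] <- paste0("http://nationalbank.kz/?docid=105&cmomdate=",
                    format(di, "%Y-%m-%d"), "&switch=english")
}
print(urls)
```

This only builds the URLs; fetching and parsing each page would still need the table checks discussed above.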

I would be glad to see a more robust solution.