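I am having trouble reading data from a *.xls file into R. I am trying to use readxl::read_xls() to read the Microsoft Excel file at https://www.misoenergy.org/Library/Repository/Market%20Reports/20171114_5min_exante_lmp.xls (I am on R version 3.4.1, "Single Candle"; the output of sessionInfo() is pasted at the very bottom of this post).
The file has 6 sheets that contain data. As a minimal example, consider reading the second sheet, named RT Ex-Ante 5 Minute LMPs(1). The code below is my first attempt at reading this sheet: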
library(readxl)
fpath <- '/Users/bmosovsky/Downloads/20171114_5min_exante_lmp.xls'
data <- read_excel( path=fpath, sheet=2, col_names=FALSE )
This lets read_excel() guess both the range of data to read and the column types. I get the warning message,
Warning message:
In read_fun(path = path, sheet = sheet, limits = limits, shim = shim, :
Expecting logical in B65535 / R65535C2: got 'IPL.CC.IPLEV01'
and str(data) returns
Classes ‘tbl_df’, ‘tbl’ and 'data.frame': 65535 obs. of 6 variables:
$ X__1: POSIXct, format: "2017-11-13 04:35:00" "2017-11-13 04:35:00" "2017-11-13 04:35:00" "2017-11-13 04:35:00" ...
$ X__2: logi NA NA NA NA NA NA ...
$ X__3: logi NA NA NA NA NA NA ...
$ X__4: logi NA NA NA NA NA NA ...
$ X__5: logi NA NA NA NA NA NA ...
$ X__6: logi NA NA NA NA NA NA ...
Thinking that read_excel() might simply be guessing the column types incorrectly, I then tried:
data1 <- read_excel( path=fpath, sheet=2, col_names=FALSE,
col_types=c('text', 'text', 'numeric', 'numeric', 'numeric', 'numeric') )
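This eliminates the warning, since the column types are now specified correctly, but I still get NA values in every column except the first. This time str(data1) returns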
Classes ‘tbl_df’, ‘tbl’ and 'data.frame': 65535 obs. of 6 variables:
$ X__1: chr "43052.2" "43052.2" "43052.2" "43052.2" ...
$ X__2: chr NA NA NA NA ...
$ X__3: num NA NA NA NA NA NA NA NA NA NA ...
$ X__4: num NA NA NA NA NA NA NA NA NA NA ...
$ X__5: num NA NA NA NA NA NA NA NA NA NA ...
$ X__6: num NA NA NA NA NA NA NA NA NA NA ...
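Finally, I tried copying the first 10 rows of data (formatting and all) from the second sheet of the Excel file into a new Excel workbook, saving it as test.xls, and then running the following: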
fpath_test <- '/Users/bmosovsky/Downloads/test.xls'
data_test <- read_excel( path=fpath_test, sheet=1, col_names=FALSE,
col_types=c('text', 'text', 'numeric', 'numeric', 'numeric', 'numeric') )
Now str(data_test) returns the correct result:
Classes ‘tbl_df’, ‘tbl’ and 'data.frame': 10 obs. of 6 variables:
$ X__1: chr "43052.2" "43052.2" "43052.2" "43052.2" ...
$ X__2: chr "CIN.MARKLND.3" "CIN.MIAMWAB.1" "CIN.MIAMWAB.2" "CIN.MIAMWAB.3" ...
$ X__3: num 22.4 22.6 22.6 22.6 22.5 ...
$ X__4: num 21.6 21.6 21.6 21.6 21.6 21.6 21.6 21.6 21.6 21.6
$ X__5: num 0.8 1.02 1.02 1.02 0.92 0.93 1.29 1.29 1.29 0.06
$ X__6: num 0.04 0.01 0.01 0.01 0.01 0.01 0.05 0.05 0.05 0.06
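So my question is: what is unique about the downloaded Excel file that prevents the data from being read into R correctly? I am trying to read this data as part of an automated data collection process, so manually manipulating the Excel file in any way is not an option as a workaround. Can anyone offer some insight into how I can get the data from all sheets of this .xls file into R for processing?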
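To make the goal concrete, here is a minimal sketch of what the automated step would roughly look like once reading works. The helper names read_all_sheets() and all_na_columns() are made up for illustration (they are not part of readxl), and this does not fix the NA problem described above; it only loops over the sheets and flags columns that came back entirely NA.

```r
# Hypothetical helper (not part of readxl): names of the columns in a
# data frame whose values are all NA.
all_na_columns <- function(df) {
  is_all_na <- vapply(df, function(col) all(is.na(col)), logical(1))
  names(df)[is_all_na]
}

# Hypothetical helper (not part of readxl): read every sheet of a workbook
# into a named list of tibbles. Extra arguments (e.g. col_types) are passed
# through to readxl::read_excel().
read_all_sheets <- function(path, ...) {
  sheets <- readxl::excel_sheets(path)
  out <- lapply(sheets, function(s) {
    readxl::read_excel(path, sheet = s, col_names = FALSE, ...)
  })
  names(out) <- sheets
  out
}

# Usage, assuming fpath points at the downloaded file from above:
# sheets <- read_all_sheets(fpath)
# lapply(sheets, all_na_columns)   # columns that came back entirely NA
```

With the file in its current state, I would expect the second call to list every column except the first for the second sheet, matching the str() output above.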
Here is the output of sessionInfo():
R version 3.4.1 (2017-06-30)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS Sierra 10.12.6
Matrix products: default
BLAS: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRlapack.dylib
locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
attached base packages:
[1] tools stats graphics grDevices utils datasets methods base
other attached packages:
[1] bindrcpp_0.2 rvest_0.3.2 xml2_1.1.1 RPostgreSQL_0.6-2 DBI_0.7-12 lubridate_1.6.0 dplyr_0.7.2 readxl_1.0.0
loaded via a namespace (and not attached):
[1] Rcpp_0.12.12.3 tidyr_0.6.3 assertthat_0.2.0 cellranger_1.1.0 R6_2.2.2 magrittr_1.5 httr_1.2.1 rlang_0.1.1 stringi_1.1.5
[10] curl_2.8.1 stringr_1.2.0 glue_1.1.1 compiler_3.4.1 pkgconfig_2.0.1 bindr_0.1 tibble_1.3.3