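How to efficiently read and rbind all .xlsx files in a folder using read_excel

Date: 2019-11-07 15:05:50

Tags: r xlsx rbind

I am new to R and need to create one data frame from 80 .xlsx files, which mostly share the same columns and all sit in the same folder. I want to bind all of these files efficiently, in a way that still works when files are added to or removed from the folder later. I want to do this without converting the files to .csv, unless someone can show me how to do that efficiently for a large number of files from within R itself.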

Until now I have been using the read_excel function from the readxl package to read the files in one at a time. Afterwards, I would use rbind to bind them. That was fine for 10 files, but not for 80! I have tried many of the solutions offered online, but none of them seem to work, largely because they use functions other than rbind or formats other than .xlsx. I did not keep track of my many failed attempts, so I cannot provide code, apart from another approach in which I tried to adapt a read_csv-based method to read_excel.

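Any code showing how to do this would be much appreciated. Apologies if there is anything wrong with this post; it is my first.

Update: With the changes suggested in the answers, I am now using this code: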

#Method 1
library(readxl)
library(purrr)
library(dplyr)
library(tidyverse)
file.list <- list.files(pattern='*.xlsx')
alldata <- file.list %>%
map(read_excel) %>%
reduce(rbind)

#Output
New names:
* `` -> ...2
Error in rbind(deparse.level, ...) : 
numbers of columns of arguments do not match

I then changed the code to:

file.list <- list.files(pattern='*.xlsx')
alldata <- file.list %>%
  map_dfr(read_excel) %>%
  reduce(bind_rows)

The output is now as follows:

New names:
* `` -> ...2
Error: Column `10.Alert.alone` can't be converted from numeric to character

This happens no matter which bind() function I use in the reduce() slot. If anyone can help, please let me know!

3 Answers:

Answer 0: (score: 0)

This should get you there, or at least close...

library(data.table)
library(readxl)
#create files list
file.list <- list.files( pattern = ".*\\.xlsx$", full.names = TRUE )
#read files to list of data.frames
l <- lapply( file.list, readxl::read_excel )
#bind l together to one larger data.table, by columnname, fill missing with NA 
dt <- data.table::rbindlist( l, use.names = TRUE, fill = TRUE )
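To see what use.names = TRUE and fill = TRUE do when the files only mostly share columns, here is a minimal sketch on two small in-memory data frames. The data frames `a` and `b` and their column names are made up for illustration and stand in for two sheets that were read from disk.

```r
library(data.table)

# Two hypothetical tables that share `id` but each have one extra column
a <- data.frame(id = 1:2, x = c("a", "b"))
b <- data.frame(id = 3L, y = 9.5)

# Columns are matched by name; columns missing from a table are filled with NA
dt <- rbindlist(list(a, b), use.names = TRUE, fill = TRUE)
print(dt)
```

The result has the columns id, x and y, with NA wherever a table did not contain that column. Note that fill = TRUE only handles missing columns; it does not reconcile a column that is numeric in one file and character in another.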

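Answer 1: (score: 0)

You are on the right track here, but you need to use map_dfr rather than plain map. map_dfr outputs a data frame (or actually a tibble) for each iteration and combines them with bind_rows.

This should work: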

library(readxl)
library(tidyverse)
# list.files() takes a regular expression, not a glob, so match names ending in .xlsx
file.list <- list.files(pattern = "\\.xlsx$")
alldata <- file.list %>%
  map_dfr(~read_excel(.x))

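Note that this assumes your files all have consistent column names and data types. If they do not, you may have to do some cleaning. (One trick I use in complicated cases is to add %>% mutate_all(as.character) to the read_excel command inside the map function. That converts everything to character, and you can then convert the data types from there.)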
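The convert-everything-to-character trick described above could look roughly like the sketch below. The helper name `read_xlsx_folder` and its `dir` argument are hypothetical, introduced only for illustration; the sketch also assumes the first sheet of every file is the one you want.

```r
library(readxl)
library(dplyr)
library(purrr)

# Hypothetical helper: read every .xlsx file in `dir`, coerce all columns to
# character so type mismatches between files cannot break the bind, and stack
# the results row-wise (columns missing from a file are filled with NA).
read_xlsx_folder <- function(dir = ".") {
  files <- list.files(dir, pattern = "\\.xlsx$", full.names = TRUE)
  map_dfr(files, ~ read_excel(.x) %>% mutate_all(as.character))
}

# Usage (the folder path is a placeholder):
# alldata <- read_xlsx_folder("path/to/folder")
```

Because the file list is rebuilt on every call, adding or removing files in the folder needs no code change. Once everything is bound, the columns can be converted back to appropriate types, for example with utils::type.convert(alldata, as.is = TRUE). A possible alternative, if you prefer to avoid the extra mutate step, is read_excel(.x, col_types = "text"), which reads every column as text in the first place.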

Answer 2: (score: 0)

Try using map_dfr.

alldata <- file.list %>%
map_dfr(read_excel)