Question

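I am trying to merge data from multiple subjects (one Excel file each) into a single data set using a loop in R. Every sheet has the same structure (column names, number of rows, etc.), but each file name contains the date and time at which it was created. These differ for every subject, which makes the files hard to loop over.

Here are two example file paths:

Subject 1: "Data/001/001_behavData_12_Sep_1125.csv"

Subject 2: "Data/002/002_behavData_14_Sep_1342.csv"

Here is what my loop looks like right now: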

subjects = c("001","002","003","004","005"...)

for (i in subjects) {
    path = paste0(i, "/", i, "_behavData", ****, ".csv"}

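**** is an 11-character string that is different for every subject. Is there a way to tell R to ignore this part of each file name? Thanks in advance for any help.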

Answer 1

You can use list.files to pick up all files that share the same format, e.g. .csv, and then read and combine them with map_df and read_csv:

library(purrr)   # map_df(), %>%
library(readr)   # read_csv()
list.files(getwd(), pattern = "\\.csv$", full.names = TRUE) %>%
  map_df(read_csv)

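Note: getwd() should be replaced by the directory that holds the files. I use it here because the files are usually in your working directory.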
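For the folder layout in the question (one subfolder per subject), the same idea can be applied per subject with a regular expression that skips the date/time part. This is only a minimal sketch in base R: the helper name `find_subject_files` and the rule that each subject has exactly one matching file are assumptions, not something stated in the question.

```r
# Hypothetical helper: locate each subject's behavData file no matter which
# date/time stamp is embedded in the file name.
find_subject_files <- function(data_dir, subjects) {
  # One regular expression per subject; ".*" stands in for the variable part.
  patterns <- paste0("^", subjects, "_behavData.*\\.csv$")
  files <- vapply(seq_along(subjects), function(k) {
    hits <- list.files(file.path(data_dir, subjects[k]),
                       pattern = patterns[k], full.names = TRUE)
    # Assumption: exactly one file per subject; fail loudly otherwise.
    if (length(hits) != 1) {
      stop("expected exactly one file for subject ", subjects[k])
    }
    hits
  }, character(1))
  names(files) <- subjects
  files
}
```

Calling `find_subject_files("Data", c("001", "002"))` would return a named character vector of full paths that can be passed on to read.csv or read_csv.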

Hope that helps!

Answer 2

The simplest way to list the files is to use list.files() with the options that return full file names and recurse into subdirectories.

theList <- list.files("./data",pattern="csv",full.names=TRUE,recursive=TRUE)
theList

And the output:

 > 
 > theList
 [1] "./data/All Penalties.csv"                      
 [2] "./data/baseballPlayers.csv"                    
 [3] "./data/cameras.csv"                            
 [4] "./data/epa-extreme-precip.csv"      
  ...           
 [17] "./data/ontime/2016-11_T_ONTIME.csv"            
 [18] "./data/ontime/2016-12_T_ONTIMEcsv.csv"         
 [19] "./data/ontime/2017-01_T_ONTIME.csv"  
 ...          
 [43] "./data/week3q1.csv"                            
 >

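To read the files, we pass the list as the argument to lapply(). Note that when the files are CSV we use read.csv() instead of openxlsx::read.xlsx(), which would be used to read Excel spreadsheets.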

theSheets <- lapply(theList,function(x) {
    read.csv(x,header=TRUE) # add further read.csv() options here as needed
})

Then use do.call() with rbind() to combine the results into a single data frame.

theData <- do.call(rbind, theSheets)
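The three steps (list, read, bind) can be tried without any external data. The sketch below writes two throwaway CSV files to a temporary folder; the folder name, file names and columns are made up for illustration. Because rbind() on data frames matches columns by name, it also checks that every file has the same columns before combining.

```r
# Create two small CSV files with identical columns in a temporary folder.
demo_dir <- file.path(tempdir(), "rbind_demo")
dir.create(demo_dir, showWarnings = FALSE)
write.csv(data.frame(id = 1:2, score = c(10, 20)),
          file.path(demo_dir, "a.csv"), row.names = FALSE)
write.csv(data.frame(id = 3:4, score = c(30, 40)),
          file.path(demo_dir, "b.csv"), row.names = FALSE)

# Step 1: list the files; step 2: read each one into a list of data frames.
demo_files  <- list.files(demo_dir, pattern = "\\.csv$", full.names = TRUE)
demo_sheets <- lapply(demo_files, read.csv)

# Verify that all files share the column names of the first one.
same_cols <- vapply(demo_sheets,
                    function(d) identical(names(d), names(demo_sheets[[1]])),
                    logical(1))
stopifnot(all(same_cols))

# Step 3: stack the data frames row-wise.
demo_data <- do.call(rbind, demo_sheets)
```

If the column check fails, the offending file can be identified from `demo_files[!same_cols]` before any rows are combined.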

A complete example, using Pokémon stats data originally collected by Alberto Barradas on kaggle.com, is:

# Example using Pokémon data files retrieved from kaggle.com and
# broken out into 6 csv files, one per generation
# raw data files available at https://github.com/lgreski/pokemonData

thePokemonFiles <- list.files("./pokedata",pattern="gen",
                              full.names=TRUE)
thePokemonFiles
pokemonData <- lapply(thePokemonFiles,function(x) read.csv(x))

combinedData <- do.call(rbind,pokemonData)

# summarize the data 
summary(combinedData)

...and the output.

> thePokemonFiles <- list.files("./pokedata",pattern="gen",
+                               full.names=TRUE)
> thePokemonFiles
[1] "./pokedata/gen01.csv" "./pokedata/gen02.csv" "./pokedata/gen03.csv"
[4] "./pokedata/gen04.csv" "./pokedata/gen05.csv" "./pokedata/gen06.csv"
> pokemonData <- lapply(thePokemonFiles,function(x) read.csv(x))
> 
> combinedData <- do.call(rbind,pokemonData)
>
> # summarize the data 
> summary(combinedData)
     Number                             Name         Type1          Type2    
 Min.   :  1.0   Abra                     :  1   Water  :112           :385  
 1st Qu.:185.5   Aerodactyl               :  1   Normal : 98   Flying  : 97  
 Median :365.0   AerodactylMega Aerodactyl:  1   Grass  : 70   Ground  : 35  
 Mean   :363.1   Alakazam                 :  1   Bug    : 69   Poison  : 34  
 3rd Qu.:539.5   AlakazamMega Alakazam    :  1   Psychic: 56   Psychic : 33  
 Max.   :721.0   Arbok                    :  1   Fire   : 52   Fighting: 26  
                 (Other)                  :793   (Other):342   (Other) :189  
...

Update: 18Dec2017 - to associate an ID with each data file, the file name can be parsed as follows.

pokemonData <- lapply(thePokemonFiles,function(x) {
     data <- read.csv(x)
     tokens <- unlist(strsplit(x,"/"))
     data$source <- substr(tokens[3],4,5)
     data
     })

combinedData <- do.call(rbind,pokemonData)
head(combinedData)

...and the output.

> pokemonData <- lapply(thePokemonFiles,function(x) {
+      data <- read.csv(x)
+      tokens <- unlist(strsplit(x,"/"))
+      data$source <- substr(tokens[3],4,5)
+      data
+      })
> 
> combinedData <- do.call(rbind,pokemonData)
> head(combinedData)
  Number                  Name Type1  Type2 Total HP Attack Defense SpecialAtk SpecialDef Speed Generation Legendary source
1      1             Bulbasaur Grass Poison   318 45     49      49         65         65    45          1     False     01
2      2               Ivysaur Grass Poison   405 60     62      63         80         80    60          1     False     01
3      3              Venusaur Grass Poison   525 80     82      83        100        100    80          1     False     01
4      3 VenusaurMega Venusaur Grass Poison   625 80    100     123        122        120    80          1     False     01
5      4            Charmander  Fire          309 39     52      43         60         50    65          1     False     01
6      5            Charmeleon  Fire          405 58     64      58         80         65    80          1     False     01
>
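The same parsing idea carries over to the file names in the question, where the subject ID is the part of the file name before the first underscore. A minimal sketch, assuming that naming rule holds for every file; the helper name `subject_from_path` is hypothetical.

```r
# File paths taken from the question.
paths <- c("Data/001/001_behavData_12_Sep_1125.csv",
           "Data/002/002_behavData_14_Sep_1342.csv")

# Hypothetical helper: basename() drops the directories, and sub() removes
# everything from the first "_" onward, leaving only the subject ID.
subject_from_path <- function(path) {
  sub("_.*$", "", basename(path))
}

subject_from_path(paths)
# [1] "001" "002"
```

Inside the lapply() call, `data$source <- subject_from_path(x)` could then replace the strsplit()/substr() pair, with no dependence on the directory depth or on fixed character positions.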

Merging Excel files with heterogeneous file names in R
