R: Automatically joining 2 files by name string on import

Time: 2018-01-25 14:08:43

Tags: r join

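I am trying to import several files from 2 different directories, each containing ~40 .txt files. While importing, I want to join 1 file from each directory by a common name pattern. More specifically, I would like the name matching to happen automatically, using a loop or any other approach. Finally, I want to write the joined files to .txt. Is there a way to do this?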

What I have so far (mostly by pasting different things together):

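# List the .txt files in each directory (pattern is a regular expression)
ls <- list.files("C:/directory1....", pattern = "\\.txt$")
ls1 <- list.files("C:/directory2....", pattern = "\\.txt$")

get <- list()

# Read every file from directory 1, keyed by file name without the extension
for (i in seq_along(ls)) {
  import <- read.table(file = file.path("C:/directory1....", ls[i]))
  get[[gsub("\\.txt$", "", ls[i])]] <- import
}

get2 <- list()

# Read every file from directory 2 (indexing ls1, not ls)
for (i in seq_along(ls1)) {
  import <- read.table(file = file.path("C:/directory2....", ls1[i]))
  get2[[gsub("\\.txt$", "", ls1[i])]] <- import
}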

...and that is basically it; I have no idea how to continue. Any help is greatly appreciated.

1 Answer:

Answer 0: (score: 0)

Here is an example that may work for you. Since the question contains no data, I demonstrate the approach using some of my local files:

path1 <- "./Bolt Exercise/"
path2 <- "./gmad/R/"

# dir() takes a regular expression, so anchor on the file extension
all_files <- c(paste(path1, dir(path1, "\\.xlsx$"), sep = ""),
               paste(path2, dir(path2, "\\.R$"), sep = ""))

common_pattern <- "SLDASM"

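# Read one spreadsheet, skipping the first 10 rows and assigning column names
read_func <- function(z) {
  cols <- c("item", "qty", "per_unit", "description", "dwg_no", "drw", "dimension",
            "length", "material", "weight", "surface", "revision")
  readxl::read_xlsx(z, skip = 10, col_names = cols)
}

# Read every file whose path contains the common pattern
lapply(grep(common_pattern, all_files, value = TRUE), read_func)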

Output:

> all_files
 [1] "./Bolt Exercise/001600211.SLDASM.F Grupo done.xlsx"  
 [2] "./Bolt Exercise/001600234.SLDASM.F fadhili done.xlsx"
 [3] "./Bolt Exercise/001600240.SLDASM.D Gulf done.xlsx"   
 [4] "./gmad/R/calcCorrProperties.R"                       
 [5] "./gmad/R/getWeatherData.R"                           
 [6] "./gmad/R/getWeatherForecast.R"                       
 [7] "./gmad/R/gmadPlot.R"                                 
 [8] "./gmad/R/listMerge.R"                                
 [9] "./gmad/R/listTimeSegments.R"                         


> grep(common_pattern, all_files, val = T)
[1] "./Bolt Exercise/001600211.SLDASM.F Grupo done.xlsx"   "./Bolt Exercise/001600234.SLDASM.F fadhili done.xlsx"
[3] "./Bolt Exercise/001600240.SLDASM.D Gulf done.xlsx" 

> lapply(grep(common_pattern, all_files, val = T),read_func)
[[1]]
# A tibble: 6,157 x 12
          item   qty per_unit                        description       dwg_no   drw
         <chr> <chr>    <chr>                              <chr>        <chr> <chr>
 1        Item   Qty per unit                        Description Art./Dwg.No.   Drw
 2         001     1        1 Pulse Air Intake System SGT6-8000H    100600415     X
 3     001.001     8        8                 Weather hood upper    121000398     X
 4 001.001.001     1        8                         Bent plate    950067012     X
 5 001.001.002     1        8                         Bent plate    950067015     X
 6 001.001.003     1        8                         Bent plate    950067016     X
 7 001.001.100     8       64                               Bolt   1701339-04  <NA>
 8 001.001.101     8       64                                Nut      1701340  <NA>
 9 001.001.102    16      128                             Washer      1700581  <NA>
10 001.001.103     1        8                   Sealing Compound      1001531  <NA>
# ... with 6,147 more rows, and 6 more variables: dimension <chr>, length <chr>, material <chr>, weight <chr>,
#   surface <chr>, revision <chr>

[[2]]
# A tibble: 5,100 x 12
                      item   qty per_unit                  description       dwg_no   drw  dimension length
                     <chr> <chr>    <chr>                        <chr>        <chr> <chr>      <chr>  <chr>
 1                    Item   Qty per unit                  Description Art./Dwg.No.   Drw  Dimension Length
 2                     001     1        1 Air Intake System SGT6-5000F    100600470     X       <NA>   <NA>
 3                 001.001     1        1            Support structure    134600414     X       <NA>   <NA>
 4             001.001.001     1        1        Support structure Top    930603648     X       <NA>   <NA>
 5         001.001.001.001     1        1                         Beam    930603660     X       <NA>   <NA>
 6     001.001.001.001.001     1        1                         Beam    930603831     X       <NA>   <NA>
 7 001.001.001.001.001.001     1        1                         Beam    970023245  <NA> HM500x300A  10304
 8 001.001.001.001.001.002     5        5                        Plate    970023246     X         16   <NA>
 9 001.001.001.001.001.003     1        1                        Plate    970023437     X         16   <NA>
10 001.001.001.001.001.004     1        1                        Plate    970023286     X         16   <NA>
# ... with 5,090 more rows, and 4 more variables: material <chr>, weight <chr>, surface <chr>, revision <chr>

[[3]]
# A tibble: 2,357 x 12
          item   qty per_unit       description       dwg_no   drw       dimension length        material
         <chr> <chr>    <chr>             <chr>        <chr> <chr>           <chr>  <chr>           <chr>
 1        Item   Qty per unit       Description Art./Dwg.No.   Drw       Dimension Length        Material
 2         001     1        1 Air Intake System    100600492     X            <NA>   <NA>            <NA>
 3     001.001     2        2      Weather hood    121600341     X            <NA>   <NA>            <NA>
 4 001.001.001     3        6        Bent plate    970025024     X               3   <NA> ASTM A572 Gr 50
 5 001.001.002     3        6        Bent plate    970025025     X               3   <NA> ASTM A572 Gr 50
 6 001.001.003     1        2        Bent plate    970025022     X               3   <NA> ASTM A572 Gr 50
 7 001.001.004     3        6        Bent plate    970025023     X               3   <NA> ASTM A572 Gr 50
 8 001.001.005     1        2        Bent plate    970025028     X               3   <NA> ASTM A572 Gr 50
 9 001.001.006     1        2        Bent plate    970025029     X               3   <NA> ASTM A572 Gr 50
10 001.001.100    24       48              Bolt      1001477  <NA> M10x20, ISO4017      0        8.8, HDG
# ... with 2,347 more rows, and 3 more variables: weight <chr>, surface <chr>, revision <chr>

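If you want to combine the output for a given common_pattern into a single data.table, you can wrap the lapply call in data.table::rbindlist(lapply(.....), use.names = T, fill = T), assuming all files have the same structure, as in the example above: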

> head(data.table::rbindlist(lapply(grep(common_pattern, all_files, value = T), read_func), use.names = T, fill = T))
          item qty per_unit                        description       dwg_no drw dimension length       material
1:        Item Qty per unit                        Description Art./Dwg.No. Drw Dimension Length       Material
2:         001   1        1 Pulse Air Intake System SGT6-8000H    100600415   X        NA     NA             NA
3:     001.001   8        8                 Weather hood upper    121000398   X        NA     NA Galvanized G90
4: 001.001.001   1        8                         Bent plate    950067012   X        NA     NA Galvanized G90
5: 001.001.002   1        8                         Bent plate    950067015   X        NA     NA Galvanized G90
6: 001.001.003   1        8                         Bent plate    950067016   X        NA     NA Galvanized G90
     weight             surface revision
1:   Weight             Surface Revision
2: 244436.3             14125.6        C
3:       51                 4.8        A
4:       41  3.8879999999999999        A
5:      4.7 0.45200000000000001        B
6:      4.7 0.45200000000000001        A
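
As a small illustration of what use.names and fill do in the rbindlist call above, here is a minimal sketch on two made-up data frames (df1 and df2 are hypothetical and not taken from the files above). It only runs if the data.table package is installed:

```r
# Hypothetical toy data: df2 has its columns in a different order
# and carries an extra "weight" column that df1 lacks.
df1 <- data.frame(item = c("001", "002"), qty = c(1, 8),
                  stringsAsFactors = FALSE)
df2 <- data.frame(qty = 3, item = "003", weight = 4.7,
                  stringsAsFactors = FALSE)

if (requireNamespace("data.table", quietly = TRUE)) {
  # use.names = TRUE matches columns by name rather than by position;
  # fill = TRUE pads columns missing from a table with NA.
  combined <- data.table::rbindlist(list(df1, df2),
                                    use.names = TRUE, fill = TRUE)
  print(combined)
}
```

With fill = TRUE, files that lack a column still bind, so it is worth checking the result for unexpected NA columns when the inputs are supposed to share one structure.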

For multiple common_patterns, you can use nested lapply/sapply statements.
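
For the two-directory case in the question, the idea of handling several patterns could be sketched as follows. This is only one possible reading, under some loud assumptions: each pair of files shares exactly the same base name, the files are tab-separated with a header row, and they share a key column called id. The directories, file names, and columns below are all hypothetical and created in a temporary folder so the sketch is self-contained. A single lapply over the shared names is enough here because each name identifies exactly one file per directory.

```r
# Hypothetical setup: two temporary directories with matching file names.
dir1 <- file.path(tempdir(), "directory1")
dir2 <- file.path(tempdir(), "directory2")
out_dir <- file.path(tempdir(), "joined")
for (d in c(dir1, dir2, out_dir)) dir.create(d, showWarnings = FALSE)

for (nm in c("sampleA", "sampleB")) {
  write.table(data.frame(id = 1:3, x = c(10, 20, 30)),
              file.path(dir1, paste0(nm, ".txt")),
              sep = "\t", row.names = FALSE, quote = FALSE)
  write.table(data.frame(id = 1:3, y = c("a", "b", "c")),
              file.path(dir2, paste0(nm, ".txt")),
              sep = "\t", row.names = FALSE, quote = FALSE)
}

# Base names present in both directories act as the common patterns.
base1 <- sub("\\.txt$", "", list.files(dir1, pattern = "\\.txt$"))
base2 <- sub("\\.txt$", "", list.files(dir2, pattern = "\\.txt$"))
common <- intersect(base1, base2)

# Helper (hypothetical): read one tab-separated file with a header row.
read_txt <- function(path) {
  read.table(path, header = TRUE, sep = "\t", stringsAsFactors = FALSE)
}

# For each shared name: read both files, join on "id", write the result.
joined <- lapply(setNames(common, common), function(nm) {
  a <- read_txt(file.path(dir1, paste0(nm, ".txt")))
  b <- read_txt(file.path(dir2, paste0(nm, ".txt")))
  res <- merge(a, b, by = "id")
  write.table(res, file.path(out_dir, paste0(nm, "_joined.txt")),
              sep = "\t", row.names = FALSE, quote = FALSE)
  res
})
```

If the file names only share a fragment rather than the full base name, the intersect step would need to be replaced by extracting that fragment (for example with sub or regmatches) before matching.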