Simple solution needed for subsetting spectra files returned by list.files

Time: 2017-10-24 23:41:39

Tags: r list

I have a folder full of spectra files. The number of files varies with the number of measurements and repetitions. So far, this works:

files <- list.files(pattern = "^Q\\d+")
print(files)

print(files) gives:

[1] "Q010101N.001" "Q010101N.002" "Q010101N.003" "Q010101N.004" "Q010101N.005" "Q010101N.006"
  [7] "Q010101N.007" "Q010101N.008" "Q010101N.009" "Q010101N.010" "Q010101N.011" "Q010101N.012"
 [13] "Q010101N.013" "Q010101N.014" "Q010101N.015" "Q010101N.016" "Q010101N.017" "Q010101N.018"
 [19] "Q010101N.019" "Q010101N.020" "Q010101N.021" "Q010101N.022" "Q010101N.023" "Q010101N.024"
 [25] "Q010101N.025" "Q021101N.001" "Q021101N.002" "Q021101N.003" "Q021101N.004" "Q021101N.005"
 [31] "Q021101N.006" "Q021101N.007" "Q021101N.008" "Q021101N.009" "Q021101N.010" "Q021101N.011"
 [37] "Q021101N.012" "Q021101N.013" "Q021101N.014" "Q021101N.015" "Q021101N.016" "Q021101N.017"
 [43] "Q021101N.018" "Q021101N.019" "Q021101N.020" "Q021101N.021" "Q021101N.022" "Q021101N.023"
 [49] "Q021101N.024" "Q021101N.025" "Q031201N.001" "Q031201N.002" "Q031201N.003" "Q031201N.004"
 [55] "Q031201N.005" "Q031201N.006" "Q031201N.007" "Q031201N.008" "Q031201N.009" "Q031201N.010"
 [61] "Q031201N.011" "Q031201N.012" "Q031201N.013" "Q031201N.014" "Q031201N.015" "Q031201N.016"
 [67] "Q031201N.017" "Q031201N.018" "Q031201N.019" "Q031201N.020" "Q031201N.021" "Q031201N.022"
 [73] "Q031201N.023" "Q031201N.024" "Q031201N.025" "Q041301N.001" "Q041301N.002" "Q041301N.003"
 [79] "Q041301N.004" "Q041301N.005" "Q041301N.006" "Q041301N.007" "Q041301N.008" "Q041301N.009"
 [85] "Q041301N.010" "Q041301N.011" "Q041301N.012" "Q041301N.013" "Q041301N.014" "Q041301N.015"
 [91] "Q041301N.016" "Q041301N.017" "Q041301N.018" "Q041301N.019" "Q041301N.020" "Q041301N.021"
 [97] "Q041301N.022" "Q041301N.023" "Q041301N.024" "Q041301N.025" "Q051401N.001" "Q051401N.002"
[103] "Q051401N.003" "Q051401N.004" "Q051401N.005" "Q051401N.006" "Q051401N.007" "Q051401N.008"
[109] "Q051401N.009" "Q051401N.010" "Q051401N.011" "Q051401N.012" "Q051401N.013" "Q051401N.014"
[115] "Q051401N.015" "Q051401N.016" "Q051401N.017" "Q051401N.018" "Q051401N.019" "Q051401N.020"
[121] "Q051401N.021" "Q051401N.022" "Q051401N.023" "Q051401N.024" "Q051401N.025" "Q061501N.001"
[127] "Q061501N.002" "Q061501N.003" "Q061501N.004" "Q061501N.005" "Q061501N.006" "Q061501N.007"
[133] "Q061501N.008" "Q061501N.009" "Q061501N.010" "Q061501N.011" "Q061501N.012" "Q061501N.013"
[139] "Q061501N.014" "Q061501N.015" "Q061501N.016" "Q061501N.017" "Q061501N.018" "Q061501N.019"
[145] "Q061501N.020" "Q061501N.021" "Q061501N.022" "Q061501N.023" "Q061501N.024" "Q061501N.025"
[151] "Q071601N.001" "Q071601N.002" "Q071601N.003" "Q071601N.004" "Q071601N.005" "Q071601N.006"
[157] "Q071601N.007" "Q071601N.008" "Q071601N.009" "Q071601N.010" "Q071601N.011" "Q071601N.012"
[163] "Q071601N.013" "Q071601N.014" "Q071601N.015" "Q071601N.016" "Q071601N.017" "Q071601N.018"
[169] "Q071601N.019" "Q071601N.020" "Q071601N.021" "Q071601N.022" "Q071601N.023" "Q071601N.024"
[175] "Q071601N.025" "Q081701N.001" "Q081701N.002" "Q081701N.003" "Q081701N.004" "Q081701N.005"
[181] "Q081701N.006" "Q081701N.007" "Q081701N.008" "Q081701N.009" "Q081701N.010" "Q081701N.011"
[187] "Q081701N.012" "Q081701N.013" "Q081701N.014" "Q081701N.015" "Q081701N.016" "Q081701N.017"
[193] "Q081701N.018" "Q081701N.019" "Q081701N.020" "Q081701N.021" "Q081701N.022" "Q081701N.023"
[199] "Q081701N.024" "Q081701N.025" "Q091801N.001" "Q091801N.002" "Q091801N.003" "Q091801N.004"
[205] "Q091801N.005" "Q091801N.006" "Q091801N.007" "Q091801N.008" "Q091801N.009" "Q091801N.010"
[211] "Q091801N.011" "Q091801N.012" "Q091801N.013" "Q091801N.014" "Q091801N.015" "Q091801N.016"
[217] "Q091801N.017" "Q091801N.018" "Q091801N.019" "Q091801N.020" "Q091801N.021" "Q091801N.022"
[223] "Q091801N.023" "Q091801N.024" "Q091801N.025" "Q101901N.001" "Q101901N.002" "Q101901N.003"
[229] "Q101901N.004" "Q101901N.005" "Q101901N.006" "Q101901N.007" "Q101901N.008" "Q101901N.009"
[235] "Q101901N.010" "Q101901N.011" "Q101901N.012" "Q101901N.013" "Q101901N.014" "Q101901N.015"
[241] "Q101901N.016" "Q101901N.017" "Q101901N.018" "Q101901N.019" "Q101901N.020" "Q101901N.021"
[247] "Q101901N.022" "Q101901N.023" "Q101901N.024" "Q101901N.025" "Q112001N.001" "Q112001N.002"
[253] "Q112001N.003" "Q112001N.004" "Q112001N.005" "Q112001N.006" "Q112001N.007" "Q112001N.008"
[259] "Q112001N.009" "Q112001N.010" "Q112001N.011" "Q112001N.012" "Q112001N.013" "Q112001N.014"
[265] "Q112001N.015" "Q112001N.016" "Q112001N.017" "Q112001N.018" "Q112001N.019" "Q112001N.020"
[271] "Q112001N.021" "Q112001N.022" "Q112001N.023" "Q112001N.024" "Q112001N.025" "Q124101N.001"
[277] "Q124101N.002" "Q124101N.003" "Q124101N.004" "Q124101N.005" "Q124101N.006" "Q124101N.007"
[283] "Q124101N.008" "Q124101N.009" "Q124101N.010" "Q124101N.011" "Q124101N.012" "Q124101N.013"
[289] "Q124101N.014" "Q124101N.015" "Q124101N.016" "Q124101N.017" "Q124101N.018" "Q124101N.019"
[295] "Q124101N.020" "Q124101N.021" "Q124101N.022" "Q124101N.023" "Q124101N.024" "Q124101N.025"
[301] "Q134201N.001" "Q134201N.002" "Q134201N.003" "Q134201N.004" "Q134201N.005" "Q134201N.006"
[307] "Q134201N.007" "Q134201N.008" "Q134201N.009" "Q134201N.010" "Q134201N.011" "Q134201N.012"
[313] "Q134201N.013" "Q134201N.014" "Q134201N.015" "Q134201N.016" "Q134201N.017" "Q134201N.018"
[319] "Q134201N.019" "Q134201N.020" "Q134201N.021" "Q134201N.022" "Q134201N.023" "Q134201N.024"
[325] "Q134201N.025" "Q144301N.001" "Q144301N.002" "Q144301N.003" "Q144301N.004" "Q144301N.005"
[331] "Q144301N.006" "Q144301N.007" "Q144301N.008" "Q144301N.009" "Q144301N.010" "Q144301N.011"
[337] "Q144301N.012" "Q144301N.013" "Q144301N.014" "Q144301N.015" "Q144301N.016" "Q144301N.017"
[343] "Q144301N.018" "Q144301N.019" "Q144301N.020" "Q144301N.021" "Q144301N.022" "Q144301N.023"
[349] "Q144301N.024" "Q144301N.025" "Q154401N.001" "Q154401N.002" "Q154401N.003" "Q154401N.004"
[355] "Q154401N.005" "Q154401N.006" "Q154401N.007" "Q154401N.008" "Q154401N.009" "Q154401N.010"
[361] "Q154401N.011" "Q154401N.012" "Q154401N.013" "Q154401N.014" "Q154401N.015" "Q154401N.016"
[367] "Q154401N.017" "Q154401N.018" "Q154401N.019" "Q154401N.020" "Q154401N.021" "Q154401N.022"
[373] "Q154401N.023" "Q154401N.024" "Q154401N.025" "Q164501N.001" "Q164501N.002" "Q164501N.003"
[379] "Q164501N.004" "Q164501N.005" "Q164501N.006" "Q164501N.007" "Q164501N.008" "Q164501N.009"
[385] "Q164501N.010" "Q164501N.011" "Q164501N.012" "Q164501N.013" "Q164501N.014" "Q164501N.015"
[391] "Q164501N.016" "Q164501N.017" "Q164501N.018" "Q164501N.019" "Q164501N.020" "Q164501N.021"
[397] "Q164501N.022" "Q164501N.023" "Q164501N.024" "Q164501N.025" "Q174601N.001" "Q174601N.002"
[403] "Q174601N.003" "Q174601N.004" "Q174601N.005" "Q174601N.006" "Q174601N.007" "Q174601N.008"
[409] "Q174601N.009" "Q174601N.010" "Q174601N.011" "Q174601N.012" "Q174601N.013" "Q174601N.014"
[415] "Q174601N.015" "Q174601N.016" "Q174601N.017" "Q174601N.018" "Q174601N.019" "Q174601N.020"
[421] "Q174601N.021" "Q174601N.022" "Q174601N.023" "Q174601N.024" "Q174601N.025"

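So in this case I get 425 spectra files, with 25 repetitions per sample. However, the total number of files may be different another time, and one sample might have 10 repetitions while the others have, for example, 14. So I would like to group the files by sample (one subset of repetitions per sample). In this case, I would get 17 subsets. I also need to import the files, which I have already done successfully for all spectra files at once: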

list.data <- list()

# import all spectra files
for (i in seq_along(files)) {
  list.data[[i]] <- read.csv(files[i])
}

Now that I want subsets, I assume the import would have to look somewhat different?

1 Answer:

Answer 0: (score: 0)

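You can do this with a helper function and iteration. I used dplyr, purrr, and stringi. This puts all of your files into a single data frame. After that, you can manipulate it however you like.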

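library(dplyr)
library(purrr)
library(stringi)

# Read one spectrum file and tag its rows with the sample ID and the
# repetition number taken from the file name.
read_spectra <- function(file) {
  file_name <- basename(file)

  read.csv(file) %>%
    mutate(
      # Leading letter plus digits, e.g. "Q010101" from "Q010101N.001"
      sample = stri_extract_first_regex(file_name, "^[A-Z][0-9]+"),
      # Digits after the dot, e.g. "001"
      repetition = stri_extract_first_regex(file_name, "(?<=\\.)\\d+")
    ) %>%
    select(sample, repetition, everything())
}

# Apply read_spectra() to every file and row-bind the results.
full_data <- map_df(files, read_spectra)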
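To see what the two extractions pull out of a file name, here is a small check on one name from the question's listing. It is only a sketch: it uses base R (regmatches and regexpr) instead of stringi so that it runs without extra packages, and the variable names sample_id and repetition are illustrative choices.

```r
# File name taken from the listing in the question.
file_name <- "Q010101N.001"

# Leading letter plus the digits that follow it.
sample_id <- regmatches(file_name, regexpr("^[A-Z][0-9]+", file_name))

# Digits after the dot; the lookbehind needs perl = TRUE in base R.
repetition <- regmatches(
  file_name,
  regexpr("(?<=\\.)\\d+", file_name, perl = TRUE)
)

print(c(sample = sample_id, repetition = repetition))
```

Both values come back as character strings, so a repetition such as "001" keeps its leading zeros unless it is converted with as.integer().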

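The helper function:

  1. Takes one file path from the list.files() result and gets its file name with basename().
  2. Reads the CSV.
  3. Uses mutate() to create two new columns, extracting the sample ID and the repetition number from the file name with regular expressions.
  4. Orders the columns as sample, repetition, then everything else.
  5. The iteration: map_df() from purrr applies read_spectra to each file in files and binds all the results into one data frame.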
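If separate per-sample subsets are preferred over one combined data frame, the file names can be grouped before importing. The following is a minimal base R sketch, not part of the approach above: the short files vector is made up to mimic the listing in the question, and it assumes the sample ID is the leading letter plus the digits that follow it.

```r
# Hypothetical file names mimicking the question's listing; with real data,
# use files <- list.files(pattern = "^Q\\d+") instead.
files <- c(sprintf("Q010101N.%03d", 1:3),
           sprintf("Q021101N.%03d", 1:2))

# Sample ID = leading letter plus digits (assumed naming convention).
sample_id <- sub("^([A-Z][0-9]+).*$", "\\1", files)

# One character vector of file names per sample.
files_by_sample <- split(files, sample_id)

# Number of repetitions found for each sample; it may differ between samples.
reps_per_sample <- lengths(files_by_sample)
print(reps_per_sample)

# With real files on disk, each subset could then be imported separately:
# data_by_sample <- lapply(files_by_sample, function(f) lapply(f, read.csv))
```

Because split() groups on whatever IDs it finds, nothing needs to change when the number of samples or the number of repetitions per sample varies between runs.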