Question

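I have files with varying naming patterns spread across several subfolders (for example: yyyy-mm-dd_random_FAM_random.txt).

I have already isolated the paths of the files I need: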

path_to_files <- "/path/to/files/with/subfolders/"
list_of_files <- list.files(path = path_to_files,
                            recursive = TRUE,
                            pattern = '201[0-9]-.*(FAM|SRY|STD|VIC).*\\.txt',
                            full.names = TRUE)

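I only need the paths from subfolders that contain exactly one file for each of "FAM", "SRY", "STD" and "VIC". So, within each subfolder, I want to look for file names that are identical apart from the .*(FAM|SRY|STD|VIC).* part.

If part of list_of_files looks like this: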

[1] "/path/to/files/with/subfolders/subfolder.n/yyyy-mm-dd_random_FAM_random.txt"
[2] "/path/to/files/with/subfolders/subfolder.n/yyyy-mm-dd_random_SRY_random.txt"
[3] "/path/to/files/with/subfolders/subfolder.n/yyyy-mm-dd_random_STD_random.txt"
[4] "/path/to/files/with/subfolders/subfolder.n/yyyy-mm-dd_random_VIC_random.txt"
[5] "/path/to/files/with/subfolders/subfolder.n/yyyy-mm-dd_random_VIC_random-differs.txt"

I want to omit all strings containing

"/path/to/files/with/subfolders/subfolder.n/"

How can I solve this with R?

Answer 1

Assuming all of your strings follow the same format, then simply:

x <- c("/path/to/files/with/subfolders/subfolder1/yyyy-mm-dd_random_FAM_random.txt",
       "/path/to/files/with/subfolders/subfolder.n/yyyy-mm-dd_random_SRY_random.txt")
# fixed = TRUE matches ".n/" literally; as a regex, "." would match any character
x[!grepl(".n/", x, fixed = TRUE)]
#[1] "/path/to/files/with/subfolders/subfolder1/yyyy-mm-dd_random_FAM_random.txt"
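If the folder to drop is known in full, comparing against dirname() avoids pattern matching altogether. This is only a sketch; the variable names bad_dir and kept are made up for illustration, and the paths are the placeholder paths from the question.

```r
x <- c("/path/to/files/with/subfolders/subfolder1/yyyy-mm-dd_random_FAM_random.txt",
       "/path/to/files/with/subfolders/subfolder.n/yyyy-mm-dd_random_SRY_random.txt")

# Hypothetical: the one folder whose files should all be dropped
bad_dir <- "/path/to/files/with/subfolders/subfolder.n"

# dirname() returns the folder part of each path (no trailing slash),
# so an exact comparison is enough
kept <- x[dirname(x) != bad_dir]
kept
```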

Based on your comment, is this what you need?

library(stringr)  # provides str_extract()
str_extract(x, '/([^/]*)$')
#[1] "/yyyy-mm-dd_random_FAM_random.txt" "/yyyy-mm-dd_random_SRY_random.txt"
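For what it's worth, base R's basename() gives the same file names without the leading slash and without an extra package. A small sketch using the same two example paths (file_names is just an illustrative name):

```r
x <- c("/path/to/files/with/subfolders/subfolder1/yyyy-mm-dd_random_FAM_random.txt",
       "/path/to/files/with/subfolders/subfolder.n/yyyy-mm-dd_random_SRY_random.txt")

# basename() strips everything up to and including the last "/"
file_names <- basename(x)
file_names
#[1] "yyyy-mm-dd_random_FAM_random.txt" "yyyy-mm-dd_random_SRY_random.txt"
```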

Answer 2

I solved it:

## 1. get vector of files to be used
path_to_files <- "J:/Diagnostik/cffDNA-LightCycler/LC-Läufe 2016"
# "\\." matches a literal dot; a bare "." would match any character
list_of_files <- list.files(path = path_to_files,
                            recursive = TRUE,
                            pattern = '201[0-9]-.*(FAM|SRY|STD|VIC).*\\.txt',
                            full.names = TRUE)

list_of_files.df <- data.frame(path_filename = list_of_files, 
                               path = dirname(list_of_files), 
                               file = basename(list_of_files))

# identify all folders where there are n*4 files with pattern: '201[0-9]-.*(FAM|SRY|STD|VIC).*.txt' 
library(dplyr)
complete_folders <- subset(summarize(group_by(list_of_files.df, path), 
                      count= n(), modulus.4 = (count %% 4 == 0)), modulus.4 == TRUE)
complete_folders <- as.character(complete_folders$path)

Isolate the text (paths) in a vector if 4 different files are available
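One caveat with counting files modulo 4: a folder holding, say, two FAM files, one SRY and one STD also has 4 files. A stricter check, sketched below in base R, counts each marker per folder and keeps only folders where every marker occurs exactly once. The paths under /data and the names markers, counts, ok_dirs and kept are made up for illustration, and the sketch assumes every file name contains one of the four markers (as the list.files() pattern ensures).

```r
# Hypothetical example paths: folder "a" is complete, folder "b" has VIC twice
files <- c(
  "/data/a/2016-01-01_x_FAM_y.txt",
  "/data/a/2016-01-01_x_SRY_y.txt",
  "/data/a/2016-01-01_x_STD_y.txt",
  "/data/a/2016-01-01_x_VIC_y.txt",
  "/data/b/2016-01-02_x_FAM_y.txt",
  "/data/b/2016-01-02_x_SRY_y.txt",
  "/data/b/2016-01-02_x_STD_y.txt",
  "/data/b/2016-01-02_x_VIC_y.txt",
  "/data/b/2016-01-02_x_VIC_y-differs.txt"
)
markers <- c("FAM", "SRY", "STD", "VIC")

# Extract the marker from each file name (the last one, if several occur)
marker <- sub(".*(FAM|SRY|STD|VIC).*", "\\1", basename(files))

# Folder-by-marker count table; the factor levels keep all four columns
# even if a marker never occurs
counts <- table(dir = dirname(files),
                marker = factor(marker, levels = markers))

# Keep folders in which every marker occurs exactly once
ok_dirs <- rownames(counts)[apply(counts == 1, 1, all)]
kept <- files[dirname(files) %in% ok_dirs]
kept
```

With the example data, only the four files in /data/a remain, because /data/b contains two VIC files.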
