Question

I have a list of strings like this (58 * 5 cases omitted):

participant_01_Bullpup_1.xml
participant_01_Bullpup_2.xml
participant_01_Bullpup_3.xml
participant_01_Bullpup_4.xml
participant_01_Bullpup_5.xml
#...Through to...
participant_60_Bullpup_1.xml
participant_60_Bullpup_2.xml
participant_60_Bullpup_3.xml
participant_60_Bullpup_4.xml
participant_60_Bullpup_5.xml

I'd like to use gsub on these so that I end up with (examples only):

01_1
60_5

Currently, my code is as follows:

fileNames <- Sys.glob("part*.csv")

for (fileName in fileNames) {
    sample <- read.csv(fileName, header = FALSE, sep = ",")
    part   <- gsub("[^0-9]+", "", substring(fileName, 5, last = 1000000L))
    print(part)
}

This results in the following strings (examples):

011
605

However, I can't figure out how to keep an underscore between the two numbers.
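The underscore disappears because `_` is itself a non-digit, so the character class `[^0-9]` in the pattern above matches it along with the letters. A minimal illustration on a single name taken from the list above; simply adding `_` to the class keeps every underscore, including the unwanted ones:

```r
x <- "participant_01_Bullpup_1.xml"

# "_" is not in 0-9, so [^0-9]+ strips it together with the letters
gsub("[^0-9]+", "", x)
#[1] "011"

# Excluding "_" from the class keeps all underscores, not just the middle one
gsub("[^0-9_]+", "", x)
#[1] "_01__1"
```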

Answer 1

Try

sub('[^0-9]+_([0-9]+_).*([0-9]+).*', '\\1\\2', str1)
#[1] "01_1"

library(stringr)
sapply(str_extract_all(str1, '\\d+'), paste, collapse='_')
#[1] "01_1"

Data

str1 <- 'participant_01_Bullpup_1.xml'
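Both approaches are vectorized, so the whole list can be processed without a loop. A sketch that rebuilds the 300 names with `sprintf()` (an assumption for illustration; in practice the vector would come from `Sys.glob()` or `list.files()`), and that also swaps in base R's `regmatches()`/`gregexpr()` for the stringr call, in case you would rather avoid the dependency:

```r
# Hypothetical reconstruction of the full list: 60 participants x 5 trials
files <- sprintf("participant_%02d_Bullpup_%d.xml",
                 rep(1:60, each = 5), rep(1:5, times = 60))

# sub() works element-wise on the whole vector
ids_sub <- sub('[^0-9]+_([0-9]+_).*([0-9]+).*', '\\1\\2', files)
head(ids_sub, 3)
#[1] "01_1" "01_2" "01_3"

# Base-R equivalent of str_extract_all(): one vector of digit runs per name,
# then each vector is joined with "_"
digits <- regmatches(files, gregexpr("[0-9]+", files))
ids_base <- vapply(digits, paste, character(1), collapse = "_")

identical(ids_sub, ids_base)
#[1] TRUE
```

Because the `.*` before the second group is greedy, `([0-9]+)` in the `sub()` pattern captures only the last digit of the trial number. That is fine for trials 1 to 5, but a name ending in `_12.xml` would give `_2`; the digit-extraction approach returns the full `12`.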

Answer 2

Here are a few options (using akrun's str1):

gsub("[^0-9_]+|(?<=\\D)_", "", str1, perl=TRUE)
#[1] "01_1"
sub(".+?(\\d+_).+?(\\d+).+", "\\1\\2", str1, perl=TRUE)
#[1] "01_1"
sub(".+?(\\d+).+?(\\d+).+", "\\1_\\2", str1, perl=TRUE)
#[1] "01_1"
paste(strsplit(str1, "\\D+")[[1]][-1], collapse="_")
#[1] "01_1"
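The `strsplit()` option indexes `[[1]]`, so as written it handles only one string. A sketch applying the same split to every element (again assuming a `files` vector rebuilt with `sprintf()` purely for illustration), checked against the lookbehind `gsub()` option, which is already vectorized:

```r
files <- sprintf("participant_%02d_Bullpup_%d.xml",
                 rep(1:60, each = 5), rep(1:5, times = 60))

# Splitting on non-digit runs yields c("", participant, trial) for each name;
# drop the leading empty string and join the rest with "_"
ids_split <- vapply(strsplit(files, "\\D+"),
                    function(p) paste(p[-1], collapse = "_"),
                    character(1))

# The lookbehind gsub() option needs no per-element handling
ids_gsub <- gsub("[^0-9_]+|(?<=\\D)_", "", files, perl = TRUE)

identical(ids_split, ids_gsub)
#[1] TRUE
```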

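If your pattern really is consistent (i.e. 12 characters before the first number, followed by 8 characters up to the next group of digits, followed by 4 more characters), then you can use quantifiers explicitly: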

sub(".{12}(\\d+_).{8}(\\d+).{4}", "\\1\\2", str1)
#[1] "01_1"
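If a name deviates from the counted widths, `sub()` either matches the wrong characters or, finding no match, returns the name unchanged without any warning. One alternative (not part of the original answer) is to spell out the fixed parts literally, anchor the pattern with `^` and `$`, and check with `grepl()` that every name matches before substituting:

```r
files <- c("participant_01_Bullpup_1.xml", "participant_60_Bullpup_5.xml")

# Literal, anchored pattern: a non-conforming name makes grepl() return FALSE
pattern <- "^participant_([0-9]+)_Bullpup_([0-9]+)\\.xml$"
stopifnot(all(grepl(pattern, files)))

sub(pattern, "\\1_\\2", files)
#[1] "01_1" "60_5"
```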

Or just access the characters with the appropriate indices:

paste0(substr(str1, 13, 15), substr(str1, 24, 24))
#[1] "01_1"
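`substr()` is vectorized too, but fixed indices assume a single-digit trial number at position 24. A sketch that keeps the same fixed prefix but takes the trial number up to the 4-character `.xml` suffix using `nchar()` (the two-digit name below is hypothetical, added only to show the difference):

```r
files <- c("participant_01_Bullpup_1.xml", "participant_60_Bullpup_12.xml")

# Fixed positions: the trial is assumed to be exactly one character
paste0(substr(files, 13, 15), substr(files, 24, 24))
#[1] "01_1" "60_1"

# Trial runs from position 24 up to just before the ".xml" suffix
paste0(substr(files, 13, 15), substr(files, 24, nchar(files) - 4))
#[1] "01_1"  "60_12"
```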

Remove part of a pattern from a string using gsub
