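I have a set of strings that should be divided into four batches according to
the last part of each string (S1, S2, etc.). For space reasons I had to cut off part of t: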
> t
[1] "001_A01_S1" "001_A02_S2" "001_A03_S3" "001_A04_S4" "001_A05_S5" "001_A06_S6" "001_A07_S49"
[8] "001_A08_S50" "001_A09_S51" "001_A10_S52" "001_A11_S53" "001_A12_S54" "001_B01_S7" "001_B02_S8"
[15] "001_B03_S9" "001_B04_S10" "001_B05_S11" "001_B06_S12" "001_B07_S55" "001_B08_S56" "001_B09_S57"
[22] "001_B10_S58" "001_B11_S59" "001_B12_S60" "001_C01_S13" "001_C02_S14" "001_C03_S15" "001_C04_S16"
[29] "001_C05_S17" "001_C06_S18" "001_C07_S61" "001_C08_S62" "001_C09_S63" "001_C10_S64" "001_C11_S65"
[36] "001_C12_S66" "001_D01_S19" "001_D02_S20" "001_D03_S21" "001_D04_S22" "001_D05_S23" "001_D06_S24"
[43] "001_D07_S67" "001_D08_S68" "001_D09_S69" "001_D10_S70"
I want to split them into four batches:
Batch1: S1-S48
Batch2: S49-S96
Batch3: S97-S144
Batch4: S145-S192
This is what I tried:
batch <- y
batch[grep("S([1-9]|[1-3].|4[0-8])_", batch)] <- "B1"
batch[grep("S([5-8].|49|9[0-6])_", batch)] <- "B2"
batch[grep("S(1[0-3].|14[0-4]|9[7-9])_", batch)] <- "B3"
batch[!grepl("^B", batch)] <- "B4"
Answer 0 (score: 4)
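You can first extract the numeric part of the last section of each string (i.e. the 1, 2, 3 in S1, S2, S3, etc.). Then, using those numbers, you can categorize with cut.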
Example
## Some sample data:
t <- c("001_A01_S1", "001_A02_S2", "001_A03_S3",
"001_A07_S49", "001_A08_S50", "001_A09_S51",
"001_C01_S110", "001_C02_S114", "001_C02_S128",
"001_C01_S155", "001_C02_S159", "001_C02_S162")
## Extract numeric part of "SXXX"
sNumericVec <- as.numeric(stringr::str_extract(t, "(?<=_S)[[:digit:]]*"))
## Categorize:
catVec <- cut(sNumericVec, breaks = c(0,48,96,144,192))
## Rename levels:
levels(catVec) <- paste0("B", 1:4)
catVec
# [1] B1 B1 B1 B2 B2 B2 B3 B3 B3 B4 B4 B4
# Levels: B1 B2 B3 B4
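A note on the boundaries: by default cut uses right-closed intervals (right = TRUE), so the breaks above correspond to (0,48], (48,96], (96,144] and (144,192]. That should put S48 in B1 and S49 in B2, as the question asks. A minimal sketch to check the edge values, where edgeVals and edgeCat are illustrative names that are not part of the original answer:

```r
## Boundary values of the four requested batches
edgeVals <- c(1, 48, 49, 96, 97, 144, 145, 192)
## Same breaks as above; labels are set directly instead of renaming levels
edgeCat <- cut(edgeVals, breaks = c(0, 48, 96, 144, 192),
               labels = paste0("B", 1:4))
as.character(edgeCat)
# [1] "B1" "B1" "B2" "B2" "B3" "B3" "B4" "B4"
```

Values outside (0, 192] would come back as NA, which can serve as a cheap sanity check on the input.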
Answer 1 (score: 3)
You can use cut:
batch <- cut(as.numeric(gsub(".+S(\\d+)$","\\1",t)), #identify last numeric code
c(0,48,96,144,192), #breakpoints for cut
labels = c("B1","B2","B3","B4")) #names of batches
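Since every batch in the question spans exactly 48 consecutive S numbers, the batch index could also be computed arithmetically. This is an alternative to cut that neither answer proposes, shown only as a sketch; x, sNum and batchArith are hypothetical names, and the sample vector mixes a few strings from the question with made-up higher S numbers:

```r
## Sample strings: some from the question, some invented for batches 3 and 4
x <- c("001_A01_S1", "001_A07_S49", "001_B12_S60",
       "001_C01_S110", "001_C01_S155", "001_C02_S192")
## Extract the trailing number that follows the final "_S"
sNum <- as.numeric(sub(".*_S(\\d+)$", "\\1", x))
## Batches are 48 wide, so ceiling(n / 48) gives the batch index 1..4
batchArith <- paste0("B", ceiling(sNum / 48))
batchArith
# [1] "B1" "B2" "B2" "B3" "B4" "B4"
```

This only works because the batch widths are equal; with uneven ranges, cut with explicit breaks remains the more general choice.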