Question

I want to extract part of each string in a list, but I don't know how to define the pattern for it. Thanks for your help.

library(stringr)
names = c("GAPIT..flowerdate.GWAS.Results.csv","GAPIT..flwrcolor.GWAS.Results.csv",
"GAPIT..height.GWAS.Results.csv","GAPIT..matdate.GWAS.Results.csv")
# I want to extract out "flowerdate", "flwrcolor", "height" and "matdate"
traits <- str_extract_all(string = names, pattern = "..*.")
# the result is not what I want.

Answer 1

You can also use regmatches (here c is the character vector from the question):

> regmatches(c, regexpr("[[:lower:]]+", c))
[1] "flowerdate" "flwrcolor"  "height"     "matdate"

I'd advise against using c as a variable name, because doing so masks the base c() function.
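
As a self-contained sketch of the same approach: the variable name `files` below is hypothetical (chosen so that nothing from base R is masked), and the code assumes the trait is always the first run of lowercase letters in each string.

```r
# Hypothetical variable name `files`; the strings are the ones from the question
files <- c("GAPIT..flowerdate.GWAS.Results.csv", "GAPIT..flwrcolor.GWAS.Results.csv",
           "GAPIT..height.GWAS.Results.csv", "GAPIT..matdate.GWAS.Results.csv")

# regexpr() locates the first run of lowercase letters in each string;
# regmatches() then extracts the matched substrings
traits <- regmatches(files, regexpr("[[:lower:]]+", files))
print(traits)
```

Note that regmatches drops elements that have no match, so the result can be shorter than the input if some string contains no lowercase letters.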

Answer 2

I borrowed Roman Luštrik's answer to my earlier question, "How to extract part of the names in a data frame as new column names":

traits <- unlist(lapply(strsplit(names, "\\."), "[[", 3))
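
To see why the index is 3, it may help to look at the pieces strsplit produces: the two consecutive dots leave an empty string in second position. A minimal sketch, assuming the hypothetical variable name `files` and that the trait is always the third dot-separated field:

```r
# Hypothetical variable name `files`; the strings are the ones from the question
files <- c("GAPIT..flowerdate.GWAS.Results.csv", "GAPIT..flwrcolor.GWAS.Results.csv",
           "GAPIT..height.GWAS.Results.csv", "GAPIT..matdate.GWAS.Results.csv")

# fixed = TRUE splits on a literal dot, so no escaping is needed
parts <- strsplit(files, ".", fixed = TRUE)
print(parts[[1]])  # the empty second field comes from the ".." in the name

# vapply() guarantees one character value per input string
traits <- vapply(parts, function(p) p[3], character(1))
print(traits)
```

Using p[3] rather than p[[3]] yields NA instead of an error for a string with fewer than three fields.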

Answer 3

Using sub:

sub(".*\\.{2}(.+?)\\..*", "\\1", names)
# [1] "flowerdate" "flwrcolor"  "height"     "matdate"
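
Instead of replacing everything around the trait, one could also match the trait directly. A minimal sketch, assuming a PCRE lookbehind is acceptable (perl = TRUE) and that the trait is the run of non-dot characters right after the double dot; `files` is a hypothetical variable name:

```r
# Hypothetical variable name `files`; the strings are the ones from the question
files <- c("GAPIT..flowerdate.GWAS.Results.csv", "GAPIT..flwrcolor.GWAS.Results.csv",
           "GAPIT..height.GWAS.Results.csv", "GAPIT..matdate.GWAS.Results.csv")

# (?<=\.\.) requires two literal dots immediately before the match;
# [^.]+ then takes everything up to the next dot
m <- regexpr("(?<=\\.\\.)[^.]+", files, perl = TRUE)
traits <- regmatches(files, m)
print(traits)
```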

Answer 4

Here are a few solutions. The first two do not use regular expressions at all. The last one uses a single gsub:

1) read.table. This assumes the desired string is always the third field:

read.table(text = names, sep = ".", as.is = TRUE)[[3]]

2) strsplit. This assumes the desired string is longer than 3 characters and all lowercase:

sapply(strsplit(names, "[.]"), Filter, f = function(x) nchar(x) > 3 & tolower(x) == x)

3) gsub. This assumes the desired string is preceded by two dots and followed by a dot plus junk that does not contain two consecutive dots:

gsub(".*[.]{2}|[.].*", "", names)
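
As a quick check, the three approaches can be run side by side on the question's data. This is only a sketch: the variable names `files`, `by_field`, `by_filter` and `by_gsub` are hypothetical, and agreement on these four strings says nothing about inputs that break the assumptions stated above.

```r
# Hypothetical variable name `files`; the strings are the ones from the question
files <- c("GAPIT..flowerdate.GWAS.Results.csv", "GAPIT..flwrcolor.GWAS.Results.csv",
           "GAPIT..height.GWAS.Results.csv", "GAPIT..matdate.GWAS.Results.csv")

# 1) treat each string as a dot-separated record and take the third column
by_field <- read.table(text = files, sep = ".", as.is = TRUE)[[3]]

# 2) split on dots, keep the piece that is longer than 3 characters and lowercase
by_filter <- sapply(strsplit(files, "[.]"), Filter,
                    f = function(x) nchar(x) > 3 & tolower(x) == x)

# 3) delete everything up to the double dot, and everything from the next dot on
by_gsub <- gsub(".*[.]{2}|[.].*", "", files)

# Compare the results of the three approaches
print(identical(by_field, by_gsub))
print(identical(by_filter, by_gsub))
```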

Revised: added further solutions.
