Question

I have a list of files and need to extract the year from each file name.

The file names are:

[1] "2014_by_country_and_type_Enlarged_Europe.xlsx"            
[2] "20140211_02_2012_vo_By_Country_Enlarged_Europe.xls"       
[3] "20150219_2013_vo_By_Country_Enlarged_Europe.xlsx"

My attempt:

 regmatches(files, regexpr("[0-9].*[0-9]", files))

But the result is:

 [1] "2014"
 [2] "20140211_02_2012"
 [3] "20150219_2013"

The output I need is:

 2014
 2012
 2013

Answer 1

You can try this:

regmatches(x, regexpr("(\\d{4})(?=_([a-zA-Z]+))", x, perl = TRUE))

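Assumption: the four digits taken as the year are followed by an underscore and then letters.

The positive lookahead works here as digits_of_year(?=underscore_with_alphabets): it matches digits_of_year only when it is followed by underscore_with_alphabets, without making underscore_with_alphabets part of the match.

Output: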

[1] "2014" "2012" "2013"

Data:

x <- c("2014_by_country_and_type_Enlarged_Europe.xlsx", "20140211_02_2012_vo_By_Country_Enlarged_Europe.xls", "20150219_2013_vo_By_Country_Enlarged_Europe.xlsx")
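To illustrate what the lookahead changes, here is a small sketch that runs the same pattern with and without it on the data above. The variable names `with_suffix` and `years` are only for illustration, and the capture groups from the original pattern are dropped because `regmatches()` returns the whole match anyway.

```r
x <- c("2014_by_country_and_type_Enlarged_Europe.xlsx",
       "20140211_02_2012_vo_By_Country_Enlarged_Europe.xls",
       "20150219_2013_vo_By_Country_Enlarged_Europe.xlsx")

# Without the lookahead, the underscore and letters are consumed,
# so they end up in the returned match.
with_suffix <- regmatches(x, regexpr("\\d{4}_[a-zA-Z]+", x, perl = TRUE))

# With the lookahead, the underscore and letters are only checked,
# so just the four digits are returned.
years <- regmatches(x, regexpr("\\d{4}(?=_[a-zA-Z]+)", x, perl = TRUE))

print(with_suffix)
print(years)
```

Lookaheads need `perl = TRUE`; R's default regex engine does not support them.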

Answer 2

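A simple regular expression with gsub(), using the vector x from the data above:

gsub(".*(\\d{4})_.+", "\\1", x)
[1] "2014" "2012" "2013"

It matches four digits followed by _ and replaces the whole string with them. Because the leading .* is greedy, it keeps the last such group in the name, which is why the second file yields 2012 rather than 0211.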
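One thing to watch with both answers is a file name that contains no year: `gsub()` returns such a name unchanged, and `regmatches()` silently drops it, so the result no longer lines up with the input. A minimal sketch of one way to handle this, assuming you want `NA` for such files (the helper name `extract_year` and the file `readme.txt` are made up for illustration):

```r
# Hypothetical helper: returns one integer year per file name,
# or NA where no "four digits, underscore, letter" pattern is found.
extract_year <- function(files) {
  m <- regexpr("\\d{4}(?=_[A-Za-z])", files, perl = TRUE)
  out <- rep(NA_character_, length(files))
  # regexpr() returns -1 for elements without a match;
  # regmatches() returns values only for the matched elements.
  out[m != -1] <- regmatches(files, m)
  as.integer(out)
}

files <- c("2014_by_country_and_type_Enlarged_Europe.xlsx",
           "20140211_02_2012_vo_By_Country_Enlarged_Europe.xls",
           "20150219_2013_vo_By_Country_Enlarged_Europe.xlsx",
           "readme.txt")

result <- extract_year(files)
print(result)
```

The result has the same length as the input, so it can be attached directly as a column next to the file names.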

How to extract the year from a file name in R
