I'm trying to use the apply family (sapply in this case) to apply the following function to a vector:
library(stringr)

get_dates <- function(text, pattern, pattern_list){
  text <- str_to_lower(text)
  # Map the pattern name to its position in pattern_list
  # (there is no pattern_6, so pattern_7 and pattern_8 map to 6 and 7)
  index <- switch(pattern,
                  pattern_1 = 1,
                  pattern_2 = 2,
                  pattern_3 = 3,
                  pattern_4 = 4,
                  pattern_5 = 5,
                  pattern_7 = 6,
                  pattern_8 = 7)
  regex_pattern <- pattern_list[index]
  dates <- str_extract(text, regex_pattern)
  return(dates)
}
The arguments text, pattern and pattern_list are described below. My attempted solution uses only the first two elements of my text and pattern vectors.
text <- c("FEB-MAY14", "JUN-AUG14")
pattern <- c("pattern_8", "pattern_8")
pattern_list <- c(full_pattern_1,
full_pattern_2,
full_pattern_3,
full_pattern_4,
full_pattern_5,
full_pattern_7,
full_pattern_8)
where the elements of pattern_list are:
[1] "\\d{2}\\s?(jan|feb|mar|apr|may|jun|jul|aug|sep|oct|nov|dec)\\s? \\d{2}\\s?\\-\\s?\\d{2}\\s?(jan|feb|mar|apr|may|jun|jul|aug|sep|oct|nov|dec)\\s?\\d{2}"
[2] "(january|february|march|april|may|june|july|august|september|october|november|december)\\s?\\d{4}\\s?\\-\\s?(january|february|march|april|may|june|july|august|september|october|november|december)\\s?\\d{2}\\,?\\s?\\d{4}"
[3] "(january|february|march|april|may|june|july|august|september|october|november|december)\\s?\\-\\s?(january|february|march|april|may|june|july|august|september|october|november|december)\\s?\\d{2}\\,\\s?\\d{4}"
[4] "\\d{2}\\s?(jan|feb|mar|apr|may|jun|jul|aug|sep|oct|nov|dec)\\s?\\d{4}\\s?\\-\\s?\\d{2}\\s?(jan|feb|mar|apr|may|jun|jul|aug|sep|oct|nov|dec)\\s?\\d{4}"
[5] "(jan|feb|mar|apr|may|jun|jul|aug|sep|oct|nov|dec)\\s?\\d{4}\\s?\\-\\s?(january|february|march|april|may|june|july|august|september|october|november|december)\\s?\\d{1,2}\\,\\s?\\d{4}"
[6] "(january|february|march|april|may|june|july|august|september|october|november|december)\\s?\\d{1,2}\\,\\s?\\d{4}\\s?\\-\\s?(january|february|march|april|may|june|july|august|september|october|november|december)\\s?\\d{2}\\,\\s?\\d{4}"
[7] "(january|february|march|april|may|june|july|august|september|october|november|december|jan|feb|mar|apr|may|jun|jul|aug|sep|oct|nov|dec)\\s?\\-?(\\d{2,4})?\\-?\\s?(january|february|march|april|may|june|july|august|september|october|november|december|jan|feb|mar|apr|may|jun|jul|aug|sep|oct|nov|dec)?\\s*\\-*\\d{2,4}"
daaa <- sapply(text,
function(x, y, z) get_dates(x, y, z),
y = pattern,
z = pattern_list)
However, when I run the sapply call, I get the following error:
Error in switch(pattern, pattern_1 = 1, pattern_2 = 2, pattern_3 = 3, :
EXPR must be a length 1 vector
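This doesn't make sense to me; I thought the apply family loops over each element when applying the function. I have stepped through the vectors manually with a for loop, and that works as expected: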
daaa <- c()
for(i in 1:2){
daaa[i] <- get_dates(text[i],
pattern[i],
pattern_list)
}
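I've left out the details of the actual regex patterns and text contents, because as far as I can tell they aren't the problem. I can put together a contrived input example if needed, but right now the issue is applying my function, and switch seems to be the bottleneck.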
Answer (score: 3)
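Your sapply and for-loop structures don't match, so it makes sense that the results differ. In the for loop, the i-th call gets pattern[i] as its second argument. In the sapply version, you pass the entire vector pattern. sapply applies the function to the elements of one object at a time (here text), but that does not extend to the other arguments of the function being applied (such as pattern): those are passed unchanged, as whole vectors, to every call.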
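To see this concretely, here is a minimal sketch using a hypothetical helper, show_args, that reports what it receives on each call. x changes from call to call, while y arrives as the whole length-2 vector every time, which is exactly what switch rejects:

```r
# Hypothetical helper: reports which x it received and how long y is
show_args <- function(x, y) paste0(x, ": length(y) = ", length(y))

res <- sapply(c("FEB-MAY14", "JUN-AUG14"), show_args,
              y = c("pattern_8", "pattern_8"),
              USE.NAMES = FALSE)
print(res)
# [1] "FEB-MAY14: length(y) = 2" "JUN-AUG14: length(y) = 2"

# switch() needs a single value, so a length-2 EXPR fails the same way
msg <- tryCatch(switch(c("pattern_8", "pattern_8"), pattern_8 = 7),
                error = function(e) conditionMessage(e))
print(msg)
```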
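If you want to iterate over several objects at once, try turning the apply call into an iteration over an index vector:
sapply(seq_along(text), function(i) get_dates(text[i], pattern[i], pattern_list))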
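Another option, offered here as a sketch rather than as part of the original answer, is base R's mapply(), which walks several vectors in parallel and takes arguments that should stay fixed through MoreArgs. The function get_dates_demo and the shortened regex below are hypothetical stand-ins for get_dates and pattern_list. They use base regexpr/regmatches instead of stringr so the example runs without extra packages:

```r
# Hypothetical, simplified stand-in for get_dates(): same switch-based lookup,
# but only two cases and base R matching instead of stringr
get_dates_demo <- function(text, pattern, pattern_list) {
  text <- tolower(text)
  index <- switch(pattern, pattern_1 = 1, pattern_8 = 2)
  m <- regexpr(pattern_list[index], text, perl = TRUE)
  if (m == -1) NA_character_ else regmatches(text, m)
}

months <- "(jan|feb|mar|apr|may|jun|jul|aug|sep|oct|nov|dec)"
pattern_list <- c("\\d{2}\\s?[a-z]{3}",                       # hypothetical
                  paste0(months, "\\-", months, "\\d{2}"))    # hypothetical

text <- c("FEB-MAY14", "JUN-AUG14")
pattern <- c("pattern_8", "pattern_8")

# text and pattern are iterated in parallel (element i with element i);
# pattern_list is passed whole to every call via MoreArgs
res <- mapply(get_dates_demo, text, pattern,
              MoreArgs = list(pattern_list = pattern_list),
              USE.NAMES = FALSE)
print(res)
# [1] "feb-may14" "jun-aug14"
```

Map() works the same way but always returns a list, which can be useful if some calls might return zero or several matches.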
The use of switch is beside the point; switch works fine inside sapply, for example:
my_fun <- function(x) switch(x, a='alpha', b='beta')
sapply(c('a', 'b', 'b'), my_fun)
#       a       b       b
# "alpha"  "beta"  "beta"
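The same holds for extra arguments forwarded through sapply's ... : as long as the forwarded value is a single string, switch receives a length-1 EXPR and every call succeeds. A sketch with a hypothetical two-argument variant, my_fun2:

```r
# Hypothetical variant: x comes from the vector being iterated over,
# key is a fixed extra argument forwarded unchanged to every call
my_fun2 <- function(x, key) paste(x, switch(key, a = "alpha", b = "beta"))

res <- sapply(c("x1", "x2"), my_fun2, key = "b", USE.NAMES = FALSE)
print(res)
# [1] "x1 beta" "x2 beta"
```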