Applying a function to a vector using switch

Time: 2016-03-02 15:57:45

Tags: r

I am trying to use the apply family (sapply in this case) to apply the following function to a vector:

library(stringr)  # provides str_to_lower() and str_extract()

get_dates <- function(text, pattern, pattern_list) {

  text <- str_to_lower(text)

  # map the pattern name to its position in pattern_list
  index <- switch(pattern,
    pattern_1 = 1,
    pattern_2 = 2,
    pattern_3 = 3,
    pattern_4 = 4,
    pattern_5 = 5,
    pattern_7 = 6,
    pattern_8 = 7
  )

  regex_pattern <- pattern_list[index]

  dates <- str_extract(text, regex_pattern)

  return(dates)
}

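The arguments text, pattern, and pattern_list are as follows:

  • text: a character vector taken from a data frame
  • pattern: a character vector of the same length as text, taken from the same data frame as text
  • pattern_list: a vector of seven different regex patterns, one of which is applied to the text depending on the value of pattern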

My attempted solution is below, using only the first two elements of my text and pattern vectors.

text <- c("FEB-MAY14", "JUN-AUG14")
pattern <- c("pattern_8", "pattern_8")

pattern_list <- c(full_pattern_1,
                  full_pattern_2,
                  full_pattern_3,
                  full_pattern_4,
                  full_pattern_5,
                  full_pattern_7,
                  full_pattern_8)

where the elements of pattern_list are as follows:

 [1] "\\d{2}\\s?(jan|feb|mar|apr|may|jun|jul|aug|sep|oct|nov|dec)\\s?  \\d{2}\\s?\\-\\s?\\d{2}\\s?(jan|feb|mar|apr|may|jun|jul|aug|sep|oct|nov|dec)\\s?\\d{2}"                                                                                                                                                                     
 [2] "(january|february|march|april|may|june|july|august|september|october|november|december)\\s?\\d{4}\\s?\\-\\s?(january|february|march|april|may|june|july|august|september|october|november|december)\\s?\\d{2}\\,?\\s?\\d{4}"                                                                                               
 [3] "(january|february|march|april|may|june|july|august|september|october|november|december)\\s?\\-\\s?(january|february|march|april|may|june|july|august|september|october|november|december)\\s?\\d{2}\\,\\s?\\d{4}"                                                                                                          
 [4] "\\d{2}\\s?(jan|feb|mar|apr|may|jun|jul|aug|sep|oct|nov|dec)\\s?\\d{4}\\s?\\-\\s?\\d{2}\\s?(jan|feb|mar|apr|may|jun|jul|aug|sep|oct|nov|dec)\\s?\\d{4}"                                                                                                                                                                     
 [5] "(jan|feb|mar|apr|may|jun|jul|aug|sep|oct|nov|dec)\\s?\\d{4}\\s?\\-\\s?(january|february|march|april|may|june|july|august|september|october|november|december)\\s?\\d{1,2}\\,\\s?\\d{4}"                                                                                                                                    
 [6] "(january|february|march|april|may|june|july|august|september|october|november|december)\\s?\\d{1,2}\\,\\s?\\d{4}\\s?\\-\\s?(january|february|march|april|may|june|july|august|september|october|november|december)\\s?\\d{2}\\,\\s?\\d{4}"                                                                                 
 [7] "(january|february|march|april|may|june|july|august|september|october|november|december|jan|feb|mar|apr|may|jun|jul|aug|sep|oct|nov|dec)\\s?\\-?(\\d{2,4})?\\-?\\s?(january|february|march|april|may|june|july|august|september|october|november|december|jan|feb|mar|apr|may|jun|jul|aug|sep|oct|nov|dec)?\\s*\\-*\\d{2,4}"

daaa <- sapply(text, 
                function(x, y, z) get_dates(x, y, z), 
                y = pattern, 
                z = pattern_list)

However, when I use sapply, I get the following error:

Error in switch(pattern, pattern_1 = 1, pattern_2 = 2, pattern_3 = 3,  : 
EXPR must be a length 1 vector 

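This makes no sense to me; I thought the apply family would loop over each element when applying the function. I have stepped through the vectors manually with a for loop, and that works as expected: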

daaa <- c()
for (i in 1:2) {
  daaa[i] <- get_dates(text[i],
                       pattern[i],
                       pattern_list)
}

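I have left out the details of the actual regex patterns and the text contents, since that is not the problem, at least as far as I can tell right now. If needed I can come up with a contrived input example, but for now my problem is applying my function, and switch seems to be the bottleneck.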

1 answer:

Answer 0: (score: 3)

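Your sapply call and your for loop are not equivalent, so it makes sense that the results differ. In the for loop, the i-th call gets pattern[i] as its second argument. In the sapply version, you pass the entire vector pattern on every call, so switch receives a length-2 EXPR and fails.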

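sapply feeds the elements of the object it iterates over (here, text) to the function one at a time, but this does not apply to the additional arguments forwarded to the function (such as pattern); those are passed whole on every call.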
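A minimal sketch of that behaviour, using a hypothetical helper show_lengths (not part of the original question) that only reports the length of each argument it receives:

```r
# Hypothetical helper: reports how long each argument is when the function is called.
show_lengths <- function(x, y) c(x = length(x), y = length(y))

text <- c("FEB-MAY14", "JUN-AUG14")
pattern <- c("pattern_8", "pattern_8")

# sapply splits `text` into single elements, but forwards `pattern` unchanged.
res <- sapply(text, show_lengths, y = pattern)
print(res)
```

In every call x has length 1 while y has length 2, which is exactly the situation in which switch complains about EXPR.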

If you want to iterate over several objects in parallel, try having sapply iterate over a vector of indices instead:

sapply(seq_along(text), function(i) get_dates(text[i], pattern[i], pattern_list))
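Another option, not part of the original answer, is base R's mapply, which walks several vectors in parallel and takes fixed arguments through MoreArgs. The sketch below uses a toy stand-in, pick_label, with made-up labels so that it runs without stringr; the same call shape should carry over to get_dates with pattern_list in MoreArgs.

```r
# Toy stand-in for get_dates(): uses switch(), which needs a length-1 EXPR.
pick_label <- function(text, pattern, labels) {
  index <- switch(pattern, pattern_1 = 1, pattern_2 = 2)
  paste(tolower(text), labels[index])
}

text <- c("FEB-MAY14", "JUN-AUG14")
pattern <- c("pattern_2", "pattern_1")
labels <- c("first", "second")  # hypothetical values, for illustration only

# mapply pairs text[i] with pattern[i]; MoreArgs is passed whole to every call.
res <- mapply(pick_label, text, pattern,
              MoreArgs = list(labels = labels),
              USE.NAMES = FALSE)
print(res)
```

Each call now sees one element of text and the matching element of pattern, so switch always gets a single string.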

This has nothing to do with the use of switch; there is no problem using switch with sapply, for example:

my_fun <- function(x) switch(x, a='alpha', b='beta')
sapply(c('a', 'b', 'b'), my_fun)

#      a       b       b 
# "alpha"  "beta"  "beta"
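If the per-element call is not needed at all, the switch lookup itself could be replaced by indexing a named vector, which accepts a whole character vector at once. This is a sketch under the assumption that the name-to-index mapping is the one in get_dates; index_of is a hypothetical name.

```r
# Named vector mirroring the switch() branches in get_dates (there is no pattern_6).
index_of <- c(pattern_1 = 1, pattern_2 = 2, pattern_3 = 3, pattern_4 = 4,
              pattern_5 = 5, pattern_7 = 6, pattern_8 = 7)

pattern <- c("pattern_8", "pattern_8")

# Indexing by name is vectorised, so every element of `pattern` is looked up in one step.
index <- unname(index_of[pattern])
print(index)
```

The resulting index vector could then be used to subset pattern_list in one step, with one regex per element of text.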