Converting an R code block into a function: data.table where and update operations

Asked: 2018-08-11 09:28:58

Tags: r data.table

I have the following block of code that I need to repeat often:

library(data.table)
library(magrittr)  # provides %>%

flights <- fread("https://raw.githubusercontent.com/wiki/arunsrinivasan/flights/NYCflights14/flights14.csv")

flights$origin %>% table()
flights[grepl("jfk", origin, ignore.case = TRUE),
        origin := "0"
      ][grepl("ewr|lga", origin, ignore.case = TRUE),
        origin := "1"
      ][, origin := as.numeric(origin)]
flights$origin %>% table()

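Here is my attempt to wrap this in a function that lets me pass n regular expressions, along with their replacements, for any given column in the dataset.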

my_function <- function(regex, replacement, column) {   
    flights[, column, with = FALSE] %>% table()   
    for (i in seq_along(regex)) {
        flights[grepl(regex[i], column, ignore.case = TRUE), 
                column := replacement[i],
                with = FALSE]   
    }   
    flights[, column := as.numeric(column)]
    flights[, column, with = FALSE] %>% table() 
}

But this produces the following warning messages:

Warning messages:
1: In `[.data.table`(flights, grepl(regex[i], column, ignore.case = TRUE),  :
  with=FALSE together with := was deprecated in v1.9.4 released Oct 2014. Please wrap the LHS of := with parentheses; e.g., DT[,(myVar):=sum(b),by=a] to assign to column name(s) held in variable myVar. See ?':=' for other examples. As warned in 2014, this is now a warning.
2: In eval(jsub, SDenv, parent.frame()) : NAs introduced by coercion

Any help would be greatly appreciated. Thank you very much.

1 answer:

Answer 0 (score: 1)

Figured it out. I'm leaving my solution here for the benefit of anyone else who runs into this.

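  1. Don't combine with = FALSE with :=. Instead, wrap the variable in parentheses, as in (column) := ..., so that data.table assigns to the column whose name the variable holds.
  2. To pass the column to another function (grepl() in my case), use get(), so that the function receives the column's values rather than the string holding its name.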
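
The two points above can be illustrated on a small table. This is only a sketch: the toy data.table dt and its values are made up for illustration and stand in for the real flights data.

```r
library(data.table)

# Toy data standing in for the flights table (hypothetical values).
dt <- data.table(origin = c("JFK", "EWR", "LGA", "JFK"))
column <- "origin"

# Without get(), grepl() tests the string "origin" itself,
# not the values stored in that column.
grepl("jfk", column, ignore.case = TRUE)   # a single FALSE

# With get(), grepl() tests each value of the column named by 'column'.
dt[, grepl("jfk", get(column), ignore.case = TRUE)]

# Parentheses on the left of := assign to the column named by 'column'.
dt[grepl("jfk", get(column), ignore.case = TRUE), (column) := "0"]
dt[grepl("ewr|lga", get(column), ignore.case = TRUE), (column) := "1"]
dt[, (column) := as.numeric(get(column))]

# Without parentheses, := creates a new column literally named "column".
dt[, column := "oops"]
names(dt)
```

The last statement suggests where the "NAs introduced by coercion" warning in the question likely comes from: column := as.numeric(column) converts the string "origin" to a number, which yields NA, and stores it in a new column called column.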

my_function <- function(regex, # Vector of regex strings to match
                    replacement, # Vector of strings to replace the matches from 'regex' arg
                    column, # Column to operate on 
                    as.numeric = FALSE # Optional arg; convert 'column' to numeric as final step?
) {  
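  # 'responses' is the data.table being modified by reference
  # (the counterpart of 'flights' in the question).
  cat("Converting:..")
  responses[, column, with = FALSE] %>% table() %>% print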
  for (i in seq_along(regex)) {
    responses[grepl(regex[i], get(column), ignore.case = TRUE, perl = TRUE), 
              (column) := replacement[i]]   
  }
  if (as.numeric) {
    responses[, (column) := as.numeric(get(column))]
  }
  cat("to:..")
  responses[, column, with = FALSE] %>% table() %>% print
}
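
The function above modifies a data.table named responses that it finds in the calling environment. If that is not wanted, one possible variant takes the table as an argument. This is a sketch only: the name recode_column, the dt and to_numeric parameters, and the toy data are hypothetical and not part of the original answer, and it uses data.table's set() in place of := to update by reference.

```r
library(data.table)

# Hypothetical variant of my_function(): the table is passed in as 'dt'
# instead of being looked up in the global environment.
recode_column <- function(dt, regex, replacement, column, to_numeric = FALSE) {
  stopifnot(is.data.table(dt),
            length(regex) == length(replacement),
            column %in% names(dt))
  for (i in seq_along(regex)) {
    # Rows whose current value matches the i-th pattern
    hits <- grepl(regex[i], dt[[column]], ignore.case = TRUE, perl = TRUE)
    if (any(hits)) {
      # set() updates the matching rows by reference, like :=
      set(dt, i = which(hits), j = column, value = replacement[i])
    }
  }
  if (to_numeric) {
    # Replace the whole column with its numeric conversion
    set(dt, j = column, value = as.numeric(dt[[column]]))
  }
  invisible(dt)
}

# Usage on made-up data
flights_toy <- data.table(origin = c("JFK", "EWR", "LGA", "JFK"))
recode_column(flights_toy, c("jfk", "ewr|lga"), c("0", "1"), "origin",
              to_numeric = TRUE)
table(flights_toy$origin)
```

Because dt[[column]] extracts the column by name directly, this version needs neither get() nor the parenthesised left-hand side, and it also avoids an argument called as.numeric that shares its name with the base function.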